185 | | The tuned application in the given example is the same AXPY-4 used in the earlier subsection. In this example, the goal of the tuning process is to determine the best value of the unroll factor parameter (represented as variable UF) for two distinct problem sizes: N=1K and N=10M. The unroll factor values under consideration range over the integers from 1 to 32, inclusive. |
| 185 | The tuned application in the given example is the same AXPY-4 used in the earlier subsection. The goal of the tuning process is to determine the best value of the unroll factor parameter for different problem sizes. The code in the body of the `PerfTuning` module defines the ''tuning specifications'', which comprise the following four definitions: |
| 186 | |
| 187 | * ''build'': to specify all information needed for compiling and executing the optimized code |
| 188 | * ''performance_params'': to specify values of parameters used in the program transformations |
| 189 | * ''input_params'': to specify sizes of the input problem |
| 190 | * ''input_vars'': to specify both the declarations and the initializations of the input variables |
| 191 | |
| 192 | In this example, the transformed AXPY-4 code is compiled with GCC using the `-O3` option, which enables its aggressive optimization level. The unroll factor values under consideration range over the integers from 1 to 32, inclusive. The AXPY-4 computation is tuned for two distinct problem sizes: N=1K and N=10M. All scalars and arrays involved in the computation must also be declared and initialized in the tuning specifications so that the performance testing driver can execute the optimized code empirically. Note that the ''static'' and ''dynamic'' keywords guide the performance testing driver in allocating memory for the declared arrays. |
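
The search described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: it assumes the driver tries every unroll factor exhaustively for each problem size and keeps the cheapest one. The names `tune_unroll_factor` and `synthetic_cost` are hypothetical, and the cost model is made up so that the example runs deterministically. A real driver would instead compile and time the transformed code.

```python
from typing import Callable, Dict, Iterable


def tune_unroll_factor(
    problem_sizes: Iterable[int],
    unroll_factors: Iterable[int],
    measure: Callable[[int, int], float],
) -> Dict[int, int]:
    """For each problem size N, return the unroll factor with the lowest cost.

    Assumption: an exhaustive search over all candidate factors.
    """
    factors = list(unroll_factors)
    best: Dict[int, int] = {}
    for n in problem_sizes:
        # min() returns the first minimal element, so ties go to the smaller UF.
        best[n] = min(factors, key=lambda uf: measure(n, uf))
    return best


def synthetic_cost(n: int, uf: int) -> float:
    """Made-up cost model standing in for a real timing run (NOT measured data).

    Loop overhead shrinks as UF grows, while a second term penalizes large UF.
    """
    loop_overhead = n / uf
    unroll_penalty = uf * (n ** 0.5)
    return loop_overhead + unroll_penalty


SIZES = [1_000, 10_000_000]   # N=1K and N=10M, as in the text
FACTORS = range(1, 33)        # UF from 1 to 32, inclusive

best_uf = tune_unroll_factor(SIZES, FACTORS, synthetic_cost)
for n, uf in best_uf.items():
    print(f"N={n}: best UF={uf}")
```

Passing the measurement in as a function keeps the search loop independent of how a candidate is evaluated, so the synthetic model could be swapped for a routine that builds and times the generated code.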