Version 11 (modified by hartono, 15 years ago)

# Performance Tuning Specifications of Orio

In addition to the quick start guide on Orio's main webpage, this document describes the tuning specifications in more detail so that users can fully benefit from Orio's automatic tuning feature.

## Example

Below is a concrete illustration of what Orio tuning specifications look like.

```
def build {
  arg command = 'icc';
  arg options = '-fast -parallel';
}

let NUM_REGS = 128;
let L1_CACHE_SIZE = 64*(2**20);
def performance_params {
  param TileSize1[] = [1,32,64,128,256,512];
  param TileSize2[] = [1,32,64,128,256,512];
  param UnrollFactor1[] = range(1,32);
  param UnrollFactor2[] = range(1,32);
  constraint RegisterCapacity = UnrollFactor1 * UnrollFactor2 * 9 <= NUM_REGS;
  constraint L1Tiling = TileSize1 * TileSize2 <= L1_CACHE_SIZE;
}

def input_params {
  let SIZES = [100,1000,2000,4000,8000];
  param M[] = SIZES;
  param N[] = SIZES;
  constraint SquareShape = M == N;
}

def input_vars {
  decl dynamic double X[M] = 0;
  decl dynamic double Y[N] = random;
  decl static double A[M][N] = random;
  decl double C = random;
}
```

## Structure of Tuning Specifications

Orio tuning specifications consist of a sequence of definition statements. Each definition statement contains a series of auxiliary statements, which fall into the following five types.

1. A let statement stores a temporary value in a variable that subsequent statements can reuse. Note that a let statement need not appear inside the body of a definition statement, as the example above shows.
2. An argument statement collects specific information from the Orio user about the relevant tuning components. In the example above, the command and options arguments (in the build definition) tell Orio how to compile and execute the optimized code.
3. A parameter statement assigns a range of values to a tuning parameter, which can be either a performance parameter or an input problem parameter. The symbol [] must follow the parameter name to indicate that the parameter has multiple values to consider.
4. A constraint statement primarily prunes uninteresting portions of the parameter space so that the search concentrates on the region most likely to yield high-quality solutions. The RegisterCapacity and L1Tiling constraints are examples. A constraint statement also lets users define the shape of the input arrays, as the SquareShape constraint in the earlier example does.
5. A declaration statement informs the performance testing driver of all input scalars and arrays that must be declared and initialized. The static and dynamic keywords tell the driver how to allocate memory for the declared arrays.
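
To make the effect of constraint statements concrete, the following Python sketch enumerates the parameter values from the example specification and counts how many combinations each constraint keeps. It is only an illustration and not Orio's search code: the helper `count_valid` is hypothetical, and the sketch assumes that `range(1,32)` follows Python semantics (the values 1 to 31).

```python
from itertools import product

# Constants copied from the example specification above.
NUM_REGS = 128
L1_CACHE_SIZE = 64 * (2 ** 20)


def count_valid(value_lists, constraint):
    """Return (valid, total) counts over the Cartesian product of value_lists."""
    points = list(product(*value_lists))
    valid = [p for p in points if constraint(*p)]
    return len(valid), len(points)


# Assumption: range(1, 32) follows Python semantics, i.e. the values 1..31.
unroll_factors = list(range(1, 32))
tile_sizes = [1, 32, 64, 128, 256, 512]
sizes = [100, 1000, 2000, 4000, 8000]

register_capacity = count_valid(
    [unroll_factors, unroll_factors],
    lambda u1, u2: u1 * u2 * 9 <= NUM_REGS,
)
l1_tiling = count_valid(
    [tile_sizes, tile_sizes],
    lambda t1, t2: t1 * t2 <= L1_CACHE_SIZE,
)
square_shape = count_valid([sizes, sizes], lambda m, n: m == n)

print("RegisterCapacity:", register_capacity)  # (41, 961)
print("L1Tiling:", l1_tiling)                  # (36, 36)
print("SquareShape:", square_shape)            # (5, 25)
```

With the values from the example, RegisterCapacity keeps 41 of the 961 unroll-factor pairs and SquareShape keeps 5 of the 25 size pairs. L1Tiling keeps all 36 tile-size pairs, because the largest product, 512*512, is well below the L1_CACHE_SIZE value used in the example.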

## Declarations and Initializations of Input Variables

As mentioned above, all input variables involved in the core computation must be specified in the input_vars definition statement so that the performance testing driver can generate code for both their declarations and their initializations. However, declaring and initializing input variables can become complicated, especially for multidimensional arrays with special properties such as upper/lower triangular matrices and anti-symmetric matrices. Orio therefore offers its users three alternatives for declaring and initializing input variables.

1. Both declarations and initializations are generated by the driver.
```
def input_vars {
  decl static double X[N][N] = 0;
}
```
2. Declarations are generated by the driver, whereas initializations are written by the user. Note that in this case none of the declaration statements may assign an initial value.
```
def input_vars {
  decl static double X[N][N];
  arg init_file = 'init_code.c';
}
```
3. Both declarations and initializations are written by the user.
```
def input_vars {
  arg decl_file = 'decl_code.h';
  arg init_file = 'init_code.c';
}
```

The following is the content of the decl_code.h file.

```c
double X[N][N];
```

The init_code.c file contains the following code.

```c
void init_input_vars() {
  int i,j;
  for (i=0; i<=N-1; i++)
    for (j=0; j<=N-1; j++)
      if (i < j)
        X[i][j] = (i+j)%10 + 1;
      else if (i == j)
        X[i][j] = 1;
      else
        X[i][j] = 0;
}
```

A user-provided initialization file must enclose the initializations of the input variables in a function named init_input_vars; otherwise, Orio reports an error.
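
To see which values the init_code.c loops assign, the sketch below transcribes them into Python and prints the result for a small array. The size N = 4 is chosen only for illustration, and the Python version stores integers where the C code stores doubles.

```python
def init_input_vars(n):
    """Python transcription of the loops in init_code.c for an n-by-n array."""
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < j:
                x[i][j] = (i + j) % 10 + 1  # strictly upper triangle
            elif i == j:
                x[i][j] = 1                 # diagonal
            else:
                x[i][j] = 0                 # strictly lower triangle
    return x


N = 4  # illustrative size, not taken from the original specification
X = init_input_vars(N)
for row in X:
    print(row)
# [1, 2, 3, 4]
# [0, 1, 4, 5]
# [0, 0, 1, 6]
# [0, 0, 0, 1]
```

The result is an upper triangular matrix with a unit diagonal, which is the kind of special structure that is easier to express in user-written initialization code than in a declaration statement.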

## Extending the Performance Testing Code

First, let us look at the basic C skeleton code used by Orio to measure the performance of the optimized code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/*@ global @*/

double rtclock()
{
  struct timezone tzp;
  struct timeval tp;
  int stat;
  gettimeofday(&tp, &tzp);
  return (tp.tv_sec + tp.tv_usec*1.0e-6);
}

int main()
{
  /*@ prologue @*/

  double orio_t_start=0, orio_t_end=0, orio_t_total=0;
  int orio_i;

  for (orio_i=0; orio_i<REPS; orio_i++)
  {
    orio_t_start = rtclock();

    /*@ tested code @*/

    orio_t_end = rtclock();
    orio_t_total += orio_t_end - orio_t_start;
  }

  orio_t_total = orio_t_total / REPS;
  printf("%f\n", orio_t_total);

  /*@ epilogue @*/

  return 0;
}
```

The skeleton code contains four important tags, each with its own purpose. The driver uses the '/*@ global @*/' tag to place global declarations, such as the input variable declarations and the initialization function for the input arrays. The '/*@ prologue @*/' tag marks where memory for the input arrays is dynamically allocated and where the function that initializes all input variables is called. The '/*@ tested code @*/' tag marks the location of the code whose performance is measured. The last tag, '/*@ epilogue @*/', is currently unused and is reserved for future development.
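
The role of the tags can be illustrated with a minimal Python sketch that fills a reduced copy of the skeleton by plain string replacement. This is an assumption about the general idea, not Orio's actual driver: the helper `fill_skeleton` and the code fragments passed to it are hypothetical.

```python
# Reduced copy of the skeleton above: only the four tags and main() are kept.
SKELETON = """\
/*@ global @*/

int main()
{
  /*@ prologue @*/
  /*@ tested code @*/
  /*@ epilogue @*/
  return 0;
}
"""

TAGS = ("global", "prologue", "tested code", "epilogue")


def fill_skeleton(skeleton, fragments):
    """Replace each '/*@ name @*/' tag with the matching code fragment.

    Hypothetical helper for illustration; a tag without a fragment is
    replaced by an empty string.
    """
    for name in TAGS:
        tag = "/*@ %s @*/" % name
        if tag not in skeleton:
            raise ValueError("skeleton is missing tag: " + tag)
        skeleton = skeleton.replace(tag, fragments.get(name, ""))
    return skeleton


# Hypothetical fragments, loosely based on the examples in this document.
fragments = {
    "global": "#define N 100\ndouble X[N][N];\nvoid init_input_vars() { /* ... */ }",
    "prologue": "init_input_vars();",
    "tested code": "/* optimized code under test */",
}

test_program = fill_skeleton(SKELETON, fragments)
print(test_program)
```

In this sketch the declarations and the initialization function land before main(), the call to init_input_vars() lands at the start of main(), and the unused epilogue tag simply disappears.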

The skeleton code also shows that Orio uses elapsed time as its performance metric, measured with the POSIX gettimeofday function. As part of our future plans, we intend to employ a performance measurement tool, such as TAU or PAPI, to enable finer-grained performance measurement (e.g., the number of processor clock cycles).