| 87 | |
| 88 | Orio provides several code transformation modules that are already implemented and ready to use. One of them is ''loop unrolling'', a loop optimization that aims to increase register reuse and reduce branch instructions by combining the work of several consecutive loop iterations into a single iteration. The sample code below shows how to annotate application code with a simple, portable loop unrolling optimization; the unroll factor in this example is four. The original code to be optimized is commonly known as AXPY-4, an extended version of the AXPY Basic Linear Algebra Subprograms (BLAS) routine. |
| 89 | |
| 90 | {{{ |
| 91 | /*@ begin Loop ( |
| 92 | transform Unroll(ufactor=4) |
| 93 | for (i=0; i<=N-1; i++) |
| 94 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 95 | ) @*/ |
| 96 | for (i=0; i<=N-1; i++) |
| 97 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 98 | /*@ end @*/ |
| 99 | }}} |
| 100 | |
| 101 | To apply loop unrolling to the above code, run the following Orio command (assuming that the code is stored in the file `axpy4.c`): |
| 102 | |
| 103 | {{{ |
| 104 | % orcc axpy4.c |
| 105 | }}} |
| 106 | |
| 107 | By default, the transformed output code is written to the file `_axpy4.c`. Users can specify a different output file name with the command option '`-o <file>`'. The output code looks as follows. |
| 108 | |
| 109 | {{{ |
| 110 | /*@ begin Loop ( |
| 111 | transform Unroll(ufactor=4) |
| 112 | for (i=0; i<=N-1; i++) |
| 113 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 114 | ) @*/ |
| 115 | #if ORIGCODE |
| 116 | for (i=0; i<=N-1; i++) |
| 117 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 118 | #else |
| 119 | for (i=0; i<=N-4; i=i+4) { |
| 120 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 121 | y[i+1] = y[i+1] + a1*x1[i+1] + a2*x2[i+1] + a3*x3[i+1] + a4*x4[i+1]; |
| 122 | y[i+2] = y[i+2] + a1*x1[i+2] + a2*x2[i+2] + a3*x3[i+2] + a4*x4[i+2]; |
| 123 | y[i+3] = y[i+3] + a1*x1[i+3] + a2*x2[i+3] + a3*x3[i+3] + a4*x4[i+3]; |
| 124 | } |
| 125 | for (; i<=N-1; i=i+1) |
| 126 | y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i]; |
| 127 | #endif |
| 128 | /*@ end @*/ |
| 129 | }}} |
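
The unrolled loop and its cleanup loop should compute exactly the same result as the original loop, including when `N` is not a multiple of four. The following is a minimal sketch of how this could be checked by hand. It is not produced by Orio: the function names `axpy4_orig`, `axpy4_unrolled`, and `axpy4_versions_agree` are hypothetical, and the test data are small integers chosen so that all floating-point arithmetic is exact.

```c
#include <stdlib.h>

/* Original AXPY-4 loop, as in the ORIGCODE branch of the output. */
void axpy4_orig(int N, double *y,
                double a1, const double *x1, double a2, const double *x2,
                double a3, const double *x3, double a4, const double *x4)
{
    int i;
    for (i = 0; i <= N-1; i++)
        y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i];
}

/* Unrolled version (factor 4), copied from the #else branch of the output. */
void axpy4_unrolled(int N, double *y,
                    double a1, const double *x1, double a2, const double *x2,
                    double a3, const double *x3, double a4, const double *x4)
{
    int i;
    /* Main loop: four consecutive elements per iteration. */
    for (i = 0; i <= N-4; i = i+4) {
        y[i]   = y[i]   + a1*x1[i]   + a2*x2[i]   + a3*x3[i]   + a4*x4[i];
        y[i+1] = y[i+1] + a1*x1[i+1] + a2*x2[i+1] + a3*x3[i+1] + a4*x4[i+1];
        y[i+2] = y[i+2] + a1*x1[i+2] + a2*x2[i+2] + a3*x3[i+2] + a4*x4[i+2];
        y[i+3] = y[i+3] + a1*x1[i+3] + a2*x2[i+3] + a3*x3[i+3] + a4*x4[i+3];
    }
    /* Cleanup loop: the remaining 0 to 3 elements when N is not a multiple of 4. */
    for (; i <= N-1; i = i+1)
        y[i] = y[i] + a1*x1[i] + a2*x2[i] + a3*x3[i] + a4*x4[i];
}

/* Hypothetical helper: returns 1 if both versions produce identical results
   for vectors of length N, and 0 otherwise (or if allocation fails). */
int axpy4_versions_agree(int N)
{
    size_t n = (N > 0) ? (size_t)N : 1;   /* avoid a zero-size allocation */
    double *buf = malloc(6 * n * sizeof *buf);
    if (buf == NULL)
        return 0;

    double *x1 = buf,         *x2 = buf + n,       *x3 = buf + 2*n;
    double *x4 = buf + 3*n,   *y_ref = buf + 4*n,  *y_unr = buf + 5*n;

    /* Small integer-valued inputs keep every sum and product exact. */
    for (int i = 0; i < N; i++) {
        x1[i] = i % 7;  x2[i] = i % 5;  x3[i] = i % 3;  x4[i] = i % 2;
        y_ref[i] = y_unr[i] = i;
    }

    axpy4_orig(N, y_ref, 1.0, x1, 2.0, x2, 3.0, x3, 4.0, x4);
    axpy4_unrolled(N, y_unr, 1.0, x1, 2.0, x2, 3.0, x3, 4.0, x4);

    int ok = 1;
    for (int i = 0; i < N; i++)
        if (y_ref[i] != y_unr[i])
            ok = 0;

    free(buf);
    return ok;
}
```

Calling `axpy4_versions_agree` with lengths such as 0, 3, 4, and 10 exercises the case where only the cleanup loop runs, the case where only the main loop runs, and the mixed case.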