Version 22 (modified by hartono, 15 years ago) (diff) |
---|
Experimental Results on Empirical Performance Optimizations
Platform
All results were obtained from using a quad-core Intel Core 2 Quad Q6600 CPU clocked at 2.4 Ghz with 32 KB L1 D cache, 8MB of L2 cache (4MB shared per core pair), and 2 GB of DDR2-667 RAM, running Linux kernel version 2.6.22 (x86-64). The compiler used was ICC 10.1.
Optimizations used
We used PLuTo (an auto-parallelization and locality optimization tool based on polyhedral models) as a polyhedral-based code transformator. And we also extended ancc with additional modules used to perform syntactical transformations. Below are the (polyhedral and syntactic) optimizations used in this experiment.
Polyhedral transformations (from PLuTo):
- Loop tiling for L1 and L2 caches
- Loop permutation/interchange
- Loop fusion
- Parallelization for multicore machines
- Loop unrolling/jamming
Syntactic transformations (from ancc module):
- Register tiling
- Scalar replacement (to enhance register reuse)
- Loop permutation/interchange
It is to be noted that the register tiling approach used by PLuTo is limited to only rectangular loops. To further improve the resulting performance, we implemented our new register tiling approach as one of the ancc's transformation modules. Our register tiling approach is so general that it can handle both rectangular and non-rectangular loops.
LU Decomposition
Code
for (k=0; k<=N-1; k++) { for (j=k+1; j<=N-1; j++) A[k][j] = A[k][j]/A[k][k]; for(i=k+1; i<=N-1; i++) for (j=k+1; j<=N-1; j++) A[i][j] = A[i][j]-A[i][k]*A[k][j]; }
Sequential (single core)
Parallel (multi-core)
3-D Gauss Seidel
Code
for (t=0; t<=T-1; t++) for (i=1; i<=N-2; i++) for (j=1; j<=N-2; j++) A[i][j] = (A[i-1][j-1] + A[i-1][j] + A[i-1][j+1] + A[i][j-1] + A[i][j] + A[i][j+1] + A[i+1][j-1] + A[i+1][j] + A[i+1][j+1]) / 9.0;
Sequential (single core)
Parallel (multi-core)
DTRMM -- Triangular Matrix-Matrix Product
Code
for (i=0; i<=N-1; i++) for (j=0; j<=N-1; j++) for (k=i+1; k<=N-1; k++) B[i][j] = B[i][j] + alpha*A[i][k]*B[k][j];
Note that the above code is intended for use where matrix A is an upper triangular matrix with unit diagonal elements.
Sequential (single core)
Parallel (multi-core)
ADI -- Alternate Direction Implicit
Code
for (t=0; t<=T-1; t++) { for (i1=0; i1<=N-1; i1++) for (i2=1; i2<=N-1; i2++) { X[i1][i2] = X[i1][i2] - X[i1][i2-1] * A[i1][i2] / B[i1][i2-1]; B[i1][i2] = B[i1][i2] - A[i1][i2] * A[i1][i2] / B[i1][i2-1]; } for (i1=1; i1<=N-1; i1++) for (i2=0; i2<=N-1; i2++) { X[i1][i2] = X[i1][i2] - X[i1-1][i2] * A[i1][i2] / B[i1-1][i2]; B[i1][i2] = B[i1][i2] - A[i1][i2] * A[i1][i2] / B[i1-1][i2]; } }
Sequential (single core)
Parallel (multi-core)
FDTD-2D
for(t=0; t<=T-1; t++) { for (j=0; j<=N-1; j++) ey[0][j] = t; for (i=1; i<=N-1; i++) for (j=0; j<=N-1; j++) ey[i][j] = ey[i][j] - 0.5*(hz[i][j]-hz[i-1][j]); for (i=0; i<=N-1; i++) for (j=1; j<=N-1; j++) ex[i][j] = ex[i][j] - 0.5*(hz[i][j]-hz[i][j-1]); for (i=0; i<=N-1; i++) for (j=0; j<=N-1; j++) hz[i][j] = hz[i][j] - 0.7*(ex[i][j+1]-ex[i][j]+ey[i+1][j]-ey[i][j]); }
Sequential (single core)
Parallel (multi-core)
GEMVER
for (i=0; i<=N-1; i++) for (j=0; j<=N-1; j++) B[i][j] = A[i][j] + u1[i]*v1[j] + u2[i]*v2[j]; for (i=0; i<=N-1; i++) for (j=0; j<=N-1; j++) x[i] = x[i] + beta* B[j][i]*y[j]; for (i=0; i<=N-1; i++) x[i] = x[i] + z[i]; for (i=0; i<=N-1; i++) for (j=0; j<=N-1; j++) w[i] = w[i] + alpha* B[i][j]*x[j];
Sequential (single core)
Parallel (multi-core)
Attachments
- trmm.png (3.1 KB) - added by hartono 15 years ago.
- trmm-par.png (3.3 KB) - added by hartono 15 years ago.
- lu.png (2.8 KB) - added by hartono 15 years ago.
- lu-par.png (2.9 KB) - added by hartono 15 years ago.
- seidel.png (2.4 KB) - added by hartono 15 years ago.
- seidel-par.png (2.7 KB) - added by hartono 15 years ago.
- adi.png (2.6 KB) - added by hartono 15 years ago.
- adi-par.png (3.1 KB) - added by hartono 15 years ago.
- fdtd-2d.png (2.9 KB) - added by hartono 15 years ago.
- fdtd-2d-par.png (2.7 KB) - added by hartono 15 years ago.
- gemver.png (2.3 KB) - added by hartono 15 years ago.
- gemver-par.png (2.6 KB) - added by hartono 15 years ago.