Version 6 (modified by hartono, 15 years ago)
Experimental Results on Empirical Performance Optimizations
Platform
All results were obtained on a quad-core Intel Core 2 Quad Q6600 CPU clocked at 2.4 GHz, with 32 KB of L1 data cache, 8 MB of L2 cache (4 MB shared per core pair), and 2 GB of DDR2-667 RAM, running Linux kernel version 2.6.22 (x86-64). The compiler used was ICC 10.1.
LU Decomposition
Original code
for (k=0; k<=N-1; k++) {
  for (j=k+1; j<=N-1; j++)
    A[k][j] = A[k][j]/A[k][k];
  for (i=k+1; i<=N-1; i++)
    for (j=k+1; j<=N-1; j++)
      A[i][j] = A[i][j]-A[i][k]*A[k][j];
}
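To make the kernel above easier to try out, here is a minimal sketch that wraps the same loop nest in a function and adds a correctness check. The names lu_inplace and lu_max_residual, and the fixed size N = 4, are assumptions made for illustration and are not part of the original benchmark. The loop nest divides row k by the pivot and does no pivoting, so the result stored in A is read here as a lower-triangular L (including the diagonal) and a unit-diagonal upper-triangular U.

```c
#include <stdio.h>

#define N 4  /* assumed small problem size, for illustration only */

/* In-place LU factorization without pivoting, using the same loop nest
   as the original code. On return, the lower triangle of A (including
   the diagonal) holds L and the strict upper triangle holds U, whose
   unit diagonal is implied. Every pivot A[k][k] must be nonzero. */
void lu_inplace(double A[N][N])
{
    int i, j, k;
    for (k = 0; k <= N - 1; k++) {
        for (j = k + 1; j <= N - 1; j++)
            A[k][j] = A[k][j] / A[k][k];
        for (i = k + 1; i <= N - 1; i++)
            for (j = k + 1; j <= N - 1; j++)
                A[i][j] = A[i][j] - A[i][k] * A[k][j];
    }
}

/* Hypothetical helper: returns the largest absolute entry of
   (L*U - orig), where L and U are read out of the packed matrix LU. */
double lu_max_residual(double LU[N][N], double orig[N][N])
{
    double worst = 0.0;
    int i, j, k;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            double sum = 0.0;
            for (k = 0; k < N; k++) {
                /* L is lower triangular with its diagonal stored. */
                double l = (k <= i) ? LU[i][k] : 0.0;
                /* U is upper triangular with an implied unit diagonal. */
                double u = (k == j) ? 1.0 : ((k < j) ? LU[k][j] : 0.0);
                sum += l * u;
            }
            double err = sum - orig[i][j];
            if (err < 0.0)
                err = -err;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

A caller would copy the input matrix, pass the copy to lu_inplace, and then compare against the original with lu_max_residual. For a diagonally dominant input the residual should be near machine precision. Since there is no pivoting, a zero or very small pivot will produce invalid or inaccurate results.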
Sequential (single core)
Parallel (multi-core)
Attachments
- trmm.png (3.1 KB) - added by hartono 15 years ago.
- trmm-par.png (3.3 KB) - added by hartono 15 years ago.
- lu.png (2.8 KB) - added by hartono 15 years ago.
- lu-par.png (2.9 KB) - added by hartono 15 years ago.
- seidel.png (2.4 KB) - added by hartono 15 years ago.
- seidel-par.png (2.7 KB) - added by hartono 15 years ago.
- adi.png (2.6 KB) - added by hartono 15 years ago.
- adi-par.png (3.1 KB) - added by hartono 15 years ago.
- fdtd-2d.png (2.9 KB) - added by hartono 15 years ago.
- fdtd-2d-par.png (2.7 KB) - added by hartono 15 years ago.
- gemver.png (2.3 KB) - added by hartono 15 years ago.
- gemver-par.png (2.6 KB) - added by hartono 15 years ago.