Changes between Version 10 and Version 11 of AnnPerformance


Timestamp:
05/13/08 12:54:02
Author:
hartono
Comment:

--

  • AnnPerformance

    v10 v11  
    33== Platform == 
    44All results were obtained on a quad-core Intel Core 2 Quad Q6600 CPU clocked at 2.4 GHz with a 32 KB L1 data cache, 8 MB of L2 cache (4 MB shared per core pair), and 2 GB of DDR2-667 RAM, running Linux kernel version 2.6.22 (x86-64). The compiler used was ICC 10.1. 
     5 
     6== Optimizations used == 
     7We used PLuTo (an automatic parallelization and locality optimization tool based on the polyhedral model) as the polyhedral code transformer. We also extended ancc with additional modules that perform syntactic transformations. The polyhedral and syntactic optimizations used in this experiment are listed below. 
     8 
     9''Polyhedral'' transformations (from PLuTo): 
     10 * Loop tiling for L1 and L2 caches 
     11 * Loop fusion 
     12 * Parallelization for multicore machines 
     13 * Register tiling (for rectangular iteration spaces) 
     14 
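As an illustration of what loop tiling does, here is a minimal hand-written sketch. It is not PLuTo output; the function names, the matrix-multiply kernel, and the `tile` parameter are assumptions chosen for the example. The tiled version visits the same iterations as the reference loop nest, but in blocks small enough to stay resident in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Reference kernel: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Smaller of two sizes; used to clip partial tiles at the matrix edge. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Same computation with all three loops tiled.
 * `tile` is a hypothetical tuning parameter; a real tool would pick
 * separate sizes for each cache level. */
void matmul_tiled(size_t n, size_t tile,
                  const double *A, const double *B, double *C) {
    assert(tile > 0); /* a zero tile size would never advance the tile loops */
    for (size_t it = 0; it < n; it += tile)
        for (size_t jt = 0; jt < n; jt += tile)
            for (size_t kt = 0; kt < n; kt += tile) {
                size_t iend = min_sz(it + tile, n);
                size_t jend = min_sz(jt + tile, n);
                size_t kend = min_sz(kt + tile, n);
                /* Intra-tile loops: same statement, restricted to one block. */
                for (size_t i = it; i < iend; i++)
                    for (size_t j = jt; j < jend; j++)
                        for (size_t k = kt; k < kend; k++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
}
```

For each output element the `k` iterations still run in increasing order, so the tiled version accumulates in the same order as the reference loop nest.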
     15''Syntactic'' transformations (from ancc modules): 
     16 * Register tiling (for both rectangular and non-rectangular iteration spaces) 
     17 * Loop permutation/interchange 
     18 * Scalar replacement (to enhance register reuse) 
     19  
     20Note that the register tiling approach used by PLuTo is limited to rectangular loops. To further improve the resulting performance, we implemented our own register tiling approach as one of ancc's transformation modules. Our approach is general enough to handle both rectangular and non-rectangular loops. 
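To make the rectangular versus non-rectangular distinction concrete, the following hand-written sketch applies register tiling (unroll-and-jam by 2) and scalar replacement to a triangular loop nest. It is not generated by ancc or PLuTo; the lower-triangular matrix-vector kernel and all function names are assumptions chosen for illustration, and `x` and `y` are assumed not to alias.

```c
#include <stddef.h>

/* Reference kernel: y += L * x, using only the lower triangle of the
 * n x n row-major matrix L. The inner bound depends on i, so the
 * iteration space is triangular (non-rectangular). */
void trmv_naive(size_t n, const double *L, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j <= i; j++)
            y[i] += L[i * n + j] * x[j];
}

/* Register-tiled version: two rows are processed per outer iteration. */
void trmv_regtiled(size_t n, const double *L, const double *x, double *y) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* Scalar replacement: keep both accumulators in registers. */
        double y0 = y[i];
        double y1 = y[i + 1];
        /* Jammed part: columns 0..i are shared by rows i and i+1. */
        for (size_t j = 0; j <= i; j++) {
            double xj = x[j]; /* loaded once, reused for both rows */
            y0 += L[i * n + j] * xj;
            y1 += L[(i + 1) * n + j] * xj;
        }
        /* Triangular remainder: row i+1 has one extra column, j = i+1. */
        y1 += L[(i + 1) * n + (i + 1)] * x[i + 1];
        y[i] = y0;
        y[i + 1] = y1;
    }
    /* Leftover row when n is odd. */
    for (; i < n; i++) {
        double yi = y[i];
        for (size_t j = 0; j <= i; j++)
            yi += L[i * n + j] * x[j];
        y[i] = yi;
    }
}
```

In a rectangular nest the jammed inner loop would cover every iteration; here the bound that depends on `i` forces the extra remainder statement, which is the case a rectangular-only register tiler cannot express.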
    521 
    622== LU Decomposition == 
     
    1834}}} 
    1935 
    20 === Optimizations used === 
    21 We used PLuTo (an automatic parallelization and locality optimization tool based on the polyhedral model) as the polyhedral code transformer. We also extended ancc with additional modules that perform syntactic transformations. The polyhedral and syntactic optimizations used in this experiment are listed below. 
    22  
    23 ''Polyhedral'' transformations (from PLuTo): 
    24  * Loop tiling for L1 and L2 caches 
    25  * Parallelization for multicore machines 
    26  * Register tiling (for rectangular iteration spaces) 
    27  
    28 ''Syntactic'' transformations (from ancc modules): 
    29  * Register tiling (for both rectangular and non-rectangular iteration spaces) 
    30  * Loop permutation/interchange 
    31  * Scalar replacement (to enhance register reuse) 
    32   
    33 Note that the register tiling approach used by PLuTo is limited to rectangular loops. To further improve the resulting performance, we implemented our own register tiling approach as one of ancc's transformation modules. Our approach is general enough to handle both rectangular and non-rectangular loops. 
    34  
    3536=== Sequential (single core) === 
    3637 [[Image(lu.png,nolink)]]