
Program Behavior at Extreme Scale

  1. Performance tool notes
  2. ISP
  3. Applications
  4. Communication Pattern Detection
  5. Precise Dynamic Location of Bugs
  6. ScalaTrace
  7. Developing Ideas
  8. Benchmark Suites
    1. Vorpalite
  9. General Analysis Notes
  10. References

Task Mapping

[17,18,19]

  1. [19] uses simulated annealing (SA); a sketch of the approach follows this list.
    • Analysis runtimes at medium scales are several hours for plain SA and 30 minutes when combined with graph partitioning.
    • Uses a communication matrix to drive an off-line analysis.
  2. [17] discusses various consistent mapping orders and Gray codes.
  3. [18] uses graph fitting.
  4. [17,18] assume a regular, known topology.
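
A minimal sketch of this communication-matrix-driven mapping, assuming Python, a synthetic 3D-torus hop-count cost, and a simple swap neighborhood; the cost model, cooling schedule, and graph-partitioning acceleration of [19] are not reproduced here.

  # Sketch: map tasks onto a small 3D torus so heavy communicators land close
  # together.  Torus dimensions, cost function, and cooling schedule are
  # illustrative assumptions, not those of [19].
  import math
  import random

  def torus_coords(node, dims):
      x = node % dims[0]
      y = (node // dims[0]) % dims[1]
      z = node // (dims[0] * dims[1])
      return x, y, z

  def torus_hops(a, b, dims):
      # Manhattan distance with wrap-around in each dimension.
      return sum(min(abs(p - q), d - abs(p - q))
                 for p, q, d in zip(torus_coords(a, dims), torus_coords(b, dims), dims))

  def mapping_cost(mapping, comm, dims):
      # Hop-weighted communication volume for a task -> node assignment.
      n = len(mapping)
      return sum(comm[i][j] * torus_hops(mapping[i], mapping[j], dims)
                 for i in range(n) for j in range(i + 1, n))

  def anneal_mapping(comm, dims, steps=20000, t0=1.0, cooling=0.9995, seed=0):
      rng = random.Random(seed)
      n = len(comm)
      assert n == dims[0] * dims[1] * dims[2], "one task per torus node assumed"
      mapping = list(range(n))                  # identity mapping as the start
      cur = best_cost = mapping_cost(mapping, comm, dims)
      best = list(mapping)
      t = t0
      for _ in range(steps):
          i, j = rng.sample(range(n), 2)        # propose swapping two tasks
          mapping[i], mapping[j] = mapping[j], mapping[i]
          new = mapping_cost(mapping, comm, dims)
          if new < cur or rng.random() < math.exp((cur - new) / max(t, 1e-9)):
              cur = new                         # accept (downhill always, uphill sometimes)
              if cur < best_cost:
                  best, best_cost = list(mapping), cur
          else:
              mapping[i], mapping[j] = mapping[j], mapping[i]   # reject: undo the swap
          t *= cooling
      return best, best_cost

  if __name__ == "__main__":
      dims = (2, 2, 2)
      n = 8
      # Toy workload: a ring where each task talks mostly to its successor.
      comm = [[0] * n for _ in range(n)]
      for i in range(n):
          comm[i][(i + 1) % n] = comm[(i + 1) % n][i] = 10
      print(anneal_mapping(comm, dims))

Recomputing the full cost after every proposed swap keeps the sketch short; a real implementation would update the cost incrementally per swap.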

Communication Pattern Detection

  1. Pattern detection in [1] correlates similar event traces, allowing some freedom in the matching (to be described); a simplified sketch follows this list.
    • Well suited for SPMD programs.
    • Does not handle communication patterns between heterogeneous processes.
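
A simplified illustration of why trace correlation fits SPMD codes, assuming Python and a toy event format of (operation, peer) tuples; peers are normalized to rank-relative offsets and ranks with identical signatures are grouped, which is not the actual matching algorithm of [1].

  # Sketch: collapse SPMD ranks onto one communication signature by rewriting
  # peers as offsets relative to the sender.  Heterogeneous processes keep
  # distinct signatures, mirroring the limitation noted above.
  from collections import defaultdict

  def normalize(rank, events, nprocs):
      # events: list of (operation, peer_rank) tuples.
      return tuple((op, (peer - rank) % nprocs) for op, peer in events)

  def group_by_pattern(traces, nprocs):
      # traces: dict mapping rank -> list of (operation, peer_rank) events.
      groups = defaultdict(list)
      for rank, events in traces.items():
          groups[normalize(rank, events, nprocs)].append(rank)
      return groups

  if __name__ == "__main__":
      nprocs = 4
      ring = {r: [("send", (r + 1) % nprocs), ("recv", (r - 1) % nprocs)]
              for r in range(nprocs)}
      for signature, ranks in group_by_pattern(ring, nprocs).items():
          print(ranks, signature)   # all four ranks share one signature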

Precise Dynamic Location of Bugs

  1. A fault caused by a bug may be separated from the bug in time.
    • How do we determine causality, and at what level of abstraction? One standard building block is sketched below.
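
One standard building block for the causality half of this question is a vector clock; the sketch below is generic textbook machinery (in Python), not a technique proposed in these notes, and it leaves the abstraction question open.

  # Sketch: vector clocks order events across processes.  Event a happened
  # before event b iff a's clock is componentwise <= b's and differs somewhere,
  # which lets a fault be traced back to its causally preceding events.
  def new_clock(nprocs):
      return [0] * nprocs

  def local_event(clock, pid):
      clock[pid] += 1
      return list(clock)               # snapshot attached to the event

  def send_event(clock, pid):
      return local_event(clock, pid)   # the snapshot travels with the message

  def recv_event(clock, pid, sender_snapshot):
      clock[:] = [max(a, b) for a, b in zip(clock, sender_snapshot)]
      clock[pid] += 1
      return list(clock)

  def happened_before(a, b):
      return all(x <= y for x, y in zip(a, b)) and a != b

  if __name__ == "__main__":
      c0, c1 = new_clock(2), new_clock(2)
      snap = send_event(c0, 0)             # process 0 sends
      fault = recv_event(c1, 1, snap)      # process 1 receives, then faults
      print(happened_before(snap, fault))  # True: the send precedes the fault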

ScalaTrace

[3,5,6,7,10,12]

  1. The original paper is [12].
  2. Exploits the semantic information of MPI events to compress traces; a simplified illustration follows this list.
  3. Attains near-constant-size traces.
    • [7] attempts to explain sub-linear and super-linear trace sizes given the run-time and scale.
  4. [6] extends the compression used in ScalaTrace to include time stamps.
  5. [6] suggests the use of MRNet for trace collection to offset memory overhead.
  6. Must evaluate whether the compression remains effective for irregular communication patterns.
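
A much-simplified illustration of per-process loop compression in the spirit of ScalaTrace's regular section descriptors [12], assuming Python and hashable per-event signatures; the cross-node merging and the timing data of [6] are omitted.

  # Sketch: compress an MPI event stream into (count, block) loop descriptors.
  # Each merge rule preserves the property that expand(compress(x)) == x, so a
  # stream like A B A B A B collapses to a single (3, (A, B)) descriptor.
  def compress(events, max_block=4):
      out = []   # list of (repeat_count, block-of-events) descriptors
      for ev in events:
          out.append((1, (ev,)))
          while _merge_tail(out, max_block):
              pass
      return out

  def _merge_tail(out, max_block):
      # Rule 1: newest descriptor repeats the block right before it -> add counts.
      if len(out) >= 2 and out[-1][1] == out[-2][1]:
          out[-2:] = [(out[-2][0] + out[-1][0], out[-1][1])]
          return True
      # Rule 2: the expansion of the last k descriptors equals the block of the
      # descriptor just before them -> absorb them as one more loop iteration.
      for k in range(1, max_block + 1):
          if len(out) < k + 1:
              break
          tail = tuple(e for c, blk in out[-k:] for _ in range(c) for e in blk)
          count, block = out[-k - 1]
          if tail == block:
              out[-k - 1:] = [(count + 1, block)]
              return True
      # Rule 3: the last k descriptors repeat the k before them -> start a loop.
      for k in range(2, max_block + 1):
          if len(out) >= 2 * k and out[-k:] == out[-2 * k:-k]:
              block = tuple(e for c, blk in out[-k:] for _ in range(c) for e in blk)
              out[-2 * k:] = [(2, block)]
              return True
      return False

  def expand(descriptors):
      return [e for count, block in descriptors for _ in range(count) for e in block]

  if __name__ == "__main__":
      trace = ["Isend", "Irecv", "Waitall"] * 1000 + ["Allreduce"]
      rsd = compress(trace)
      print(rsd)                   # [(1000, ('Isend', 'Irecv', 'Waitall')), (1, ('Allreduce',))]
      assert expand(rsd) == trace  # lossless for the event sequence itself

For a regular iteration structure the descriptor list stays tiny regardless of run length, which is the intuition behind the near-constant trace sizes noted above; irregular patterns defeat the merge rules and the descriptor list grows, which is the concern raised in item 6.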

Developing Ideas

  1. On-line analysis.
    1. Make inferences at run-time.
      • Can a meaningful analysis be done locally?
      • Impact on compression.
        • May reduce the amount of data that needs to be stored and may allow new forms of semantic compression.
        • ScalaTrace methods will not be directly applicable. Some of the code may be reused.
  2. Data profiling.
    1. How to summarize the data exchanged?
  3. Predictive performance evaluation.
    1. Predict performance under a different configuration (different topology, etc.).
    2. Predictions need not be complete; reports may cover only effects in the near future.
  4. Grammar-inference-based model extraction [16]; see the sketch after this list.
    1. Focuses on finite state machine extraction from traces.
    2. The methods may be applied locally rather than to a complete trace.
  5. Event-based model of behavior [15].
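
A minimal sketch of finite-state-machine extraction from an event stream, assuming Python and a simple k-gram automaton; this is only loosely in the spirit of [16], not their discovery algorithms, and it shows how such a model can be built over a local window rather than a complete trace.

  # Sketch: learn a k-tail-style automaton from an observed event trace.
  # States are the most recent (up to) k events; transitions record which
  # events were observed next in that context.
  from collections import defaultdict

  def build_automaton(trace, k=2):
      transitions = defaultdict(set)
      state = ()
      for event in trace:
          transitions[state].add(event)
          state = (state + (event,))[-k:]   # slide the k-event window
      return dict(transitions)

  def accepts(transitions, trace, k=2):
      # Replay a (possibly local) window of events against the learned model.
      state = ()
      for event in trace:
          if event not in transitions.get(state, set()):
              return False
          state = (state + (event,))[-k:]
      return True

  if __name__ == "__main__":
      observed = ["Isend", "Irecv", "Waitall"] * 2 + ["Allreduce"]
      model = build_automaton(observed, k=2)
      print(accepts(model, ["Isend", "Irecv", "Waitall", "Allreduce"], k=2))  # True
      print(accepts(model, ["Irecv", "Isend"], k=2))                          # False

Because the model depends only on the last k events, it can be maintained on-line and compared across processes, which ties this idea back to the on-line analysis and compression questions above.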

Benchmark Suites

[11]

  1. UMT (UMT2k?) in [11] has a known load imbalance [6].

References

[1] Preissl, R., Köckerbauer, T., Schulz, M., Kranzlmüller, D., Supinski, B. R., and Quinlan, D. J. 2008. Detecting Patterns in MPI Communication Traces.

[2] Preissl, R., Schulz, M., Kranzlmüller, D., Supinski, B. R., and Quinlan, D. J. 2008. Using MPI Communication Patterns to Guide Source Code Transformations.

[3] ScalaTrace project.

[4] B. de Supinski, R. Fowler, T. Gamblin, F. Mueller, P. Ratn, and M. Schulz. An Open Infrastructure for Scalable, Reconfigurable Analysis.

[5] M. Noeth, P. Ratn, F. Mueller, M. Schulz, and B. de Supinski. ScalaTrace: Scalable Compression and Replay of Communication Traces in High Performance Computing.

[6] P. Ratn, F. Mueller, M. Schulz, and B. de Supinski. Preserving Time in Large-Scale Communication Traces.

[7] MPI Trace Compression Tuning Project.

[8] T. Gamblin, P. Ratn, B. de Supinski, M. Schulz, F. Mueller, R. Fowler, and D. Reed. An Open Framework for Scalable, Reconfigurable Performance Analysis.

[9] An Open Framework for Scalable, Reconfigurable Performance Analysis. (html)

[10] Michael Noeth. Scalable Compression and Replay of Communication Traces in Massively Parallel Environments.

[11] The ASCI Purple Benchmark Codes.

[12] M. Noeth, F. Mueller, M. Schulz, and B. R. de Supinski. Scalable Compression and Replay of Communication Traces in Massively Parallel Environments.

[13] Bhatia, N., Song, F., Wolf, F., Dongarra, J., Mohr, B., and Moore, S. Automatic Experimental Analysis of Communication Patterns in Virtual Topologies.

[14] Huband, S. and McDonald, C. 2001. A Preliminary Topological Debugger for MPI Programs.

[15] Peter C. Bates. Debugging heterogeneous distributed systems using event-based models of behavior.

[16] Cook, J. E. and Wolf, A. L. Discovering models of software processes from event-based data.

[17] Yu, H., Chung, I., and Moreira, J. Topology mapping for Blue Gene/L supercomputer.

[18] Smith and Bode. Performance Effects of Node Mappings on the IBM BlueGene/L Machine.

[19] G. Bhanot, A. Gara, P. Heidelberger, E. Lawless, J. C. Sexton, and R. Walkup. Optimizing task layout on the Blue Gene/L supercomputer.

[20] Ed Upchurch, Paul L. Springer, Maciej Brodowicz, Sharon Brunett, and T.D. Gottschalk. Performance Analysis of Blue Gene/L Using Parallel Discrete Event Simulation.

[21] Ed Upchurch, Paul L. Springer, Maciej Brodowicz, Sharon Brunett, and T.D. Gottschalk. Analysis, Tracing, Characterization and Performance Modeling of Select ASCI Applications for Blue Gene/L Using Parallel Discrete Event Simulation.

[22] Springer and Upchurch. Using SPEEDES to Simulate the Blue Gene Interconnect Network.

[23] Terry L. Wilmarth, Gengbin Zheng, Eric J. Bohm, Yogesh Mehta, Nilesh Choudhury, Praveen Jagadishprasad and Laxmikant V. Kale. Performance Prediction using Simulation of Large-scale Interconnection Networks in POSE.

[24] Gengbin Zheng, Gunavardhan Kakulapati, Laxmikant V. Kalé. BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines.