Version 14 (modified by jacob, 7 years ago)


Parvis: Parallel Analysis Tools and New Visualization Techniques for Ultra-Large Climate Data Sets

The large volume of data currently produced by climate models is overwhelming the current, decades-old visualization workflow. Traditional methods for visualizing climate output have also failed to keep pace with changes in the types of grids used, the number of variables involved, the number of different simulations performed with a climate model, and the feature-richness of high-resolution simulations.

The bottleneck in producing climate model images is not always the drawing of the image on the screen but rather the calculations that must be performed on the climate model output before an image can be made. Parvis will speed up this post-processing, or analysis, phase of climate visualization by developing a new Parallel Climate Analysis Library (ParCAL) that will vastly improve the speed of climate data analysis compared to the current serial tools. ParCAL will build on currently available software technology, which will also permit calculations to be performed in parallel on many different numerical grids. We will interface ParCAL with the NCAR Command Language (NCL), a popular tool in the climate modeling community.
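
The idea of replacing a serial analysis step with a parallel one can be illustrated with a minimal sketch. It is not ParCAL code: the function names, the regular latitude-longitude grid, and the cosine-of-latitude area weights are all assumptions made for illustration, and the Python standard library thread pool merely stands in for the distributed-memory parallelism a real library would use. The sketch splits an area-weighted mean into independent partial sums over blocks of latitude rows and then combines them.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_weighted_sum(lats_deg, rows):
    """Return (sum of w*x, sum of w) for one block of latitude rows."""
    num = 0.0
    den = 0.0
    for lat, row in zip(lats_deg, rows):
        # Assumed area weight for a regular lat-lon grid: cos(latitude).
        w = math.cos(math.radians(lat))
        num += w * sum(row)
        den += w * len(row)
    return num, den


def area_weighted_mean(lats_deg, field, n_workers=1):
    """Area-weighted mean of field[lat][lon], computed block by block.

    The latitude rows are split into roughly n_workers blocks; each block
    yields a partial (numerator, denominator) pair, and the pairs are
    reduced at the end. Threads are only a stand-in for real parallel
    workers (hypothetical here), so this shows the decomposition rather
    than an actual speed-up.
    """
    n = len(lats_deg)
    size = math.ceil(n / n_workers)
    blocks = [(lats_deg[i:i + size], field[i:i + size])
              for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda b: partial_weighted_sum(*b), blocks))
    num = sum(p[0] for p in partials)
    den = sum(p[1] for p in partials)
    return num / den
```

Calling `area_weighted_mean(lats, field, n_workers=1)` and `area_weighted_mean(lats, field, n_workers=4)` on the same input should agree to within floating-point rounding, because the partial sums are combined by a simple reduction. The same split-and-reduce pattern is what would have to be generalized to other grids and other operations.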

We are also exploring the use of existing tools such as Swift and Pagoda to bring immediate speed-ups to the climate model post-processing workflow.

ParCAL development

We will be building ParCAL on top of the MOAB, Intrepid, and PNetCDF libraries.

Current workflow improvements

Hardware Model

The hardware model we envision for post-processing of high-resolution output is a “Data Analysis Center” (as defined in this report from a workshop on climate at exascale): computational infrastructure optimized and dedicated to the task of analyzing and visualizing large, complex data. One possible model for this is the Argonne Leadership Computing Facility (ALCF), which has several petabytes of disk storage space, with no per-user quotas, attached to its IBM BlueGene/P computer. The same disk is attached to a data analysis and visualization (DAV) cluster called “Eureka”, consisting of 100 dual quad-core servers, each with dual NVIDIA Quadro graphics cards. In this model, the original climate output remains at the ALCF, or wherever it is generated, and the DAV is performed using multiple nodes of Eureka (or an equivalent resource) accessing the same physical disk. At present, however, Eureka and its attached, shared disk do not amount to a data analysis center, because today’s climate DAV tools are unable to take advantage of more than one node of Eureka at a time.

For development, we will also be using Fusion, a Linux cluster supported by Argonne's Laboratory Computing Resource Center. Fusion has 320 nodes, each with dual quad-core Intel Nehalem 2.53 GHz processors (2560 cores in total). Each node has 36 GB of memory, and 16 "fat" nodes have 64 GB. The interconnect is InfiniBand QDR at 4 GB/s per link.
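
The Fusion figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the paragraph; the constant and function names are illustrative. The aggregate memory estimate rests on an assumption the text does not confirm, namely that the 16 "fat" nodes are counted among the 320 nodes. The 2560 figure corresponds to cores (320 nodes times 2 sockets times 4 cores).

```python
# Figures taken from the paragraph above.
NODES = 320
SOCKETS_PER_NODE = 2       # "dual"
CORES_PER_SOCKET = 4       # "quad-core"
STANDARD_NODE_GB = 36
FAT_NODES = 16
FAT_NODE_GB = 64


def total_cores():
    """Core count across the cluster."""
    return NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET


def total_memory_gb():
    """Aggregate memory in GB.

    Assumption (not stated in the text): the 16 fat nodes are part of
    the 320 nodes, so 304 nodes have 36 GB and 16 nodes have 64 GB.
    """
    standard_nodes = NODES - FAT_NODES
    return standard_nodes * STANDARD_NODE_GB + FAT_NODES * FAT_NODE_GB


if __name__ == "__main__":
    print("cores:", total_cores())
    print("memory (GB, under the stated assumption):", total_memory_gb())
```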

This research is sponsored by the Office of Biological and Environmental Research of the U.S. Department of Energy's Office of Science.
