
Parvis: Parallel Analysis Tools and New Visualization Techniques for Ultra-Large Climate Data Sets

The large volume of data currently produced by climate models is overwhelming the decades-old visualization workflow. Traditional methods for visualizing climate output have also not kept pace with changes in the types of grids used, the number of variables involved, the number of different simulations performed with a climate model, or the feature-richness of high-resolution simulations.

The bottleneck in producing climate model images is not always the drawing of the image on the screen but rather the calculations that must be performed on the climate model output before an image can be made. Parvis will speed up this post-processing, or analysis, phase of climate visualization by developing a new Parallel Climate Analysis Library (ParCAL) that will vastly improve the speed of climate data analysis compared to the current serial tools. ParCAL will build on currently available software technology and will permit calculations to be performed in parallel on many different numerical grids. We will interface ParCAL with the NCAR Command Language (NCL), a popular climate analysis tool.

We are also exploring the use of existing tools such as Swift and Pagoda to bring immediate speed-ups to the climate model post-processing workflow.


ParCAL development

We will be building ParCAL on top of the MOAB, Intrepid, and PNetCDF libraries.
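
As a rough illustration of the kind of parallel I/O and reduction pattern ParCAL builds on, the sketch below uses PNetCDF and MPI directly to compute the global mean of a 2D field, with each process reading and reducing only its own block of the data. The file name ("climate_output.nc"), variable name ("T"), dimension names ("lat", "lon"), unweighted averaging, and the latitude-block decomposition are all assumptions made for the example; this is not the ParCAL API, only the style of parallel analysis it is meant to provide.

{{{
/* Sketch only: unweighted global mean of a hypothetical 2D variable "T"
 * (lat x lon). Each MPI rank reads and reduces its own block of latitudes
 * using PNetCDF's collective read interface. */
#include <mpi.h>
#include <pnetcdf.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int ncid, varid, latdim, londim;
    MPI_Offset nlat, nlon;

    /* All ranks open the same file collectively. */
    ncmpi_open(MPI_COMM_WORLD, "climate_output.nc", NC_NOWRITE,
               MPI_INFO_NULL, &ncid);
    ncmpi_inq_dimid(ncid, "lat", &latdim);
    ncmpi_inq_dimid(ncid, "lon", &londim);
    ncmpi_inq_dimlen(ncid, latdim, &nlat);
    ncmpi_inq_dimlen(ncid, londim, &nlon);
    ncmpi_inq_varid(ncid, "T", &varid);

    /* Decompose along latitude: each rank owns a contiguous block of rows. */
    MPI_Offset chunk = (nlat + nprocs - 1) / nprocs;
    MPI_Offset start[2] = { rank * chunk, 0 };
    MPI_Offset count[2] = { chunk, nlon };
    if (start[0] >= nlat) count[0] = 0;
    else if (start[0] + count[0] > nlat) count[0] = nlat - start[0];

    double *buf = malloc((size_t)(count[0] * nlon) * sizeof(double));
    ncmpi_get_vara_double_all(ncid, varid, start, count, buf);

    /* Local partial sum, then a global reduction across all ranks. */
    double local_sum = 0.0, global_sum = 0.0;
    for (MPI_Offset i = 0; i < count[0] * nlon; i++)
        local_sum += buf[i];
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global mean of T = %f\n", global_sum / (double)(nlat * nlon));

    free(buf);
    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
}}}

In ParCAL this pattern is extended beyond simple rectangular slabs: MOAB handles the decomposition of arbitrary numerical grids, and Intrepid supplies discrete operators on those grids, while PNetCDF provides the parallel file access shown above.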

Current workflow improvements


Hardware Model

The hardware model we envision for post-processing of high-resolution output is a “Data Analysis Center” [as defined in this report from a workshop on climate at exascale]: computational infrastructure optimized for, and dedicated to, the task of analyzing and visualizing large, complex data. One possible model for this is the Argonne Leadership Computing Facility (ALCF), which has several petabytes of disk storage, with no per-user quotas, attached to its IBM BlueGene/P computer. The same disk is attached to a data analysis and visualization (DAV) cluster called “Eureka,” consisting of 100 dual quad-core servers, each with dual NVIDIA Quadro graphics cards. In this model, the original climate output remains at the ALCF, or wherever it is generated, and the DAV work is performed on multiple nodes of Eureka (or an equivalent resource) accessing the same physical disk. Eureka and its attached, shared disk are not yet a data analysis center, however, because today's climate DAV tools are unable to take advantage of more than one node of Eureka at a time.

For development, we will also be using Fusion, a Linux cluster supported by Argonne's Laboratory Computing Resource Center.


This research is sponsored by the Office of Biological and Environmental Research of the U.S. Department of Energy's Office of Science.


Trac Starting Points

For a complete list of local wiki pages, see TitleIndex.
