PnetCDF: A Parallel I/O Library for NetCDF File Access
The official project web pages and source code repository of PnetCDF have been migrated to GitHub.
- URL: https://parallel-netcdf.github.io
- The old project main page is kept below and will be removed in the near future.
Credits
The PnetCDF project is jointly developed by Northwestern University and Argonne National Laboratory.
Overview of PnetCDF
PnetCDF is a high-performance parallel I/O library for accessing files in formats compatible with Unidata's NetCDF, specifically the CDF-1, CDF-2, and CDF-5 formats. The CDF-5 file format, an extension of CDF-2, supports unsigned data types and uses 64-bit integers to allow users to define large dimensions, attributes, and variables (> 2B array elements).
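To make the "> 2B array elements" threshold concrete, the sketch below counts a variable's elements with 64-bit arithmetic and picks the smallest CDF version that could hold the data. It is a hypothetical helper, not part of PnetCDF, and it applies only the simplified rules stated on this page: an unsigned or 64-bit integer type, or more than 2^31 - 1 elements in one variable, calls for CDF-5, and a file larger than 2 GB calls for at least CDF-2. The actual format specifications contain further constraints that this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one variable; not a PnetCDF type. */
typedef struct {
    const int64_t *dims;  /* dimension lengths */
    int ndims;
    bool new_int_type;    /* true if it uses an unsigned or 64-bit integer type */
} var_desc;

/* Total element count, computed in 64 bits so it does not wrap at 2^31. */
int64_t num_elements(const int64_t *dims, int ndims) {
    int64_t n = 1;
    for (int i = 0; i < ndims; i++)
        n *= dims[i];
    return n;
}

/* Smallest CDF version (1, 2, or 5) under the simplified rules above. */
int min_cdf_version(const var_desc *vars, int nvars, int64_t file_bytes) {
    const int64_t limit32 = INT32_MAX;  /* 2^31 - 1, roughly "2B" */
    for (int v = 0; v < nvars; v++) {
        if (vars[v].new_int_type ||
            num_elements(vars[v].dims, vars[v].ndims) > limit32)
            return 5;  /* CDF-5: new integer types or > 2B elements */
    }
    return file_bytes > limit32 ? 2 : 1;  /* CDF-2 for files over 2 GB */
}
```

For example, a 2048 x 2048 x 1024 array has 2^32 = 4,294,967,296 elements, well past 2^31 - 1, so the helper returns 5 even though each individual dimension length fits comfortably in 32 bits.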
In addition to the conventional netCDF read and write APIs, PnetCDF provides a new set of nonblocking APIs. Nonblocking APIs allow users to post multiple read and write requests first and later let PnetCDF aggregate them into a single large MPI-IO request, thereby achieving better performance. See nonblocking I/O for further description and example programs.
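The idea behind the nonblocking APIs can be illustrated without MPI. The sketch below is a conceptual model only: the types and functions (`io_req`, `req_queue`, `post_request`, `aggregate_requests`) are hypothetical and do not reflect PnetCDF's internals. Posting a request merely records its file range; the final "wait" step sorts the pending ranges by offset and merges contiguous or overlapping ones, so several small requests become fewer, larger ones. In PnetCDF itself, requests are posted with calls such as `ncmpi_iput_vara_int` and completed with `ncmpi_wait_all`.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_REQS 64

/* Hypothetical request record: a contiguous byte range in the file. */
typedef struct {
    int64_t offset;
    int64_t length;
} io_req;

typedef struct {
    io_req reqs[MAX_REQS];
    int n;
} req_queue;

/* "Post" a request: record it and return its index; no I/O happens yet. */
int post_request(req_queue *q, int64_t offset, int64_t length) {
    if (q->n >= MAX_REQS)
        return -1;
    q->reqs[q->n].offset = offset;
    q->reqs[q->n].length = length;
    return q->n++;
}

static int by_offset(const void *a, const void *b) {
    const io_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* "Wait" on all pending requests: sort them by offset and merge ranges that
   touch or overlap. Writes the merged ranges to out (capacity >= q->n),
   empties the queue, and returns how many merged ranges remain. */
int aggregate_requests(req_queue *q, io_req *out) {
    if (q->n == 0)
        return 0;
    qsort(q->reqs, (size_t)q->n, sizeof(io_req), by_offset);
    int m = 0;
    out[0] = q->reqs[0];
    for (int i = 1; i < q->n; i++) {
        int64_t end = out[m].offset + out[m].length;
        if (q->reqs[i].offset <= end) {
            /* Contiguous or overlapping: extend the current range. */
            int64_t new_end = q->reqs[i].offset + q->reqs[i].length;
            if (new_end > end)
                out[m].length = new_end - out[m].offset;
        } else {
            out[++m] = q->reqs[i];  /* gap: start a new range */
        }
    }
    q->n = 0;
    return m + 1;
}
```

Posting the byte ranges [100, 150), [0, 100), and [200, 210) in that order and then aggregating yields two ranges, [0, 150) and [200, 210). In a real library the merged access would then be handed to MPI-IO in one call rather than issued piece by piece; this sketch models only the bookkeeping.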
- Software Downloads: latest and previous software releases, as well as information for accessing the SVN repository of source code under development.
- Documentation: a quick tutorial, published papers, presentations, articles, and other resources
- C Interface User Guide is now available.
- Q&A contains guidelines for achieving better I/O performance.
- Benchmarking: tools and suggestions for evaluating PnetCDF performance
- Starting from version 4.4.0, the NetCDF library developed at Unidata supports the CDF-5 file format for both sequential and parallel access.
- A set of NetCDF-4 example programs is available to show how to access files in parallel through the PnetCDF or HDF5 libraries underneath.
A Brief Background About NetCDF
NetCDF gives scientific programmers a self-describing and portable means of storing data. However, prior to version 4, netCDF did so only in a serial manner.
NetCDF started to support parallel I/O in version 4, whose parallel I/O feature was initially built on top of parallel HDF5. Thus, files accessed by NetCDF-4 parallel I/O operations were restricted to the HDF5 format. Starting from release 4.1, NetCDF has also incorporated the PnetCDF library to enable parallel I/O operations on files in the classic formats (CDF-1 and CDF-2). Official support for the CDF-5 format started in NetCDF release 4.4.0.
Note: NetCDF can now be built with PnetCDF as its sole parallel I/O mechanism by using the command-line options "--disable-netcdf-4 --enable-pnetcdf". NetCDF can also be built with both PnetCDF and PHDF5 enabled. In this case, a NetCDF program chooses PnetCDF or Parallel HDF5 to carry out the parallel I/O by adding NC_MPIIO or NC_NETCDF4, respectively, to the file open/create mode flag when calling nc_create_par or nc_open_par. When PnetCDF is used underneath, the files must be in a classic format (CDF-1/2/5); when HDF5 is used, the files must be in HDF5 format (aka NetCDF-4 format). NetCDF-4 example programs are available to demonstrate such parallel I/O operations.
A Brief History of PnetCDF
The Parallel-netCDF project started in 2001, independently of Unidata's NetCDF project, and applications can use the PnetCDF library without the NetCDF library at all. The initial goal of PnetCDF was to develop a parallel I/O library for applications to access CDF-1 and CDF-2 files on parallel computers, with a focus on achieving high I/O performance. Because its implementation is tightly coupled with MPI, the design adopts a new set of APIs with the prefix "ncmpi_". To encourage adoption by NetCDF users, the syntax of the PnetCDF APIs stays mostly the same as NetCDF's. By making full use of the optimizations available in existing MPI-IO implementations, PnetCDF has been demonstrated to deliver high-performance parallel I/O.
A Note About Large File Support
The classic CDF file format (referred to as CDF-1, now obsolete) was used by the NetCDF library through version 3.5.1. The classic format was later updated to add support for a 64-bit offset file format (also referred to as CDF-2), as documented in a NASA ESDS community standard. See NetCDF Classic and 64-bit Offset File Formats.
Starting from 3.6.0, the serial NetCDF library added support for the CDF-2 format. With this format, even 32-bit platforms can create NetCDF files larger than 2 GB. CDF-2 also allows more special characters in the names of dimensions, variables, and attributes. The support was based largely on work by Greg Sjaardema.
Starting from the release of 0.9.2, PnetCDF supports CDF-2 format. See README.large_files for more information.
Starting from the release of 1.3.0, PnetCDF supports the CDF-5 format, an extension of CDF-2 that adds unsigned and 64-bit integer data types and allows variables to be defined with more than 2^32 array elements.
File and Variable Limits
PnetCDF and NetCDF share the same limits on file and variable sizes. More information can be found on the FileLimits page.
Required Software
PnetCDF requires an MPI implementation with MPI-IO support, which most MPI libraries provide nowadays. A parallel file system also goes a long way towards achieving the highest performance.
Related Projects
PnetCDF makes use of several other technologies.
- ROMIO, an implementation of MPI-IO, provides optimized collective and noncontiguous operations. It also provides an abstract interface for a large number of parallel file systems.
- One of the file systems ROMIO supports is PVFS, a high-performance parallel file system for Linux clusters.
Today, there are several options for high-level I/O libraries. Here are some discussions on the role of PnetCDF in this ecosystem:
Mailing List
We discuss the design and use of the PnetCDF library on the parallel-netcdf mailing list. Anyone interested in developing or using PnetCDF is encouraged to join; visit the list information page for details. The mailing list is also used for announcements, bug reports, and questions about the PnetCDF software.
The URL for the list archive is http://lists.mcs.anl.gov/pipermail/parallel-netcdf/. Even older messages can be browsed in the older mailing list archives.
Project Members
- Rob Latham, Rob Ross, and Rajeev Thakur (Argonne National Lab)
- Wei-keng Liao and Alok Choudhary (Northwestern University)
- Seung Woo Son (formerly a postdoc at ANL, and then a postdoc at Northwestern, now an Assistant Professor at UMass Lowell)
- Kui Gao (formerly a postdoc at Northwestern, now Dassault Systèmes Simulia Corp.)
- Jianwei Li (Northwestern, graduated in 2006)
- Bill Gropp (formerly ANL, now UIUC)
Citations
When referring to the Parallel netCDF project, please use the following URLs:
- www.mcs.anl.gov/parallel-netcdf (the 'trac' or 'www-unix' URLs could change)
- http://cucis.ece.northwestern.edu/projects/PnetCDF/ (a page maintained by Northwestern University)
If you are looking for a reference to use in a published paper, please cite our SC2003 paper below.
- Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, Rob Latham, Andrew Siegel, Brad Gallagher, and Michael Zingale. Parallel netCDF: A Scientific High-Performance I/O Interface. In Proceedings of the ACM/IEEE Conference on Supercomputing, p. 39, November 2003.
Acknowledgements
Original Parallel netCDF development was sponsored by the Scientific Data Management Center (SDM) under the DOE program of Scientific Discovery through Advanced Computing (SciDAC). It was also supported in part by
- National Science Foundation under the SDCI HPC program award number OCI-0724599 and the HECURA program award number CCF-0938000.
- Scientific Data, Analysis, and Visualization (SDAV) Institute under the DOE SciDAC program.
Ongoing maintenance is funded by the Exascale Computing Project (ECP) under the DOE Office of Science.