Grid Conventions in ParCAL

There is so far no universally adopted standard for representing grids in climate and weather model output.

However, there are a few widely used conventions that ParCAL can assume.

CF Conventions

 CF website

The conventions for climate and forecast (CF) metadata are designed to promote the processing and sharing of files created with the NetCDF API. The CF conventions are increasingly gaining acceptance and have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.

ParCAL will be able to work with files using the CF standard. In particular, it will work with all files currently output by CESM.

Inquiring if a file follows CF

To find out whether a given NetCDF file uses the CF standard, query the global attribute Conventions.

Note that the spellings "Convention", "convention", and "conventions" are sometimes used instead and should also be tested for.

Example from the atmosphere model CAM (which uses the correct string):

// global attributes:
		:Conventions = "CF-1.0" ;

From the CESM ice model:

// global attributes:
		:title = "HRC06" ;
		:conventions = "CF-1.0" ;

From the ocean model POP:

// global attributes:
		:conventions = "CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-current.htm" ;

From the land model CLM:

// global attributes:
		:conventions = "CF-1.0" ;
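
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not ParCAL code: the function name cf_version and the idea of passing the global attributes as a plain dict are assumptions made for the example. The value is searched rather than compared exactly because, as the POP example shows, the string may carry extra text after the version.

```python
import re

# Attribute spellings seen in practice; only "Conventions" is the correct one.
CONVENTION_ATTR_NAMES = ("Conventions", "conventions", "Convention", "convention")


def cf_version(global_attrs):
    """Return the CF version (e.g. "1.0"), or None if the file does not claim CF.

    `global_attrs` is assumed to be a plain dict mapping global attribute
    names to their values, however those were read from the file.
    """
    for name in CONVENTION_ATTR_NAMES:
        value = global_attrs.get(name)
        if not isinstance(value, str):
            continue
        # Search instead of comparing, since the value may hold more
        # than the bare version string (see the POP header).
        match = re.search(r"CF-(\d+(?:\.\d+)*)", value)
        if match:
            return match.group(1)
    return None


# Global attributes copied from the CAM and POP headers above.
cam_attrs = {"Conventions": "CF-1.0"}
pop_attrs = {
    "conventions": "CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-current.htm"
}
print(cf_version(cam_attrs), cf_version(pop_attrs))
```

A file whose attribute is missing, or set to something other than a CF string, yields None and can then be handled as a non-CF file.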

Coordinate Systems in CF-1.0

CF allows two methods of associating a variable's dimensions with physical coordinates. See this section for complete information.

Structured Grids

For structured grids, ParCAL can use the following:

All of a variable's dimensions that are latitude, longitude, vertical, or time dimensions (see Section 1.2, “Terminology”) must have corresponding coordinate variables, i.e., one-dimensional variables with the same name as the dimension (see examples in Chapter 4, Coordinate Types ). This is the only method of associating dimensions with coordinates that is supported by [COARDS].

An example of this convention from the atmosphere model CAM:

	float CMFDQ(time, lev, lat, lon) ;
		CMFDQ:units = "kg/kg/s" ;
		CMFDQ:long_name = "QV tendency - shallow convection" ;

(earlier in the file)
	double lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
	double lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
	double lev(lev) ;
		lev:long_name = "hybrid level at midpoints (1000*(A+B))" ;
		lev:units = "level" ;
		lev:positive = "down" ;
		lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
		lev:formula_terms = "a: hyam b: hybm p0: P0 ps: PS" ;
	double time(time) ;
		time:long_name = "time" ;
		time:units = "days since 0001-01-01 00:00:00" ;
		time:calendar = "noleap" ;
		time:bounds = "time_bnds" ;
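
As a rough sketch of how this rule could be applied, the Python below maps each dimension of a variable to its coordinate variable, if one exists. The dict layout (name -> dims and attrs) and the function name dimension_coordinates are assumptions for illustration only; the data is a trimmed copy of the CAM header above.

```python
def dimension_coordinates(variables, var_name):
    """Map each dimension of `var_name` to its coordinate variable name.

    `variables` is assumed to be a dict: name -> {"dims": tuple, "attrs": dict}.
    A coordinate variable is one-dimensional and has the same name as its
    single dimension. Dimensions without one map to None.
    """
    result = {}
    for dim in variables[var_name]["dims"]:
        candidate = variables.get(dim)
        if candidate is not None and tuple(candidate["dims"]) == (dim,):
            result[dim] = dim
        else:
            result[dim] = None
    return result


# Hypothetical in-memory copy of the CAM header (attributes trimmed).
cam = {
    "CMFDQ": {"dims": ("time", "lev", "lat", "lon"), "attrs": {"units": "kg/kg/s"}},
    "lat": {"dims": ("lat",), "attrs": {"units": "degrees_north"}},
    "lon": {"dims": ("lon",), "attrs": {"units": "degrees_east"}},
    "lev": {"dims": ("lev",), "attrs": {"units": "level", "positive": "down"}},
    "time": {"dims": ("time",), "attrs": {"units": "days since 0001-01-01 00:00:00"}},
}

print(dimension_coordinates(cam, "CMFDQ"))
```

Any dimension that maps to None has no coordinate variable, which signals that the second method (the coordinates attribute) is needed.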

Semi-structured Grids

For semi-structured (logically Cartesian) grids such as the POP grid, CF-1.0 says:

All of a variable's spatiotemporal dimensions that are not latitude, longitude, vertical, or time dimensions are required to be associated with the relevant latitude, longitude, vertical, or time coordinates via the new coordinates attribute of the variable. The value of the coordinates attribute is a blank separated list of the names of auxiliary coordinate variables.

An example from the ocean model POP:

	float TEMP(time, z_t, nlat, nlon) ;
		TEMP:long_name = "Potential Temperature" ;
		TEMP:units = "degC" ;
		TEMP:coordinates = "TLONG TLAT z_t time" ;
		TEMP:cell_methods = "time: mean" ;
		TEMP:_FillValue = 9.96921e+36f ;
		TEMP:missing_value = 9.96921e+36f ;

(from earlier in the file:)
	double time(time) ;
		time:long_name = "time" ;
		time:units = "days since 0000-01-01 00:00:00" ;
		time:bounds = "time_bound" ;
		time:calendar = "noleap" ;
	float z_t(z_t) ;
		z_t:long_name = "depth from surface to midpoint of layer" ;
		z_t:units = "centimeters" ;
		z_t:positive = "down" ;
		z_t:valid_min = 500.f ;
		z_t:valid_max = 537500.f ;
	double TLONG(nlat, nlon) ;
		TLONG:long_name = "array of t-grid longitudes" ;
		TLONG:units = "degrees_east" ;
	double TLAT(nlat, nlon) ;
		TLAT:long_name = "array of t-grid latitudes" ;
		TLAT:units = "degrees_north" ;

The CF convention goes on to say:

An application that is trying to find the latitude coordinate of a variable should always look first to see if any of the variable's dimensions correspond to a latitude coordinate variable. If the latitude coordinate is not found this way, then the auxiliary coordinate variables listed by the coordinates attribute should be checked. Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates attribute.

The POP file above lists both auxiliary (TLAT, TLONG) and regular (z_t, time) coordinate variables in the coordinates attribute of TEMP.
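
The two-step lookup quoted above might be sketched as follows. This is an illustration under stated assumptions, not ParCAL's implementation: the dict layout and the function names are hypothetical, and a latitude variable is recognized here by its units attribute, using the units strings that CF accepts for latitude.

```python
# Units strings that CF accepts for a latitude coordinate.
LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}


def is_latitude(var):
    """True if the variable's units mark it as latitude."""
    return var["attrs"].get("units") in LATITUDE_UNITS


def find_latitude(variables, var_name):
    """Return the name of the latitude coordinate of `var_name`, or None."""
    var = variables[var_name]
    # Step 1: look for a coordinate variable among the dimensions.
    for dim in var["dims"]:
        cand = variables.get(dim)
        if cand is not None and tuple(cand["dims"]) == (dim,) and is_latitude(cand):
            return dim
    # Step 2: fall back to the names listed in the coordinates attribute.
    for name in var["attrs"].get("coordinates", "").split():
        cand = variables.get(name)
        if cand is not None and is_latitude(cand):
            return name
    return None


# Hypothetical in-memory copy of the POP header (attributes trimmed).
pop = {
    "TEMP": {"dims": ("time", "z_t", "nlat", "nlon"),
             "attrs": {"coordinates": "TLONG TLAT z_t time"}},
    "time": {"dims": ("time",), "attrs": {"units": "days since 0000-01-01 00:00:00"}},
    "z_t": {"dims": ("z_t",), "attrs": {"units": "centimeters"}},
    "TLONG": {"dims": ("nlat", "nlon"), "attrs": {"units": "degrees_east"}},
    "TLAT": {"dims": ("nlat", "nlon"), "attrs": {"units": "degrees_north"}},
}

print(find_latitude(pop, "TEMP"))  # TLAT
```

For TEMP, step 1 finds nothing because nlat and nlon have no coordinate variables, so the result comes from the coordinates attribute.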

NOTE: CF says there is no restriction on the order in which the auxiliary coordinate variables appear in the coordinates attribute string. However, CESM is following the convention that the auxiliary coordinate variables are listed in the reverse order of the variable's dimensions.

It is also possible to examine the long_name attribute to determine which variable is longitude and which is latitude.
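
Both hints can be sketched in Python. The function names below are hypothetical, and the reverse-order pairing relies on the CESM habit described in the note, which CF itself does not guarantee, so the long_name check is a useful cross-check rather than a replacement.

```python
def pair_by_reverse_order(dims, coordinates_attr):
    """Pair dimensions with coordinates-attribute entries in reverse order.

    Assumes the CESM habit of listing the coordinates in the reverse
    order of the variable's dimensions. CF does not require this.
    """
    names = coordinates_attr.split()
    if len(names) != len(dims):
        raise ValueError("coordinates list and dimensions differ in length")
    return dict(zip(dims, reversed(names)))


def axis_from_long_name(long_name):
    """Guess 'latitude' or 'longitude' from a long_name string, else None."""
    text = long_name.lower()
    if "latitude" in text:
        return "latitude"
    if "longitude" in text:
        return "longitude"
    return None


# Values taken from the POP TEMP variable above.
pairs = pair_by_reverse_order(("time", "z_t", "nlat", "nlon"), "TLONG TLAT z_t time")
print(pairs["nlat"], pairs["nlon"])  # TLAT TLONG
print(axis_from_long_name("array of t-grid latitudes"))  # latitude
```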

Unstructured grids

CF has conventions for describing cells and cell boundaries. If those attributes are not present, the data can be assumed to represent the cell center, although CF does not require this.

The GCRM adds attributes such as "coordinates_cells", "coordinates_corners", and "coordinates_edges". See an example of the header from a GCRM NetCDF file.

The output from the HOMME version of CAM (on the cubed sphere) does not follow CF conventions but can, as an option, add the same metadata the GCRM uses (this option exists to support VisIt).

The output from HOMME is a 1D list of points for each horizontal level. The points are all vertices of quads (there is no cell-centered data).

dimensions:
	ncol = 777602 ;
	lev = 26 ;
	ilev = 27 ;
	time = UNLIMITED ; // (1 currently)

	float CMFDQ(time, lev, ncol) ;
		CMFDQ:units = "k" ;
		CMFDQ:long_name = "Q " ;
		CMFDQ:cell_methods = "time: mean" ;

	double lev(lev) ;
		lev:long_name = "hybrid level at midpoints (1000*(A+B))" ;
		lev:units = "level " ;
		lev:positive = "down" ;
		lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
		lev:formula_terms = "a: hyam b: hybm p0: P0 ps: PS" ;
	double time(time) ;
		time:long_name = "time" ;
		time:units = "days since 0000-11-01 00:00:00" ;
		time:calendar = "noleap" ;
		time:bounds = "time_bnds" ;
	double lat(ncol) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
	double lon(ncol) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
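
Taken together, the CAM, POP, and HOMME headers suggest a simple heuristic for telling the three horizontal layouts apart from the shape of the latitude variable. The Python below is only a sketch of that idea: the function name, the dict layout, and the labels it returns are assumptions, and it identifies latitude by CF's latitude units strings.

```python
# Units strings that CF accepts for a latitude coordinate.
LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}


def horizontal_grid_kind(variables, var_name):
    """Heuristically classify the horizontal grid of `var_name`.

    `variables` is assumed to be a dict: name -> {"dims": tuple, "attrs": dict}.
    """
    var_dims = set(variables[var_name]["dims"])
    for name, cand in variables.items():
        if cand["attrs"].get("units") not in LATITUDE_UNITS:
            continue
        dims = tuple(cand["dims"])
        # Only consider latitude variables defined on this variable's dimensions.
        if not dims or not set(dims) <= var_dims:
            continue
        if dims == (name,):
            return "structured"       # lat(lat), as in CAM
        if len(dims) == 1:
            return "unstructured"     # lat(ncol), as in HOMME
        return "semi-structured"      # TLAT(nlat, nlon), as in POP
    return "unknown"


# Hypothetical in-memory copy of the HOMME header above (attributes trimmed).
homme = {
    "CMFDQ": {"dims": ("time", "lev", "ncol"), "attrs": {"cell_methods": "time: mean"}},
    "lat": {"dims": ("ncol",), "attrs": {"units": "degrees_north"}},
    "lon": {"dims": ("ncol",), "attrs": {"units": "degrees_east"}},
}

print(horizontal_grid_kind(homme, "CMFDQ"))  # unstructured
```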

GridSpec

GridSpec is a proposed standard still under development. ParCAL will provide partial support for GridSpec-conformant data files.

 Main GridSpec page

See also Balaji's discussion page. ParVis is looking for data files written with GridSpec for testing.

The TechX group is working on extensions to CF that are based on GridSpec as part of its MODAVE project:

With the emergence of mosaics, it is no longer feasible to store a variable in a single netCDF file. Typically, variables exist on each logically rectangular grid (tile) of the mosaic assembly with data on each tile stored in separate files. A similar problem arises with CMIP-5 data, which allows data and grid information to be stored in separate files. These two requirements presently break the CF metadata conventions, which demand that all information be stored in self-contained files. Both of these requirements are addressed in this section. Specifically, we describe extensions to the existing CF conventions that will allow data producers to:

Provide a single view of data scattered over multiple logically rectangular grids. This is referred to as the M-SPEC below.

Distribute data among multiple files, allowing grid information, time independent data, and data time-slices to be stored in separate files, on a per tile basis. One file (the host file) will provide a single entry point for consumers to access data stored on multiple tiles, and within tiles stored in multiple files. This process is referred to as file aggregation and is described below in the F-SPEC section.

The extensions are described using CDL and are compatible with netCDF3. The proposed extensions are backward compatible with present CF usage (when there is no file aggregation and no need for a mosaic file).

More on the  MODAVE wiki