Writing NetCDF Files: Best Practices

Conventions

While netCDF is intended for "self-documenting data", it is often necessary for data writers and readers to agree upon attribute conventions and representations for discipline-specific data structures. These agreements are written up as human readable documents called netCDF conventions.

Coordinate Systems

A coordinate variable is a one-dimensional variable with the same name as a dimension; it names the coordinate values of that dimension. It must not have any missing data (for example, no _FillValue or missing_value attributes) and must be strictly monotonic (values strictly increasing or strictly decreasing). A two-dimensional variable of type char is a string-valued coordinate variable if it has the same name as its first dimension, e.g., char time(time, time_len); all of its strings must be unique. A variable's coordinate system is the set of coordinate variables used by the variable. Coordinates that refer to physical space are called spatial coordinates, those that refer to physical time are called time coordinates, and those that refer to either physical space or time are called spatio-temporal coordinates.
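The two requirements on a coordinate variable (no missing data, strictly monotonic values) can be checked before writing. The following is a minimal sketch in plain Python, not part of any netCDF library; the function name is_valid_coordinate is hypothetical, and treating NaN as "missing" is an assumption made for this illustration.

```python
import math

def is_valid_coordinate(values):
    """Return True if values contain no NaN and are strictly monotonic.

    Hypothetical helper: NaN stands in for "missing data" here, which is
    an assumption of this sketch rather than a netCDF rule.
    """
    values = list(values)
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)  # strictly increasing
    decreasing = all(a > b for a, b in pairs)  # strictly decreasing
    return increasing or decreasing

print(is_valid_coordinate([0.0, 0.5, 2.0]))  # increasing values
print(is_valid_coordinate([0.0, 1.0, 1.0]))  # repeated value, not strict
```

A repeated value fails the check because "strictly" monotonic rules out ties as well as reversals.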

Variable Grouping

You may structure the data in a netCDF file in different ways, for example by putting related parameters into a single variable with an extra dimension. Standard visualization and analysis software may have trouble breaking that data out, however. At the other extreme, it is possible to create separate variables, e.g., for different vertical levels of the same parameter. In that case, standard visualization and analysis software may have trouble grouping the data back together. Here are some guidelines for deciding how to group your data into variables:

Variable Attributes

Strings and Variables of type char

NetCDF-3 does not have a primitive String type, but does have arrays of type char, which are 8 bits in size. The main difference is that Strings are variable-length arrays of chars, while char arrays are fixed length. Software written in C usually depends on Strings being zero terminated, while software written in Fortran and Java does not. Both the C library (nc_get_vara_text) and the Java library (ArrayChar.getString) have convenience routines that read char arrays and convert them to Strings.
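To illustrate the difference between fixed-length char arrays and Strings, here is a small sketch in plain Python that converts raw char data to strings. It is not the netCDF API; the helper names chars_to_string and split_rows are hypothetical, and it assumes ASCII content that is either zero terminated (C style) or blank padded (Fortran style).

```python
def chars_to_string(raw: bytes) -> str:
    """Decode one fixed-length char row into a string.

    Stops at the first zero byte if there is one, then strips trailing
    blank padding. ASCII content is an assumption of this sketch.
    """
    end = raw.find(b"\x00")
    if end != -1:
        raw = raw[:end]
    return raw.decode("ascii").rstrip(" ")

def split_rows(buf: bytes, strlen: int):
    """Split a flat two-dimensional char buffer into one string per row."""
    return [chars_to_string(buf[i:i + strlen]) for i in range(0, len(buf), strlen)]

print(split_rows(b"lat\x00\x00lon\x00\x00", 5))
```

Handling both terminators in one routine means a reader does not need to know which language wrote the file.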

Calendar Date/Time

Time as a fundamental unit means a time interval, measured in seconds. A Calendar date/time is a specific instant in real, physical time. Dates are specified as an interval from some reference time, e.g., "days elapsed since Greenwich mean noon on 1 January 4713 BCE". The reference time implies a system of counting time called a calendar (e.g., the Gregorian calendar) and a textual representation (e.g., ISO 8601).

There are two strategies for storing a date/time in a netCDF variable. One is to encode it as a numeric value and a unit that includes the reference time, e.g., "seconds since 2001-1-1 0:0:0" or "days since 2001-1-1 0:0:0". The other is to store it as a String using a standard encoding and Calendar. The former is more compact if you have more than one date, and makes it easier to compute intervals between two dates.
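The numeric strategy can be sketched with Python's standard datetime module. This is an illustration only, not udunits or the netCDF library: the names REFERENCE, encode_seconds, and decode_seconds are hypothetical, and the sketch assumes UTC and the proleptic Gregorian calendar that datetime uses.

```python
from datetime import datetime, timedelta, timezone

# Reference time matching the unit string "seconds since 2001-1-1 0:0:0".
# Interpreting the reference time as UTC is an assumption of this sketch.
REFERENCE = datetime(2001, 1, 1, tzinfo=timezone.utc)

def encode_seconds(when: datetime) -> float:
    """Encode a date/time as seconds elapsed since REFERENCE."""
    return (when - REFERENCE).total_seconds()

def decode_seconds(value: float) -> datetime:
    """Decode seconds since REFERENCE back into a date/time."""
    return REFERENCE + timedelta(seconds=value)

start = encode_seconds(datetime(2001, 1, 2, 0, 0, tzinfo=timezone.utc))
stop = encode_seconds(datetime(2001, 1, 2, 1, 0, tzinfo=timezone.utc))
print(start, stop - start)  # the interval is a plain subtraction
```

With numeric values, the interval between two dates reduces to a subtraction, which is the convenience the text describes.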

Unidata's udunits package provides a convenient way to implement the first strategy. It uses the ISO 8601 encoding and a hybrid Gregorian/Julian calendar, but it does not support other Calendars or encodings for the reference time. However, the ncdump "-T" option can display numeric times that use udunits (and optionally climate calendars) as ISO 8601 strings that are easy for humans to interpret.

Unsigned Data

NetCDF-3 does not have unsigned integer primitive types.

Packed Data Values

Packed data is stored in a netCDF file by limiting precision and using a smaller data type than the original data, for example, packing double-precision (64-bit) values into short (16-bit) integers. The C-based netCDF libraries do not do the packing and unpacking. (The netCDF Java library will do automatic unpacking when the VariableEnhanced Interface is used. For details, see EnhancedScaleMissing.)
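Packing is conventionally expressed as a linear transform using the scale_factor and add_offset attributes: unpacked = packed * scale_factor + add_offset, and packed = nearest integer to (unpacked - add_offset) / scale_factor. A minimal Python sketch of that relation follows; the function names pack and unpack are hypothetical and do not belong to any netCDF library.

```python
def unpack(packed: int, scale_factor: float, add_offset: float) -> float:
    """Recover an approximate original value from a packed integer."""
    return packed * scale_factor + add_offset

def pack(unpacked: float, scale_factor: float, add_offset: float) -> int:
    """Pack a value into an integer by rounding to the nearest step.

    Python's round() resolves exact ties to the even neighbour, which may
    differ from other rounding routines; that detail is specific to this
    sketch.
    """
    return int(round((unpacked - add_offset) / scale_factor))

p = pack(12.5, scale_factor=0.5, add_offset=10.0)
print(p, unpack(p, scale_factor=0.5, add_offset=10.0))
```

Because the packed value is rounded, unpacking recovers the original only to within about half of scale_factor.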

Depending on whether the packed data values are intended to be interpreted by the reader as signed or unsigned integers, there are alternative ways for the data provider to compute the scale_factor and add_offset attributes. In either case, the formulas above apply for unpacking and packing the data.

A conventional way to indicate whether a byte, short, or int variable is meant to be interpreted as unsigned, even for the netCDF-3 classic model that has no external unsigned integer type, is by providing the special variable attribute _Unsigned with value "true". However, most existing data for which packed values are intended to be interpreted as unsigned are stored without this attribute, so readers must be aware of packing assumptions in this case. In the enhanced netCDF-4 data model, packed integers may be declared to be of the appropriate unsigned type.
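When a reader knows that a stored signed integer is meant to be unsigned, it can reinterpret the same bit pattern. The sketch below shows one way to do that in plain Python; the helper name as_unsigned is hypothetical, and it assumes the stored values use two's-complement representation.

```python
def as_unsigned(value: int, nbits: int) -> int:
    """Reinterpret a signed nbits-wide integer as unsigned.

    Masking keeps the low nbits bits, so negative values wrap around to
    the upper half of the unsigned range (two's complement assumed).
    """
    return value & ((1 << nbits) - 1)

print(as_unsigned(-1, 8), as_unsigned(-128, 8), as_unsigned(127, 8))
```

Non-negative values pass through unchanged, so the helper can be applied to every element of a variable marked _Unsigned.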

Let n be the number of bits in the packed type, and assume dataMin and dataMax are the minimum and maximum values that will be used for a variable to be packed.
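Given n, dataMin, and dataMax, one common choice maps the data range onto the full range of the packed type. The Python sketch below assumes that choice: scale_factor = (dataMax - dataMin) / (2^n - 1), with add_offset = dataMin for unsigned packing and add_offset = dataMin + 2^(n-1) * scale_factor for signed packing. The function name packing_params is hypothetical, and the sketch does not reserve a packed value for missing data.

```python
def packing_params(data_min: float, data_max: float, n: int, signed: bool = True):
    """Return (scale_factor, add_offset) covering [data_min, data_max].

    Assumes the whole n-bit range is used for data; reserving a value
    for missing data would need a slightly smaller divisor.
    """
    scale_factor = (data_max - data_min) / (2 ** n - 1)
    if signed:
        # Shift so that the most negative packed value maps to data_min.
        add_offset = data_min + 2 ** (n - 1) * scale_factor
    else:
        # Packed value 0 maps directly to data_min.
        add_offset = data_min
    return scale_factor, add_offset

scale, offset = packing_params(0.0, 255.0, 8, signed=True)
print(scale, offset)
print(-128 * scale + offset, 127 * scale + offset)  # ends of the data range
```

Unpacking the smallest and largest packed values should return dataMin and dataMax, which is a quick sanity check on the attributes before writing the file.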

Missing Data Values

Missing data is a general name for data values that are invalid, never written, or missing. The netCDF library itself does not handle these values in any special way, except that the value of a _FillValue attribute, if any, is used in pre-filling unwritten data. (The netCDF Java library will assist in recognizing these values when reading; see class VariableStandardized.)
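Because the library does not interpret missing values, a reader has to screen them out itself. A minimal sketch in plain Python follows; it considers only the _FillValue attribute, and the helper name mask_fill is hypothetical.

```python
def mask_fill(values, fill_value):
    """Replace entries equal to fill_value with None.

    Only exact matches against _FillValue are treated as missing in this
    sketch; other conventions for marking missing data are not handled.
    """
    return [None if v == fill_value else v for v in values]

print(mask_fill([1.5, -9999.0, 2.0], fill_value=-9999.0))
```

Masking before any arithmetic keeps the fill value from being averaged or plotted as if it were real data.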

Miscellaneous tips

Spelling "netCDF": Best Practices

There are only three correct spellings of "netCDF":

  1. netCDF: The original spelling of the name of the data model, API, and format. The acronym stands for network Common Data Form (not Format), and the "CDF" part was capitalized in part to pay homage to the NASA "CDF" data model which the netCDF data model extended.
  2. netcdf: Used in certain file names, such as:
     #include <netcdf.h>
  3. NetCDF: Used in titles and at the beginning of sentences, where "netCDF" is awkward or violates style guidelines.

All other forms, and most especially "Netcdf", are considered vulgar and a sign of ill-breeding or misspent youth, analogous to the egregious but common misspelling "JAVA" used by those who are new to the language or who mistakenly think it is an acronym.