Writing NetCDF Files: Best Practices
While netCDF is intended for "self-documenting data", data writers and readers
often need to agree on attribute conventions and representations for
discipline-specific data structures. These agreements are written up as
human-readable documents called netCDF conventions.
- Use an existing Convention if possible. See the list of registered
conventions.
- The CF Conventions are recommended where applicable, especially for gridded (model)
datasets.
- Document the convention you are using by adding the global attribute "Conventions"
to each netCDF file, for example:
:Conventions = "CF-1.0";
A coordinate variable is a one-dimensional variable with the
same name as a dimension, which names the coordinate values of the dimension.
It must not have any missing data (for example, no _FillValue or
missing_value attributes) and must be strictly monotonic (values
strictly increasing or strictly decreasing). A two-dimensional variable of type
char is a string-valued coordinate variable if it has the same name as its first
dimension, for example char time(time, time_len); all of its strings must be
unique. A variable's coordinate system is the set of coordinate variables
used by the variable. Coordinates that refer to physical space are called spatial
coordinates, those that refer to physical time are called time
coordinates, and those that refer to either physical space or time are called
spatio-temporal coordinates.
- Make coordinate variables for every dimension possible (except for string
length dimensions).
- Give each coordinate variable at least units and long_name
attributes to document its meaning.
- Use an existing netCDF Convention for
your coordinate variables, especially to identify spatio-temporal coordinates.
- Use shared dimensions to indicate that two variables use the same coordinates
along that dimension. If two variables' dimensions are not related, create
separate dimensions for them, even if they happen to have the same length.
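The coordinate-variable rules above (no missing data, strictly monotonic values) can be checked before writing a file. The following is a minimal sketch in plain Python; the function name is_valid_coordinate and its fill_value parameter are hypothetical and not part of any netCDF library.

```python
import math


def is_valid_coordinate(values, fill_value=None):
    """Return True if values could serve as a coordinate variable.

    Hypothetical helper: checks for missing entries and strict monotonicity.
    """
    values = list(values)
    for v in values:
        # None and NaN are treated as missing data in this sketch.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            return False
        # A value equal to the fill value also counts as missing.
        if fill_value is not None and v == fill_value:
            return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing
```

Repeated values fail the check because the comparison is strict in both directions.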
You may structure the data in a netCDF file in different ways, for example
putting related parameters into a single variable by adding an extra dimension.
Standard visualization and analysis software may have trouble breaking that
data out, however. At the other extreme, it is possible to create different
variables, for example one for each vertical level of the same parameter. However,
standard visualization and analysis software may have trouble grouping that
data back together. Here are some guidelines for deciding how to group your
data into variables:
- All of the data in a variable must be of the same type and should have the
same units of measurement.
- A variable's attributes should be applicable to all its data.
- If possible, all of the coordinate variables should be spatio-temporal,
with no extra dimensions.
- Use 4D spatio-temporal coordinate systems in preference to 3D. Use 3D spatio-temporal
coordinate systems in preference to 2D.
- Vector valued (e.g. wind) parameters are legitimate uses of extra dimensions.
There are trade-offs between putting vectors in the same variables vs. putting
each component of a vector in a different variable. Check that any visualization
software you plan to use can deal with the structure you choose.
- Think in terms of complete coordinate systems (especially spatio-temporal),
and organize your data into variables accordingly. Variables with the same
coordinate system implicitly form a group.
- For each variable where it makes sense, add a units attribute, using
the udunits conventions, if possible.
- For each variable where it makes sense, add a long_name attribute,
which is a human-readable descriptive name for the variable. This could be
used for labeling plots, for example.
NetCDF-3 does not have a primitive String type, but does have arrays of
type char, whose elements are 8 bits in size. The main difference is that Strings
are variable length arrays of chars, while char arrays are fixed length. Software
written in C usually depends on Strings being zero terminated, while software
in Fortran and Java does not. Both the C (nc_get_vara_text) and
Java (ArrayChar.getString) libraries have convenience routines that read char
arrays and convert them to Strings.
- Do not use char type variables for numeric data; use byte type variables
instead.
- Consider using a global Attribute instead of a Variable to store
a String applicable to the whole dataset.
- When you want to store arrays of Strings, use a multidimensional char array.
All of the Strings will be the same length.
- There are three strategies for writing variable length Strings and zero-byte
termination:
  - Fortran convention: pad with blanks and never terminate with
  a zero byte.
  - C convention: pad with zeros and always terminate with a zero
  byte.
  - Java convention: You don't need to store a trailing zero byte,
  but pad trailing unused characters with zero bytes.
- When reading, trim zeros and blanks from the end of the char array and
if in C, add a zero byte terminator.
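The padding and trimming rules above can be illustrated on raw bytes, independent of any netCDF library. This is a minimal sketch: pad_string and trim_string are hypothetical helper names, and ASCII text is assumed.

```python
def pad_string(text, length, pad=b"\x00"):
    """Encode text and pad it to a fixed-length char array (bytes).

    pad=b"\x00" follows the zero-padding conventions; pad=b" " follows
    the Fortran blank-padding convention.
    """
    raw = text.encode("ascii")
    if len(raw) > length:
        raise ValueError("string is longer than the char dimension")
    return raw + pad * (length - len(raw))


def trim_string(raw):
    """Decode a fixed-length char array, dropping trailing zeros and blanks."""
    return raw.rstrip(b"\x00 ").decode("ascii")
```

Because trim_string strips both zero bytes and blanks from the end, a reader written this way accepts data produced under any of the three writing conventions. Note that pad_string does not reserve room for a terminating zero byte; a writer following the C convention would need a char dimension at least one longer than the longest string.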
Time as a fundamental unit means a time interval, measured in seconds. A Calendar
date/time is a specific instant in real, physical time. Dates are specified
as an interval from some reference time, e.g. "days elapsed
since Greenwich mean noon on 1 January 4713 BCE". The reference time implies
a system of counting time called a calendar (e.g. Gregorian calendar)
and a textual representation (e.g. ISO 8601).
There are two strategies for storing a date/time into a netCDF variable. One
is to store it as a String using a standard encoding and Calendar. The other
is to encode it as a numeric value and a unit that includes the reference time,
e.g. "seconds since 1992-10-8 15:15:42.5 -6:00". The latter is more
compact if you have more than one date, and makes it easier to compute intervals
between two dates.
Unidata's udunits package provides a convenient way to implement the second
strategy. It uses the ISO 8601 encoding and a hybrid Gregorian/Julian calendar.
It does not allow you to use other Calendars or encodings for the reference time.
- If your data uses real, physical time that is well represented using the
Gregorian/Julian calendar, encode it as an interval from a reference time,
and add a units attribute which uses a udunits-compatible time unit. Readers
can use the udunits package to manipulate or format the date values.
- If your data uses a different calendar, use an existing Convention such
as the CF convention, or create a new Convention which clearly documents what
the calendar and encoding are. Make it compatible with existing date manipulation
packages if possible (e.g. java.text.SimpleDateFormat).
- Add multiple sets of time encodings if necessary to allow different readers
to work as well as possible.
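The numeric strategy described above (a value plus a units string that carries the reference time) can be sketched with the Python standard library. The reference time, the UNITS string, and the function names below are illustrative assumptions. Python's datetime uses a proleptic Gregorian calendar rather than the hybrid Gregorian/Julian calendar, so this sketch is only appropriate for modern dates where the two agree.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference time for this sketch; any documented instant would do.
REFERENCE = datetime(1970, 1, 1, tzinfo=timezone.utc)
# The units attribute a writer would attach to the time variable.
UNITS = "seconds since 1970-01-01 00:00:00"


def encode_time(moment):
    """Return the interval in seconds from REFERENCE to moment."""
    return (moment - REFERENCE).total_seconds()


def decode_time(seconds):
    """Return the date/time that lies `seconds` after REFERENCE."""
    return REFERENCE + timedelta(seconds=seconds)
```

With values stored this way, the interval between two dates is a plain subtraction of the stored numbers.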
NetCDF-3 does not have unsigned integer primitive types.
- To be completely safe with unknown readers, widen the data type, or use
floating point.
- You can use the corresponding signed types to store unsigned
data only if all client programs know how to interpret this correctly.
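If unsigned data is stored in a signed type, every reader has to reinterpret the bit pattern consistently. The following minimal sketch shows that reinterpretation for two's-complement integers; to_unsigned and to_signed are hypothetical helper names.

```python
def to_unsigned(value, bits=8):
    """Reinterpret a signed two's-complement integer as unsigned."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=8):
    """Return the signed integer that shares its bit pattern with value."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value
```

For example, the unsigned byte 200 would be written as the signed byte -56, and a reader that knows the convention recovers 200 with to_unsigned(-56).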
Packed data is stored in a netCDF file using a smaller data type than the original
data, for example, packing doubles into shorts. The netCDF library itself does
not do the packing and unpacking. (The netCDF Java library will do automatic
unpacking when the VariableEnhanced Interface is used. For details see
EnhancedScaleMissing.)
- For each variable with packed data, add two attributes called scale_factor
and add_offset, such that
unpacked_data_value = packed_data_value * scale_factor + add_offset
- The type of the stored variable is the packed data type, typically
byte, short or int.
- The type of the scale_factor and add_offset attributes should be the type
that you want the unpacked data to be, typically float or double.
- To compute the scale and offset for maximum precision packing of a set of
numbers, use:
  - add_offset = dataMin
  - scale_factor = (dataMax - dataMin) / (2^n - 1), where n is the number
  of bits of the packed (integer) data type.
  Note: In an earlier version of this page, the scale_factor
  was incorrectly given as the reciprocal of the above formula.
- To avoid introducing a bias into the unpacked values due to
truncation when packing, round to the nearest integer rather than
just truncating towards zero:
packed_data_value = nint((unpacked_data_value - add_offset) / scale_factor)
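- The resolution of the unpacked data will be scale_factor: adjacent packed
integers differ by scale_factor after unpacking, so with rounding to the nearest
integer each value is recovered to within about scale_factor / 2.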
- Example, packing 32-bit floats into 16-bit shorts:
variables:
short data( z, y, x);
data:scale_factor = 34.02f;
data:add_offset = 1.54f;
- The units attribute applies to unpacked values.
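The packing formulas above can be worked through in a few lines of plain Python. This is a minimal sketch: the function names are hypothetical, and rounding uses floor(x + 0.5), which matches nint here because the quantity being rounded is never negative. With add_offset = dataMin, the packed integers span 0 to 2^n - 1; the sketch does not address how that range is mapped onto a signed storage type.

```python
import math


def compute_packing(data, n_bits):
    """Return (scale_factor, add_offset) for packing data into n_bits."""
    data_min, data_max = min(data), max(data)
    if data_max == data_min:
        raise ValueError("data has no range; scale_factor would be zero")
    scale_factor = (data_max - data_min) / (2 ** n_bits - 1)
    add_offset = data_min
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Round to the nearest integer instead of truncating towards zero."""
    return math.floor((value - add_offset) / scale_factor + 0.5)


def unpack(packed, scale_factor, add_offset):
    """Apply unpacked = packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset
```

Packing and then unpacking a value from the original data set returns a number that differs from the original by no more than about half of scale_factor, which is the benefit of rounding over truncation.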
Missing data is a general name for data values that are invalid,
never written, or missing. The netCDF library itself does not handle these values
in any special way, except that the value of a _FillValue attribute,
if any, is used in pre-filling unwritten data. (The Java-netCDF library will
assist in recognizing these values when reading; see class VariableStandardized.)
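Because the library does not interpret missing values for you, a reader typically has to compare each value against the variable's _FillValue or missing_value attribute itself. A minimal sketch, assuming the attribute values have already been read; mask_missing is a hypothetical helper, not a library call.

```python
def mask_missing(values, fill_value=None, missing_value=None):
    """Replace values equal to _FillValue or missing_value with None."""
    flagged = {v for v in (fill_value, missing_value) if v is not None}
    return [None if v in flagged else v for v in values]
```

This sketch relies on exact equality, so it will not detect a NaN used as a fill value.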
- To define a file whose structure is known in advance, write a CDL file and
create the netCDF file using ncgen.
Then write the data into the netCDF file using your program. This is typically
much easier than programming all of the create calls yourself.
- It's possible to reserve extra space in a netCDF file when it is created
so that you can later add additional attributes or non-record variables without
copying all the data. See the C man-page reference documentation (or the Fortran
reference documentation) for nc__create and nc__enddef (nf__create and
nf__enddef for Fortran) for more details.