Re: coordinate systems in netcdf (again)
Tue, 17 Jun 1997 11:48:30 +1000

We are (some of the) numerical modellers at CSIRO Division of Marine
Research in Hobart, Australia, and have been following the recent (and
historical) thread regarding coordinate conventions with much interest.
The model we use here for coastal and estuarine work uses curvilinear
coordinates and stores data in netCDF files. For years we have done this
in an ad-hoc way, using neither conventional coordinate variables nor
referential attributes, but rather depending on hard-wired intelligence
in our processing and plotting software. Some sort of convention for
representing curvilinear grids which is compatible with the wider
community would clearly be of great benefit.
Recently, Russ Rew posted a draft document by Jonathan Gregory, Bob
Drach and Simon Tett, essentially describing extensions to the COARDS
conventions. While it is clear that much thought and work has gone into
this document, we feel it is too specific, or too 'high level' for our
needs. For example, it essentially perpetuates the idea of coordinate
variables as having only 1 dimension, and accommodates 'rotated' grids by
specifying the position of a shifted North Pole. Neither of these concepts
is useful to those of us who use more general curvilinear grids, or who
use grids which are not defined in lon,lat space. We feel that some lower
level, more generic conventions may be of use to a wider community.
Below are our (rather long) ideas on the subject, based largely on our
reading of the very useful archive of postings maintained by Russ Rew.
In the general case, a netCDF file is capable of storing multi-dimensional
arrays of data. A general file fragment might look like this:
        d1 = size1;   // or perhaps UNLIMITED
        d2 = size2;
        d3 = size3;
        float data(d1, d2, d3, ...);
The usual situation is that, for each data value at a given position
defined by the indices (d1, d2, d3, ...), you want to be able to associate
or evaluate a number of other quantities (which we will call coordinates).
So, the netCDF file will contain a number of other variables which store
these 'coordinate' values. Here are some very general examples of such
coordinate variables, with comments interspersed:
Example 1:
        float d1(d1);
        float d2(d2);
These are examples of the 'classic' 1-dimensional coordinate variables which
conform to the existing netCDF conventions.
Example 2:
        float d1(d1, d2, d3, ...);
        float d2(d1, d2, d3, ...);
These are examples of the multi-dimensional extension to the coordinate
variable convention, proposed by a number of people. As in Example 1 above,
each coordinate variable here has a name which is the same as one of the
dimension names.
Example 3:
        float coord1(d1);
        float coord2(d1, d2, d3, ...);
        float coord3(d1, d3);
        float coord4(d3);
        float coord5(d3, d4, ...);
This is a much more general example. Note that the variable names are
not necessarily the same as the dimension names, and that different
coordinate variables might have different numbers of dimensions, although
their dimensions must always be a subset of the dimensions present in the
associated data variable(s). Note, however, that this example does not (yet)
specify any way of associating these coordinate variables with the data
variable. This, by the way, is the current state of our model output files.
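The subset rule above also tells an application how to find the coordinate values that belong to a given data value: keep only the index positions along the coordinate variable's own dimensions. The minimal Python sketch below shows this. The function name project_index is hypothetical and is not part of any netCDF library.

```python
def project_index(data_dims, coord_dims, data_index):
    """Map an index into a data variable onto the corresponding index into
    one of its coordinate variables (hypothetical helper)."""
    if len(data_index) != len(data_dims):
        raise ValueError("need exactly one index per data dimension")
    extra = [d for d in coord_dims if d not in data_dims]
    if extra:
        # Coordinate dimensions must be a subset of the data dimensions.
        raise ValueError(f"{extra} are not dimensions of the data variable")
    position = dict(zip(data_dims, data_index))
    # Keep only the coordinate's own dimensions, in the order it declares them.
    return tuple(position[d] for d in coord_dims)


data_dims = ("d1", "d2", "d3")
print(project_index(data_dims, ("d1",), (4, 5, 6)))       # (4,)
print(project_index(data_dims, ("d1", "d3"), (4, 5, 6)))  # (4, 6)
print(project_index(data_dims, ("d3",), (4, 5, 6)))       # (6,)
```

For example, the value at data index (d1=4, d2=5, d3=6) is associated with element (4, 6) of coord3(d1, d3).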
The problem with example 1 is that it excludes the easy representation of
many types of coordinate 'grids', as previously discussed by many people.
It serves the purpose well only when there is a 1 to 1 mapping between data
dimensions and coordinate quantities, and when each coordinate is a function
of only one data dimension.
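To make the limitation of example 1 concrete: a 2-D coordinate array can be replaced by a classic 1-D coordinate variable only if it varies along a single dimension. The hypothetical helper below is a sketch that works on nested Python lists rather than on a real netCDF file, and it tests for that case. The longitude values are made up for illustration. On a genuinely curvilinear grid the helper returns None.

```python
def collapse_to_1d(values):
    """If a 2-D coordinate array varies along only one dimension, return
    (dimension_position, 1-D values); otherwise return None.

    dimension_position is 0 for the first dimension and 1 for the second,
    e.g. j and i in a (j, i) array.
    """
    rows = [list(row) for row in values]
    if not rows or not rows[0]:
        return None
    if all(row == rows[0] for row in rows):
        # Identical rows: the value depends only on the second index.
        return (1, rows[0])
    if all(len(set(row)) == 1 for row in rows):
        # Constant rows: the value depends only on the first index.
        return (0, [row[0] for row in rows])
    # Depends on both indices: no 1-D coordinate variable can represent it.
    return None


rectilinear_lon = [[147.0, 147.1, 147.2],
                   [147.0, 147.1, 147.2]]
curvilinear_lon = [[147.00, 147.10, 147.20],
                   [147.02, 147.13, 147.25]]
print(collapse_to_1d(rectilinear_lon))  # (1, [147.0, 147.1, 147.2])
print(collapse_to_1d(curvilinear_lon))  # None
```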
Example 2 generalises the concept of a coordinate variable in a fairly
natural way. Some have commented that it violates the existing 1-d
convention, while others state that this need only be the case when
the data is such that the existing 1-d convention is inadequate in any case.
More seriously, the problem with example 2 is that it doesn't allow you
to have more coordinate variables than there are dimensions, and a number
of people have discussed this issue with regard to time coordinates or
vertical coordinates (or even spatial coordinates - see the posting by
Rich Signell in October 1992). From a purely mathematical (and aesthetic)
point of view, we also find confusing and illogical the implied statement
that d1, for example, depends on things other than d1. There is a real
temptation here to confuse the roles of data dimensions and coordinates.
In this situation it is important to note that there is no longer a 1 to 1
mapping between them.
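The counting argument against example 2 can be sketched in a few lines. Under the name-matching rule shared by examples 1 and 2, a variable is a coordinate variable only if its name is also a dimension name. A generic application can therefore discover at most one coordinate per dimension. The helper below is hypothetical. For brevity, it assumes the file is represented as a plain dict that maps variable names to dimension tuples.

```python
def name_matched_coordinates(dimensions, variables):
    """Return, in dimension order, the variables whose names equal a
    dimension name (the rule used by examples 1 and 2)."""
    return [dim for dim in dimensions if dim in variables]


# Four coordinate quantities on a two-dimensional curvilinear grid.
dimensions = ("j", "i")
variables = {
    "salt": ("j", "i"),
    "j": ("j", "i"),    # Y in a map projection, stored under a dimension name
    "i": ("j", "i"),    # X in a map projection
    "lat": ("j", "i"),  # latitude of the same points
    "lon": ("j", "i"),  # longitude of the same points
}
print(name_matched_coordinates(dimensions, variables))  # ['j', 'i']
```

The rule finds only two of the four coordinates. The latitude and longitude arrays have no dimension name left to borrow.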
The problem with example 3 is that if you want the dataset to be self
describing then you need some further mechanism to identify the association
between data variables and coordinate variables. A number of people have
identified referential attributes as the solution to this problem.
Our approach:
The above problems may be severe, mild, or irrelevant, depending on your
particular application.  We favour the third approach above for the following reasons:
  - We use curvilinear grids, so example 1 is not really useful.
  - We store coordinates in various projection spaces, so that we
    need more coordinate variables than we have dimensions. This
    makes example 2 of fairly limited use.
Proposal 1:
Our first proposal is for a low-level, general way to specify associations
between data variables and coordinates in a netCDF file:
  Each data variable has an attribute called 'coordinates' which lists
  the coordinate variables associated with that data variable. Each
  coordinate variable has dimensions which are a subset of the dimensions
  of the associated data variable(s).
The proposed netCDF file fragment then looks like this:
        d1 = size1;
        d2 = size2;
        d3 = size3;
        float data(d1, d2, d3, ...);
            data:coordinates = "coord1 coord2 coord3 coord4 coord5";
            // probably other attributes here as well
        float coord1(d1);
        float coord2(d1, d2, d3, ...);
        float coord3(d1, d3);
        float coord4(d3);
        float coord5(d3, d4, ...);
Note that this is essentially identical to the 'independent_variables'
attribute proposed by Rich Signell in 1992, and also similar to suggestions
by others since then. Note also that this approach is compatible in principle
with the existing conventions, and with the multi-dimensional coordinate
variable proposals. It merely adds a single extra attribute per data variable.
People can still use dimension names as coordinate variable names if they want
to, and they can still have 1-dimensional coordinate variables if their data
grids warrant it.  We stress that this is a generic, low level, and very
general proposal. As well, we are happy to leave the details of things
like whether to separate names by commas or white space to future debate.
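The minimal sketch below shows how an application might apply Proposal 1. The hypothetical function reads a data variable's coordinates attribute and checks the subset rule. Because the separator is left open, it accepts commas, white space, or both, on the assumption that variable names never contain either. The file is modelled as plain dicts rather than read through a netCDF library. The '...' dimensions of the fragment are replaced by a single concrete dimension, d4.

```python
import re


def resolve_coordinates(data_name, dims, attrs):
    """Return {coordinate name: dimensions} for data_name.

    dims maps each variable name to its tuple of dimension names;
    attrs maps each variable name to a dict of its attributes.
    """
    text = attrs.get(data_name, {}).get("coordinates", "")
    # Split on any run of commas and/or white space; drop empty pieces.
    names = [n for n in re.split(r"[,\s]+", text) if n]
    data_dims = set(dims[data_name])
    resolved = {}
    for name in names:
        if name not in dims:
            raise KeyError(f"{data_name}:coordinates lists unknown variable {name!r}")
        if not set(dims[name]) <= data_dims:
            raise ValueError(f"{name} has dimensions not shared by {data_name}")
        resolved[name] = dims[name]
    return resolved


dims = {
    "data": ("d1", "d2", "d3", "d4"),
    "coord1": ("d1",),
    "coord2": ("d1", "d2", "d3", "d4"),
    "coord3": ("d1", "d3"),
    "coord4": ("d3",),
    "coord5": ("d3", "d4"),
}
attrs = {"data": {"coordinates": "coord1 coord2, coord3,coord4  coord5"}}
for name, coord_dims in resolve_coordinates("data", dims, attrs).items():
    print(name, coord_dims)
```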
But how do I use this?
The main thing missing from the above proposal is that it does not
address the issue of how a 'generic' netCDF application is supposed to
handle coordinate variables once they have been identified. To find this extra
information, it is natural to use the attributes of the coordinate variables
themselves. There are already very useful conventions which may help here.
For example, in a 'well behaved' netCDF file, each coordinate
variable would have long_name and units attributes, which a generic
application could use (we have not shown such attributes in any examples
above for the sake of brevity and clarity). So, for example, an application
which was expecting to find latitude and longitude values could examine
the units of each of the coordinate variables, hoping to find strings
like "degrees_east" and "degrees_north". It may be necessary to add further
helpful information which is not covered by current attribute conventions,
so that, for example, each coordinate variable might have a 'quantity'
attribute (as previously suggested), or 'coordinate_type' attribute.
We would welcome suggestions on this point.
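A generic application could combine the existing units conventions with the proposed extra attribute roughly as follows. This is a sketch only. The coordinate_type attribute and its 'keyword, key=value' layout come from the example fragment in this message, not from any established convention, and the function name is hypothetical.

```python
def guess_coordinate_kind(attrs):
    """Guess what a coordinate variable represents from its attributes."""
    units = attrs.get("units", "").strip()
    # Existing conventions first: these units strings are unambiguous.
    if units == "degrees_north":
        return "latitude"
    if units == "degrees_east":
        return "longitude"
    # Fall back on the proposed coordinate_type attribute, keeping only the
    # leading keyword, e.g. "x" from "X, projection=AMG_zone_55".
    ctype = attrs.get("coordinate_type", "")
    if ctype:
        return ctype.split(",")[0].strip().lower()
    # Last resort: "<unit> since <date>" strings usually denote time.
    if " since " in units:
        return "time"
    return "unknown"


print(guess_coordinate_kind({"units": "degrees_north"}))
print(guess_coordinate_kind({"units": "metres",
                             "coordinate_type": "X, projection=AMG_zone_55"}))
```

The units are checked first because they rely only on conventions that already exist. The extra attribute is needed only for quantities such as projected X/Y, whose units ("metres") say nothing about what they represent.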
As a final concrete example, we show below a file fragment which uses the
above ideas (including an extra attribute for coordinate variables)
to describe salinity output from our model:
        n = UNLIMITED;
        k = 10;
        j = 100;
        i = 100;
        double t(n);
            t:long_name = "Time";
            t:units = "seconds since 1990-01-01 00:00:00 +10";
            t:coordinate_type = "time";
        double cell_z(n,k,j,i);
            cell_z:long_name = "Z coordinate at cell centres";
            cell_z:units = "metres";
            cell_z:coordinate_type = "height";
        double cell_y(j,i);
            cell_y:long_name = "Y coordinate at cell centres";
            cell_y:units = "metres";
            cell_y:coordinate_type = "Y, projection=AMG_zone_55";
        double cell_x(j,i);
            cell_x:long_name = "X coordinate at cell centres";
            cell_x:units = "metres";
            cell_x:coordinate_type = "X, projection=AMG_zone_55";
        double cell_lat(j,i);
            cell_lat:long_name = "Latitude at cell centres";
            cell_lat:units = "degrees_north";
            cell_lat:coordinate_type = "latitude";
        double cell_lon(j,i);
            cell_lon:long_name = "Longitude at cell centres";
            cell_lon:units = "degrees_east";
            cell_lon:coordinate_type = "longitude";
        double salt(n,k,j,i);
            salt:long_name = "Salinity";
            salt:units = "1";
            salt:coordinates = "t cell_z cell_y cell_x cell_lat cell_lon";
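For the t variable, a generic application must also decode the units string. The sketch below handles only the 'seconds since YYYY-MM-DD hh:mm:ss [+/-hh]' form used in this fragment. It treats the trailing +10 as an offset in hours east of UTC (Hobart standard time), an assumption modelled on udunits-style time units. A real application would pass the string to a units library instead. decode_time is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Only the narrow form used above; the optional offset is assumed to be hours.
_UNITS = re.compile(
    r"^\s*seconds\s+since\s+(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})"
    r"(?:\s*([+-])(\d{1,2}))?\s*$"
)


def decode_time(value, units):
    """Convert a value of t into a timezone-aware datetime."""
    m = _UNITS.match(units)
    if m is None:
        raise ValueError(f"unsupported time units: {units!r}")
    date, clock, sign, hours = m.groups()
    offset = timedelta(hours=int(hours)) if hours else timedelta(0)
    if sign == "-":
        offset = -offset
    origin = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    origin = origin.replace(tzinfo=timezone(offset))
    return origin + timedelta(seconds=value)


units = "seconds since 1990-01-01 00:00:00 +10"
print(decode_time(86400, units))  # 1990-01-02 00:00:00+10:00
```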

Any comments on any of the above would be most welcome.
Stephen Walker
Jason Waring
CSIRO Marine Research                   Fax: 03 6232 5123
GPO Box 1538, Hobart                  Phone: 03 6232 5298