Re: coordinate systems in netcdf (again)

Stephen.Walker@marine.csiro.au
Thu, 26 Jun 1997 16:10:41 +1000

Russ
 
Thanks for your comments on our proposal. Your message appears to
contain two main themes - bounding coordinates, and dimensional
attributes. Our comments on each of these are as follows:
 
Bounding coordinates:
 
You give the example of layers in the atmosphere, and the need to
store coordinates for the top and bottom of these layers. Along
the same lines, but more generally, Gregory, Drach, and Tett say
in section 21 of their proposal:
 
> NEW: Along a dimension, the values might relate to points (at the coordinate
> values) or to contiguous or non-contiguous cells. The boundaries of the
> cells should be defined as well as the cell coordinate values. The
> convention is to define an additional two-dimensional ``boundary coordinate
> variable'' with a left-hand dimension (trailing dimension in Fortran terms)
> of size two.
 
Their proposal and your example both deal only with the 1-dimensional
case. In two dimensions, a 'cell' is defined by 4 points, and in the
general curvilinear case each such point is specified by 2 coordinate
variable values (x and y, for example). In 3 dimensions, 8 points are
needed (defining the corners of a 'cube', for want of a better word),
although particular cases, such as some model grids, might allow this to
be simplified (to 4 (x,y) points and 2 values of z, for example). In
general, the specification of the bounds of a 'cell' becomes quite messy
in higher dimensions, and we don't (yet) have a good, general proposal for
addressing this problem. Our model output files actually do store this
information: we have 4 distinct horizontal grids stored in the files,
representing cell centres, cell corners, and the centres of two adjacent
faces. At the moment, however, the intelligence to interpret these is
hard-wired into our processing software.
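
To illustrate how quickly the bookkeeping grows, here is a minimal Python
sketch that enumerates the corners of a cell. It assumes an axis-aligned
cell described by one lower and one upper bound per dimension, which is a
simplification; the function name cell_corners is hypothetical and is not
part of any proposal discussed here.

```python
from itertools import product


def cell_corners(lower, upper):
    """Return every corner of an axis-aligned cell as a tuple of coordinates.

    lower and upper hold the cell's minimum and maximum value along each
    dimension. Axis alignment is an assumption of this sketch.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    # Each corner takes either the lower or the upper bound in each dimension.
    return list(product(*zip(lower, upper)))


# Count the corners of a unit cell in 1, 2 and 3 dimensions.
corner_counts = {n: len(cell_corners([0.0] * n, [1.0] * n)) for n in (1, 2, 3)}
print(corner_counts)  # {1: 2, 2: 4, 3: 8}
```

In the general curvilinear case the corners cannot be generated from two
bounds per dimension like this, so each of the 2^n corner points would have
to be stored explicitly for every cell.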
 
Dimension attributes:
 
A possible drawback of our proposal is the need to maintain "coordinates"
attribute strings for each data variable, even when several data variables
have the same set of coordinate variables associated with them. Your proposal
is to eliminate this possible duplication by using global attributes named
after the dimensions, each listing the coordinate variables for that
dimension. As well as the drawbacks you mention, we see several other
problems with this approach:
 
Firstly, almost any variable might be considered to be a coordinate variable.
For example, given a file fragment as follows:
 
  dimensions:
      d1 = ...;
      d2 = ...;
      d3 = ...;
 
  variables:
      data1(d1,d2,d3);
      data2(d1,d2,d3);
 
      coord1(d1);
      coord2(d2,d3);
      coord3(d2,d3);
      coord4(d2,d3);
      coord5(d2,d3);
 
Your scheme would have global dimension attributes as follows:
 
     :d1 = "coord1";
     :d2 = "coord2 coord3 coord4 coord5";
     :d3 = "coord2 coord3 coord4 coord5";
 
Incidentally, this seems to me to be perfectly valid, but it violates
your requirement that:
 
>  No two tuples of coordinate variable values are the same for distinct
>  values of the dimension.
 
However, the main point is that in some circumstances one may wish to consider
data2 as a coordinate variable for data1, or vice versa. In that case, the
global dimension attributes become:
 
    :d1 = "data1 data2 coord1";
    :d2 = "data1 data2 coord2 coord3 coord4 coord5";
    :d3 = "data1 data2 coord2 coord3 coord4 coord5";
 
This initially looks fine, but in fact it adds absolutely no information
to the file, as all it does is explicitly state the dimensional dependence
of each variable in the file - something that can already be found out by
(perhaps somewhat tedious) inspection of each variable.
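
The redundancy can be checked mechanically. The following Python sketch
rebuilds the dimension attributes purely from the dimension lists of the
variables in the example fragment. The file is represented as a plain
dictionary, and the function name dimension_attributes is hypothetical; no
netCDF library is involved.

```python
# Dimensions of every variable in the example fragment, in file order.
variables = {
    "data1": ("d1", "d2", "d3"),
    "data2": ("d1", "d2", "d3"),
    "coord1": ("d1",),
    "coord2": ("d2", "d3"),
    "coord3": ("d2", "d3"),
    "coord4": ("d2", "d3"),
    "coord5": ("d2", "d3"),
}


def dimension_attributes(variables):
    """For each dimension, list the variables that depend on it."""
    by_dim = {}
    for name, dims in variables.items():
        for dim in dims:
            by_dim.setdefault(dim, []).append(name)
    # Join the names into space-separated attribute strings.
    return {dim: " ".join(names) for dim, names in by_dim.items()}


attrs = dimension_attributes(variables)
for dim, value in attrs.items():
    print(f':{dim} = "{value}";')
```

The strings produced are the same as the three attributes listed above,
which is the sense in which the attributes carry no extra information.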
 
So, if we do allow data variables to be coordinates for other data variables,
then your dimension attribute proposal adds no information at all. If we don't
allow this, then all it really does is to identify the set of variables which
we do consider to be coordinate variables. This could be done more clearly
by having a single global attribute, as follows:
 
    :coordinate_variables = "coord1 coord2 coord3 coord4 coord5";
 
This is quite like our original proposal, but avoids the problem of having
to maintain coordinate attributes for each data variable. Bindings between
coordinate variables and data variables must then be worked out on the
basis of which dimensions they have in common (keeping in mind that the
dimensions of a coordinate variable must always be a subset of the dimensions
of the associated data variable). If this were adopted, someone should write
and disseminate a subroutine or set of routines which identifies these bindings!
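
A rough Python sketch of what such a routine might look like is given
below. It assumes the file's metadata has already been read into a plain
dictionary mapping each variable name to its dimensions. The function name
find_bindings and the extra variable data3 are hypothetical, added only to
show the subset rule at work.

```python
def find_bindings(variables, coordinate_variables):
    """Map each data variable to the coordinate variables that apply to it.

    variables: dict of variable name -> tuple of dimension names.
    coordinate_variables: the space-separated global attribute string.
    """
    coord_names = coordinate_variables.split()
    for coord in coord_names:
        if coord not in variables:
            raise KeyError(f"coordinate variable {coord!r} is not in the file")
    bindings = {}
    for name, dims in variables.items():
        if name in coord_names:
            continue  # only data variables receive bindings
        # A coordinate variable applies when its dimensions are a subset
        # of the data variable's dimensions.
        bindings[name] = [
            coord for coord in coord_names
            if set(variables[coord]) <= set(dims)
        ]
    return bindings


variables = {
    "data1": ("d1", "d2", "d3"),
    "data2": ("d1", "d2", "d3"),
    "data3": ("d1",),  # hypothetical variable, not in the example fragment
    "coord1": ("d1",),
    "coord2": ("d2", "d3"),
    "coord3": ("d2", "d3"),
    "coord4": ("d2", "d3"),
    "coord5": ("d2", "d3"),
}
bindings = find_bindings(variables, "coord1 coord2 coord3 coord4 coord5")
print(bindings["data1"])  # ['coord1', 'coord2', 'coord3', 'coord4', 'coord5']
print(bindings["data3"])  # ['coord1']
```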
 
The main limitation of the above is that it allows less flexibility in
the association of data variables and coordinate variables. Using our
original proposal, for example, we could write:
 
    data1:coordinates = "coord1 coord2 coord3";
    data2:coordinates = "coord1 coord2 coord3 coord4 coord5";
 
signifying that coord4 and coord5 were appropriate coordinates for data2,
but not data1. The global attribute approach doesn't allow this, but that
may not be a big sacrifice in most applications. If great flexibility really
is required, perhaps a nested approach could be used, where the "coordinates"
attribute for a variable is used only if it is present, and otherwise a
global "coordinate_variables" attribute is used.
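
The nested lookup could be sketched in Python as follows. Attributes are
again held in plain dictionaries, and the function name coordinates_for is
hypothetical; this is one possible reading of the idea, not a tested
implementation against real files.

```python
def coordinates_for(var_name, var_attrs, global_attrs):
    """Return the coordinate variable names that apply to var_name.

    The variable's own "coordinates" attribute wins if it is present;
    otherwise the global "coordinate_variables" attribute is used.
    """
    own = var_attrs.get(var_name, {}).get("coordinates")
    if own is not None:
        return own.split()
    return global_attrs.get("coordinate_variables", "").split()


global_attrs = {"coordinate_variables": "coord1 coord2 coord3 coord4 coord5"}
var_attrs = {
    # data1 overrides the global list; data2 has no attribute of its own.
    "data1": {"coordinates": "coord1 coord2 coord3"},
    "data2": {},
}

print(coordinates_for("data1", var_attrs, global_attrs))
print(coordinates_for("data2", var_attrs, global_attrs))
```

Here data1 gets only coord1, coord2 and coord3, while data2 falls back to
all five. A list obtained from the global attribute would still need to be
filtered by shared dimensions before use.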
 
Thanks again for your comments; we would welcome more on the material
above. We have copied this message to Jonathan Gregory, with whom we have
also had some correspondence on other aspects of our proposal. We have
not sent this message to the netCDF group because of its length, but feel
free to forward it if you think that is appropriate.
 
Regards,
Stephen Walker
Jason Waring