Discussion of NetCDF limitations

We have gotten NetCDF and the 2.3 document and have implemented a file reader
for our visualization package (SciAn), based on my understanding of the
documentation.  I found that it could do a few things, but that it couldn't do
a lot of things that I need from a self-descriptive scientific data format. 
Some of the things that it can't do could be fixed with ad-hoc attributes, but
that approach is only useful if one has complete control over both the writer
and the reader of the data.  So, I'd like to start a discussion, first to
find out whether my interpretation is correct, and second to look for ways of
improving NetCDF to do more things of interest.  I place a high value on
not saying "let the user determine it at run-time," because the more things
that can be self-descriptive, the better.

Here is my interpretation of what it does and doesn't do.  The "does" list is
not meant to be all-inclusive; it includes only the things which I currently
think are important, and that list is guaranteed to change over time.  :-)

NetCDF does:

A) Scalar fields with any number of fixed dimensions, anything that can be 
   written var = f(i, j, k, ...).  (The NetCDF documentation calls this
   multidimensional data, but the fact is that there are three uses of the
   term "dimension" that are important here.  The first is topological or
   computational dimension, which is what "multidimensional" variables seem
   to provide.  The second is degrees of freedom in the data points, which
   is like "at this point i, j, k it's a scalar giving density" or "it's a
   3-vector giving velocity.")

B) Rectilinear grids with separable axes, anything that can be written 
   x = f(i), y = g(j), ...  (The third meaning of "dimension" is spatial
   dimension.  There is a mapping between topological or computational
   dimensions and spatial dimensions.)

C) Missing data

D) Axis names and units, also names and units of scalar fields

E) Range of scalar values

F) Flexible time-dependency using one unlimited dimension

G) A variety of data formats including byte

NetCDF doesn't do:

1) Vector fields

2) Tensor fields

3) Handedness of coordinate system (e.g. lat, long, elev is left-handed)

4) Choice of mapping of dimensions onto spatial dimensions

5) Spatially transformed coordinate systems (e.g. polar coordinates, crystal
   axis coordinates).

6) Curvilinear coordinates (i.e. x = f(i, j, k, ...), y = g(i, j, k, ...))

7) Unstructured grids (e.g. finite element)

8) Scatter data

9) Modulo data elements (e.g. 357, 358, 359, 0, 1, 2)

10) Arbitrary, automatic mapping of byte elements onto real numbers.  This is
useful because some data formats, such as NEXRAD, give you 256 real numbers 
and then a bunch of bytes.  NetCDF does the bytes; what is needed is a way
to do the mapping.

Finally, here are the ways that we are approaching these problems (or not, as
the case may be).  I hope this can spark some discussion.  If you see (=HDF),
that means we're using basically the same strategy as we do for similar
limitations in HDF.

1) If the first or last non-unlimited dimension is 2 or 3, and there is no
scale dimension associated with it, assume that the dimension chooses the
vector component rather than an additional topological or computational
dimension.  Allow this to be defeated using a check box in the file reader
control panel.  (=HDF)
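
As an illustration only, the heuristic in 1) could be sketched in Python
roughly as follows.  The function name, the tuple layout, and the override
flag are hypothetical; this is not SciAn's actual code, and "has_scale" is
assumed to mean that a scale (coordinate) variable exists for the dimension.

```python
# Hypothetical sketch of the vector-component heuristic; not SciAn's code.
def find_component_dim(dims, unlimited=None, override=False):
    """Return the index of the dimension that selects the vector component.

    dims      -- list of (name, size, has_scale) tuples in declaration order
    unlimited -- name of the unlimited dimension, if any
    override  -- True when the user has defeated the heuristic (the check box)
    Returns None when no dimension qualifies.
    """
    if override:
        return None
    # Only the fixed (non-unlimited) dimensions are candidates.
    fixed = [(i, size, has_scale)
             for i, (name, size, has_scale) in enumerate(dims)
             if name != unlimited]
    if not fixed:
        return None
    # Check the first fixed dimension, then the last one.
    for i, size, has_scale in (fixed[0], fixed[-1]):
        if size in (2, 3) and not has_scale:
            return i
    return None
```

With dimensions (time, x, y, comp), where comp has size 3 and no scale, the
sketch picks comp; setting override makes it treat every dimension as
topological.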

2) We don't do tensors yet, but if we did, it would be something like 1).

3) and 4) Use an external file to map particular dimension names onto spatial
dimensions (e.g. latitude = y, longitude = x for Mercator projection). 
Handedness is implicit in the axes chosen.  (=HDF)
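
A rough Python sketch of the external-file approach in 3) and 4).  The file
format shown here (one "name = axis" per line) is an assumption made for
illustration, as are the function names; the real mapping file may look
quite different.

```python
# Hypothetical format for the external mapping file: one "name = axis" per line.
MAPPING_TEXT = """
longitude = x
latitude  = y
elevation = z
"""

def parse_axis_map(text):
    """Parse 'dimension_name = axis' lines into a dict; blank lines are skipped."""
    axis_map = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, axis = (part.strip() for part in line.split("=", 1))
        if axis not in ("x", "y", "z"):
            raise ValueError("unknown spatial axis: " + axis)
        axis_map[name] = axis
    return axis_map

def spatial_axes(dim_names, axis_map):
    """Return the spatial axis for each dimension, or None if it is unmapped."""
    return [axis_map.get(name) for name in dim_names]
```

Dimensions that do not appear in the file (time, for example) come back as
None, so the reader can treat them as non-spatial.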

5) This could be done by defining new attributes.  In HDF, it's done using
the coordinate system for easy stuff, but there is no way to do skewed axes.

6) I'm not sure how to do this.  However, a curvilinear grid can be defined
completely by a vector field defined over a set of topological or computational
dimensions.  Assuming a way to do this, all that is needed is a way to link a
field with the vector field that defines the grid.  This could be done with an
attribute of the field giving the name of the grid field which could be
searched for in the file.  There's also the issue of mixed curvilinear
coordinates.  In meteorology we often get a case where x = f(i), y = g(j), and
z = h(i, j, k).  This is to do a grid that matches the terrain at the bottom
and is a flat elevation above sea level at the top.  This requires the
flexibility of a curvilinear grid but there are optimizations to make a search
of the grid as fast as a rectilinear grid.  It would be nice to take advantage
of this.  It would, at least, require different variables for the three spatial
dimensions and a way of linking the variables into one complete variable.
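
To show why the mixed case can be searched almost as fast as a rectilinear
grid, here is a minimal Python sketch.  It assumes x and y increase with
their indices and that z increases with k in every column; the function
name and the nested-list layout are hypothetical.

```python
from bisect import bisect_right

def locate(xs, ys, z, px, py, pz):
    """Return (i, j, k) of the cell whose lower corner is at or below the point.

    xs, ys -- increasing 1-D axis values, x = f(i) and y = g(j)
    z      -- nested lists with z[i][j][k] = h(i, j, k), increasing in k
    Simplification: the column at the lower corner (i, j) stands in for the
    whole cell; a real reader would interpolate h across the cell.
    Returns None when the point lies outside the grid.
    """
    # The separable axes can be searched independently, as on a rectilinear grid.
    i = bisect_right(xs, px) - 1
    j = bisect_right(ys, py) - 1
    if not (0 <= i < len(xs) - 1 and 0 <= j < len(ys) - 1):
        return None
    # Only the vertical search depends on the horizontal position.
    column = z[i][j]
    k = bisect_right(column, pz) - 1
    if not (0 <= k < len(column) - 1):
        return None
    return (i, j, k)
```

The two horizontal searches are plain binary searches on 1-D axes; only the
last step needs the terrain-following heights, and then only one column of
them.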

7) No idea.  The positions of numbered grid points could be represented easily
enough with a 1-D vector sequence, although this kind of variable definition
might confuse the heuristic in 1), if there were, for example, a grid with 3 
vertices.  However, the problem of how to do the connectivity for edges, 
faces, cells, and hypercells is open.

8) Essentially the same problem as an unstructured grid with a topological
dimension of 0, i.e., no connectivity.

9) In HDF, we do the wrapping based on our knowledge of the coordinate system
and also the name of the units (degrees, radians, gradians).  In NetCDF we
could do the same on the unit names, but we wouldn't have the coordinate system
information to help the heuristic.
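
A small Python sketch of wrapping based on unit names alone.  The table of
unit names is limited to the three mentioned above, and the function names
are hypothetical; real files may spell their units differently, in which
case this lookup would simply find nothing.

```python
import math

# Period of one full turn for each angular unit name mentioned above.
PERIODS = {"degrees": 360.0, "radians": 2.0 * math.pi, "gradians": 400.0}

def wrap_period(units):
    """Return the modulo period implied by a units string, or None if not angular."""
    return PERIODS.get(units.strip().lower())

def unwrap(values, period):
    """Remove jumps of more than half a period so the sequence is continuous."""
    if not values:
        return []
    out = [values[0]]
    for v in values[1:]:
        step = v - out[-1]
        # Shift by whole periods until the step is within half a period.
        step -= period * round(step / period)
        out.append(out[-1] + step)
    return out
```

Applied to the sequence 357, 358, 359, 0, 1, 2 with a period of 360, unwrap
continues the count upward past 359 instead of jumping back to 0.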

10) One way to do this is to store the mapping as just another variable with 
dimension 256 and find some way, using attributes, of attaching the two together. 
Another way would be to extend NetCDF so that *a variable could be used as
a data type*.  For example, something like:

dimensions:
    x = 10, y = 10, funcMap = 256;
variables:
    float funcMap(funcMap);
    funcMap field(x, y);
data:
    funcMap = 1.0, 2.0, 3.0, ...;
    field = 4, 6, 8, 1, 37, ...;

I have no idea if this is possible to represent internally or not, but it
drops right out of the syntax of CDL.  It is, however, evil.
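
On the reader side, either approach comes down to a table lookup.  A minimal
Python sketch, assuming the 256 reals and the byte field have already been
read into lists (the function name is hypothetical):

```python
def apply_func_map(func_map, field):
    """Map each byte value in a 2-D field through a 256-entry table of reals."""
    if len(func_map) != 256:
        raise ValueError("funcMap must have exactly 256 entries")
    # Each byte is used directly as an index into the table.
    return [[func_map[b] for b in row] for row in field]
```

The open question is not the lookup itself but how the file tells the reader
which variable holds the table for which field.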

Eric Pepke                                     INTERNET: pepke@xxxxxxxxxxxx
Supercomputer Computations Research Institute  MFENET:   pepke@fsu
Florida State University                       SPAN:     scri::pepke
Tallahassee, FL 32306-4052                     BITNET:   pepke@fsu

Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.