
Re: 950717: Multiple Observations; Structures



Hi Roy,

> I know the following two points have been discussed before, but I don't
> remember a satisfactory answer and was wondering if anyone had made some
> progress on these fronts.  I am posting this to HDF also, as I am facing
> similar problems there and would appreciate any possible solutions for HDF
> files also.
> 
> 1). Some datasets I would like to put into NetCDF/HDF contain multiple
> observations at a given lat, lon, time, depth dimension.  The question is
> how to put these into the format.  One solution suggested is to have the
> observations as the unlimited dimension.  This has several drawbacks.  The
> first is that the dataset cannot be expanded along the time axis as more
> observations are obtained.  The second is that the size of the array ends up
> being dimensioned by the location with the greatest number of observations.
> This makes the array much bigger than it needs to be, often astoundingly so
> (for example, if there is buoy data there are a lot of observations, and
> those locations will set the dimensioning for the entire array).

I don't think my ideas are well-enough cooked yet to post a reply to all of
netcdfgroup, so I'm just replying to you and CC:ing a few other people
familiar with these suggested conventions.

One solution is to use an artificial "obs" unlimited dimension for each
observation and invent a convention for recording the fact that each
(lat, lon, time, depth) tuple uniquely determines a value of the "obs"
dimension, e.g. with a global dimension attribute for "obs" or with a
multidimensional coordinate variable for "obs".

In the first case (a global dimension attribute), your structure in CDL
would look something like this:

 dimensions:
        obs = UNLIMITED;
 variables:
        float lat(obs);
            lat:units = "degrees_north";
        float lon(obs);
            lon:units = "degrees_east";
        float depth(obs);
            depth:units = "meters";
        double time(obs);
            time:units = "seconds since 1995-01-01";

 // the following "global dimension attribute" means
 // a (lat,lon,time,depth) tuple uniquely determines obs
        :obs = "lat lon depth time";

        float salinity(obs);
        float temperature(obs);
        ...
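
To make the first convention concrete, here is a sketch of appending one
observation under that layout, using the netCDF version-2 C interface
(ncopen, ncdimid, ncdiminq, ncvarid, ncvarput1); the file name is made up
and error checking is omitted, so treat it as an illustration only:

 #include "netcdf.h"

 /* Append a single observation to a file laid out as in the CDL above. */
 void
 append_obs(float lat, float lon, float depth, double when,
            float salinity, float temperature)
 {
     int  ncid = ncopen("obs.nc", NC_WRITE);
     char name[MAX_NC_NAME];
     long n;                           /* current length of "obs" */

     ncdiminq(ncid, ncdimid(ncid, "obs"), name, &n);

     /* writing at index n along the unlimited dimension extends it by one */
     ncvarput1(ncid, ncvarid(ncid, "lat"),         &n, &lat);
     ncvarput1(ncid, ncvarid(ncid, "lon"),         &n, &lon);
     ncvarput1(ncid, ncvarid(ncid, "depth"),       &n, &depth);
     ncvarput1(ncid, ncvarid(ncid, "time"),        &n, &when);
     ncvarput1(ncid, ncvarid(ncid, "salinity"),    &n, &salinity);
     ncvarput1(ncid, ncvarid(ncid, "temperature"), &n, &temperature);

     ncclose(ncid);
 }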

A second solution (using a multidimensional coordinate variable) results in
a CDL structure something like:

 dimensions:
        obs = UNLIMITED;
        ncoords = 4;
 variables:
        double obs(obs, ncoords);
            obs:coords = "lat, lon, depth, time";
            obs:units = "degrees_north, degrees_east, meters, seconds since 1995-01-01";
        ...
        float salinity(obs);
        float temperature(obs);
        ...

In the second case a single matrix represents all the coordinates, so every
value has to be stored in the widest type needed (here double, because of
time).  There is also no good place to attach units or other attributes to
each of the four real coordinates, so something like the list-of-units
kludge above is needed.

Both of these "solutions" depend on conventions not supported by the netCDF
library or by other applications, and each has advantages over the other.
In either case you may save space, because you get the effect of multiple
unlimited dimensions for lat, lon, depth, and time, but at the cost of
having to provide an efficient mapping between (lat,lon,depth,time) tuples
and values of the "obs" dimension yourself.  Also, if you have many
observations at the same (lat,lon) location, for example, you end up storing
the same (lat,lon) values many times, once for each such observation.
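
To make that mapping cost concrete, here is a minimal sketch (plain C, with
made-up names) of the naive lookup an application would otherwise fall back
on, assuming the four coordinate variables have already been read into
memory; doing this efficiently means building a hash table or sorted index
on the side:

 /* Naive lookup of the "obs" index for a (lat,lon,depth,time) tuple.
  * This is an O(nobs) scan; an application that needs it often would
  * have to maintain its own index. */
 long
 find_obs(long nobs,
          const float lats[], const float lons[],
          const float depths[], const double times[],
          float lat, float lon, float depth, double time)
 {
     long i;
     for (i = 0; i < nobs; i++)
         if (lats[i] == lat && lons[i] == lon &&
             depths[i] == depth && times[i] == time)
             return i;          /* index along the "obs" dimension */
     return -1;                 /* no observation with this tuple */
 }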

> 2) Is there any way to create an array of structures, so that at each
> lat,lon,depth,time instead of pointing to a number, I point to a structure?
> This would be helpful when you want to put packed binary records of a fixed
> length into NetCDF.  The length wouldn't be limited by the length of a word
> on the machine (such as if I wanted to have a CMR5 record as the
> "observation" at a particular location.

We have some plans to support "nested arrays" in a future release of netCDF,
which would accommodate both structures and ragged arrays.  We worked out an
implementation strategy and an interface for adding ragged arrays to netCDF
with Harvey Davies of CSIRO when he was here for a visit last month.  The
main obstacle to implementing these ideas is getting the resources for the
necessary development.
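
In the meantime, one rough workaround for fixed-length records is to store
each one as a vector of bytes, e.g. declaring "byte record(obs, reclen);" in
CDL.  A hedged sketch follows (netCDF version-2 C interface again, made-up
names, error checking omitted); byte order and field layout within the
record are entirely the application's problem:

 #include "netcdf.h"

 #define RECLEN 64    /* hypothetical fixed record length, in bytes */

 /* Write one packed record at a given "obs" index, for a variable
  * declared as   byte record(obs, reclen);   with reclen = RECLEN. */
 void
 put_record(int ncid, long obs_index, void *record)
 {
     long start[2], count[2];

     start[0] = obs_index;   start[1] = 0;
     count[0] = 1;           count[1] = RECLEN;
     ncvarput(ncid, ncvarid(ncid, "record"), start, count, record);
 }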

--Russ

______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu