NetCDF-4 Dimensions and HDF5 Dimension Scales

Last time we looked in detail at the internals of dimension scales in an HDF5 file. Now let's see what netCDF-4 shared dimensions look like at the file format level. Recall that in the netCDF data model, one defines Dimension objects and uses those objects to define the shapes of variables, so dimensions are not optional in netCDF. Let's use the following example to illustrate the different ways shared dimensions are implemented using dimension scales:

   dimensions:
      nvec = 3;
      time = 100;
      sample = 345;
      ship = 14;
      ship_strlen = 80;
   variables:
      float data(ship, sample, time, nvec);
      int time(time);
      int sample(time, sample);
      char ship(ship, ship_strlen);

Let's go through each dimension in this example:

  1. nvec is a dimension with no coordinate variable, used, perhaps, for a vector component.
  2. time is both a dimension and a coordinate variable.
  3. sample is a dimension and a data variable. It's not a coordinate variable because coordinate variables must be 1-dimensional, with one exception described next.
  4. ship is a dimension and a coordinate variable. It's a coordinate variable because:
    1. it's a char variable,
    2. it's two dimensional, and
    3. the inner dimension does not have a coordinate variable.
    That it is two-dimensional is really an artifact of the fact that the netCDF classic model doesn't have a string data type. In the extended model, one could use string ship(ship).
  5. ship_strlen is the string length of the ship variable. It really shouldn't be a shared dimension, but is an artifact of the fact that the netCDF data model does not have anonymous (i.e. non-shared) dimensions. I'm hoping that anonymous dimensions will be added to the extended netCDF model in the future. The CDM data model does have anonymous dimensions, in which case the CDL would look like: char ship(ship, 80). But the (variable length) string form is preferable if you aren't reading punch cards with Fortran 4.
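
The rules in the list above can be sketched in a few lines of Python. This is only a toy model of the example, not anything the netCDF library does: the names DIMENSIONS, VARIABLES, is_coordinate_variable and classify are made up for illustration, the data is typed in by hand rather than read from a file, and the check on the inner dimension is simplified to look only for a 1-D variable of the same name.

```python
# Toy model of the example: dimension name -> length,
# variable name -> (type, dimension names). Hand-written, not read from a file.
DIMENSIONS = {"nvec": 3, "time": 100, "sample": 345, "ship": 14, "ship_strlen": 80}
VARIABLES = {
    "data": ("float", ("ship", "sample", "time", "nvec")),
    "time": ("int", ("time",)),
    "sample": ("int", ("time", "sample")),
    "ship": ("char", ("ship", "ship_strlen")),
}


def has_1d_coordinate(dim, variables):
    """True if there is a 1-D variable named after `dim` that uses only `dim`."""
    return dim in variables and variables[dim][1] == (dim,)


def is_coordinate_variable(name, variables):
    """Apply the rules from the list above to the variable called `name`."""
    if name not in variables:
        return False
    dtype, dims = variables[name]
    # Ordinary case: 1-D variable with the same name as its dimension.
    if has_1d_coordinate(name, variables):
        return True
    # Exception: 2-D char variable whose first dimension shares its name and
    # whose inner (string length) dimension has no coordinate variable.
    # Simplification: only a 1-D coordinate on the inner dimension is checked.
    if dtype == "char" and len(dims) == 2 and dims[0] == name:
        return not has_1d_coordinate(dims[1], variables)
    return False


def classify(dimensions, variables):
    """Map each dimension name to a short description of its role."""
    roles = {}
    for dim in dimensions:
        if is_coordinate_variable(dim, variables):
            roles[dim] = "dimension with coordinate variable"
        elif dim in variables:
            roles[dim] = "dimension sharing its name with a data variable"
        else:
            roles[dim] = "dimension only"
    return roles


ROLES = classify(DIMENSIONS, VARIABLES)
for dim, role in ROLES.items():
    print(f"{dim}: {role}")
```

Under these assumptions, time and ship come out as dimensions with coordinate variables, sample as a dimension that shares its name with a data variable, and nvec and ship_strlen as dimensions only.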

We've already looked in detail at how dimension scales are represented by examining h5dump output. Now we look at the HDF5 objects resulting from the above example, but using a more compact notation:

float nvec(3);
  :REFERENCE_LIST = null
  :CLASS = "DIMENSION_SCALE"
  :NAME = "This is a netCDF dimension but not a netCDF variable.         3"

float data(14,345,100,3);
  :DIMENSION_LIST = "ship", "sample", "time", "nvec"

int time(100);
  :CLASS = "DIMENSION_SCALE"
  :NAME = "time"
  :REFERENCE_LIST = null, null

float sample(345);
  :REFERENCE_LIST = null, null
  :CLASS = "DIMENSION_SCALE"
  :NAME = "This is a netCDF dimension but not a netCDF variable.       345"

int _nc4_non_coord_sample(100,345);
  :DIMENSION_LIST = "time", "sample"

char ship(14,80);
  :REFERENCE_LIST = null
  :CLASS = "DIMENSION_SCALE"
  :NAME = "ship"
  :_Netcdf4Coordinates = 3, 4

float ship_strlen(80);
  :CLASS = "DIMENSION_SCALE"
  :NAME = "This is a netCDF dimension but not a netCDF variable.        80"

(Note that the above is not CDL, but just a shorthand notation for the objects in the HDF5 files. Note also that the REFERENCE_LIST attributes above are not actually null; that's a limitation of my dump output. Since we don't need the contents, I haven't bothered to show them.)

Looking at each one of these in turn:

  1. nvec is a dimension scale, because CLASS = "DIMENSION_SCALE". It defines the nvec dimension, but because there is no associated coordinate variable, the netCDF-4 library sets the attribute NAME to start with "This is a netCDF dimension but not a netCDF variable." This tells the library not to expose the dataset nvec as a netCDF variable.
  2. data is a data variable. The DIMENSION_LIST attribute unambiguously lists the dimensions that it uses.
  3. time is a dimension scale, and since it is also a coordinate variable, the library does expose the dataset time as a netCDF variable.
  4. sample is a dimension scale, but not a coordinate variable, so the NAME attribute starts with "This is a netCDF dimension but not a netCDF variable."
  5. _nc4_non_coord_sample is the data variable sample, whose name conflicts with a dimension name. The netCDF-4 library modifies the HDF5 dataset name by prepending the string _nc4_non_coord_, and removes this string when constructing the netCDF variable sample.
  6. ship is a dimension and a 2D char coordinate variable. Since a 2D coordinate can only be a char coordinate, where the second dimension represents the string length, we know that the dimension that it represents must be the first one, with length 14. I'm not yet clear how the _Netcdf4Coordinates attribute is used, or if it's needed.
  7. ship_strlen is a dimension scale but not a coordinate variable.
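
To make the walk-through above concrete, here is a minimal Python sketch of how a reader might turn those HDF5 objects back into netCDF dimensions and variables. It is not the netCDF-4 library's code. HDF5_DATASETS is a hand-typed stand-in for the dump (REFERENCE_LIST and _Netcdf4Coordinates are left out, and DIMENSION_LIST is shown as plain names), and to_netcdf_view is a hypothetical helper that only applies the three conventions described above: the CLASS attribute, the special NAME text, and the _nc4_non_coord_ prefix.

```python
# Fixed text the dump shows at the start of NAME for dimension-only scales.
DIM_WITHOUT_VARIABLE = "This is a netCDF dimension but not a netCDF variable."
# Prefix the dump shows on a data variable whose name collides with a dimension.
NON_COORD_PREFIX = "_nc4_non_coord_"

# Hand-written stand-in for the dump: dataset name -> (shape, attributes).
HDF5_DATASETS = {
    "nvec": ((3,), {"CLASS": "DIMENSION_SCALE",
                    "NAME": DIM_WITHOUT_VARIABLE + "         3"}),
    "data": ((14, 345, 100, 3),
             {"DIMENSION_LIST": ["ship", "sample", "time", "nvec"]}),
    "time": ((100,), {"CLASS": "DIMENSION_SCALE", "NAME": "time"}),
    "sample": ((345,), {"CLASS": "DIMENSION_SCALE",
                        "NAME": DIM_WITHOUT_VARIABLE + "       345"}),
    "_nc4_non_coord_sample": ((100, 345),
                              {"DIMENSION_LIST": ["time", "sample"]}),
    "ship": ((14, 80), {"CLASS": "DIMENSION_SCALE", "NAME": "ship"}),
    "ship_strlen": ((80,), {"CLASS": "DIMENSION_SCALE",
                            "NAME": DIM_WITHOUT_VARIABLE + "        80"}),
}


def to_netcdf_view(datasets):
    """Return (dimensions, variables) as a netCDF reader might present them."""
    dimensions = {}  # dimension name -> length
    variables = {}   # variable name -> shape
    for name, (shape, attrs) in datasets.items():
        is_scale = attrs.get("CLASS") == "DIMENSION_SCALE"
        if is_scale:
            # A scale defines a dimension. For the 2-D char case (ship),
            # the first axis is the one the dimension represents.
            dimensions[name] = shape[0]
            if attrs.get("NAME", "").startswith(DIM_WITHOUT_VARIABLE):
                continue  # dimension only: do not expose as a variable
        # Strip the collision prefix to recover the netCDF variable name.
        if name.startswith(NON_COORD_PREFIX):
            nc_name = name[len(NON_COORD_PREFIX):]
        else:
            nc_name = name
        variables[nc_name] = shape
    return dimensions, variables


DIMS, VARS = to_netcdf_view(HDF5_DATASETS)
print("dimensions:", DIMS)
print("variables:", VARS)
```

With the stand-in data this recovers the five dimensions and the four variables of the original example, with sample appearing once as a dimension of length 345 and once as a 100 x 345 variable.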

So there you have the sanctum sanctorum of netCDF-4 dimension scales, as far as I have explored. If there are no further questions, you may resume your grooming, hurling fruit, and shrieking from the treetops.

Unidata Developer's Blog
A weblog about software development by Unidata developers*