03 August 2012

In this post I will show how the HDF5 library implements dimension scales, and in the next post I will show how the netCDF-4 file format implements shared dimensions. We will look at the low-level objects stored in the file. The intention is to document these details for software that wants to read or write this information outside of the netCDF-4 C library; none of this is needed to use the HDF5 or netCDF-4 APIs.

Let's first look at the HDF5 API for dimension scales. To create a dimension scale, use H5DSset_scale:

herr_t H5DSset_scale(hid_t dsid, char *dimname)

The dataset dsid is converted to a Dimension Scale dataset. This creates the CLASS attribute, set to the value "DIMENSION_SCALE", and an empty REFERENCE_LIST attribute, as described in section 4.2 of the "HDF5 Dimension Scale Specification". If dimname is specified, an attribute called NAME is also created, with the value dimname.

hid_t dsid;	IN: the dataset to be made a Dimension Scale
char *dimname;	IN: the dimension name (optional), NULL if the dimension has no name.
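This post is aimed at software that reads or writes these attributes without the HDF5 library, so it can help to see the documented effect of H5DSset_scale as plain data. The following is a toy in-memory model in C, not HDF5 code: the model_* types and functions are hypothetical, and no HDF5 function is called. It assumes the string attributes are fixed-length and NUL-terminated, so each size is the string length plus one. That matches the STRSIZE values in the h5dump output later in this post (16 for "DIMENSION_SCALE", 5 for "time").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory model of what H5DSset_scale leaves on a dataset.
 * Hypothetical names; this neither calls nor reproduces the HDF5 library. */
#define MODEL_MAX_STR 64

typedef struct {
    int    present;              /* nonzero if the attribute exists */
    size_t strsize;              /* fixed-length size incl. the NUL (h5dump's STRSIZE) */
    char   value[MODEL_MAX_STR]; /* NUL-terminated string value */
} model_str_attr;

typedef struct {
    model_str_attr class_attr;   /* CLASS = "DIMENSION_SCALE" */
    model_str_attr name_attr;    /* NAME = dimname, only when dimname != NULL */
    size_t         n_refs;       /* number of REFERENCE_LIST entries */
} model_dataset;

/* Store s as a fixed-length, NUL-terminated string attribute. */
static int model_set_str_attr(model_str_attr *a, const char *s)
{
    size_t len = strlen(s);
    if (len + 1 > MODEL_MAX_STR)
        return -1;                    /* too long for this toy model */
    memcpy(a->value, s, len + 1);     /* copy including the terminating NUL */
    a->strsize = len + 1;             /* "DIMENSION_SCALE" -> 16, "time" -> 5 */
    a->present = 1;
    return 0;
}

/* Mirror the documented effect of H5DSset_scale(dsid, dimname) on the model. */
static int model_set_scale(model_dataset *ds, const char *dimname)
{
    assert(ds != NULL);
    if (model_set_str_attr(&ds->class_attr, "DIMENSION_SCALE") < 0)
        return -1;
    ds->n_refs = 0;                   /* REFERENCE_LIST starts out empty */
    ds->name_attr.present = 0;        /* NAME is optional */
    if (dimname != NULL && model_set_str_attr(&ds->name_attr, dimname) < 0)
        return -1;
    return 0;
}
```

Calling model_set_scale(&ds, "time") on a zero-initialized model_dataset leaves CLASS with size 16, NAME with size 5 and an empty reference list; passing NULL skips NAME, mirroring the optional dimname argument.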

The core of the functionality is in the H5DSattach_scale function:

herr_t H5DSattach_scale(hid_t did, hid_t dsid, unsigned int idx)

Defines Dimension Scale dsid to be associated with dimension idx of dataset did. Entries are created in the DIMENSION_LIST and REFERENCE_LIST attributes, as defined in section 4.2 of the specification.

hid_t did;	IN: the dataset
hid_t dsid;	IN: the scale to be attached
unsigned int idx;	IN: the zero-based index of the dimension of did that dsid is associated with.
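The two-way bookkeeping that H5DSattach_scale performs can be sketched the same way. In this hypothetical toy model, datasets are identified by path strings rather than HDF5 object references, capacities are fixed, and the toy_* names are invented; the real function performs additional checks that this sketch omits. The point is that one call updates both sides: the dataset's DIMENSION_LIST gains a link to the scale, and the scale's REFERENCE_LIST gains a (dataset, dimension) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two attributes H5DSattach_scale maintains. Path strings
 * stand in for HDF5 object references; all names are hypothetical. */
#define TOY_MAX_RANK   4
#define TOY_MAX_SCALES 4  /* scales per dimension (DIMENSION_LIST entries are variable length) */
#define TOY_MAX_REFS   8  /* capacity of one scale's REFERENCE_LIST */

typedef struct {
    const char *dataset;    /* like the compound member "dataset" */
    int         dimension;  /* like the compound member "dimension" */
} toy_ref_entry;

typedef struct {
    const char   *path;
    int           rank;
    /* DIMENSION_LIST: for each dimension, a list of attached scales */
    const char   *dim_list[TOY_MAX_RANK][TOY_MAX_SCALES];
    int           n_dim_list[TOY_MAX_RANK];
    /* REFERENCE_LIST: used only when this dataset is a scale */
    toy_ref_entry ref_list[TOY_MAX_REFS];
    int           n_ref_list;
} toy_dataset;

/* Attach `scale` to dimension idx of `ds`, updating both sides of the link. */
static int toy_attach_scale(toy_dataset *ds, toy_dataset *scale, unsigned int idx)
{
    assert(ds != NULL && scale != NULL);
    if (ds->rank < 0 || idx >= (unsigned int)ds->rank || idx >= TOY_MAX_RANK)
        return -1;                                   /* no such dimension */
    if (ds->n_dim_list[idx] >= TOY_MAX_SCALES || scale->n_ref_list >= TOY_MAX_REFS)
        return -1;                                   /* toy capacity exceeded */

    /* Forward link: dimension idx of ds -> scale (DIMENSION_LIST) */
    ds->dim_list[idx][ds->n_dim_list[idx]++] = scale->path;

    /* Back link: scale -> (ds, idx) (REFERENCE_LIST) */
    scale->ref_list[scale->n_ref_list].dataset = ds->path;
    scale->ref_list[scale->n_ref_list].dimension = (int)idx;
    scale->n_ref_list++;
    return 0;
}
```

In this model, attaching a scale to dimension 2 of a rank-3 "/data" and then to dimension 0 of "/_nc4_non_coord_sample" leaves two entries in the scale's reference list, in that order, which is the situation shown in the h5dump output that follows.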

If you look at a dimension scale with h5dump, you see something like this:

DATASET "time" {
 1# DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
 2# ATTRIBUTE "CLASS" {
      DATATYPE  H5T_STRING {
         STRSIZE 16;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "DIMENSION_SCALE"
      }
   }
 3# ATTRIBUTE "NAME" {
      DATATYPE  H5T_STRING {
         STRSIZE 5;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "time"
      }
   }
 4# ATTRIBUTE "REFERENCE_LIST" {
      DATATYPE  H5T_COMPOUND {
         H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
         H5T_STD_I32LE "dimension";
      }
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): {
            DATASET 546 /data ,
            2
         },
      (1): {
            DATASET 1405 /_nc4_non_coord_sample ,
            0
         }
      }
   }
}

where the N# above are annotations that I've added:

1# An integer dataset (aka variable) named "time" with 100 elements.
2# An attribute named CLASS with value "DIMENSION_SCALE".
3# An attribute named NAME with value "time".
4# An attribute named REFERENCE_LIST whose value is an array of two elements of a compound type (an object reference plus a dimension index).

These three attributes are added to the dataset by the H5DSset_scale function, turning it into a dimension scale. The entries in the REFERENCE_LIST attribute are added by two calls to the H5DSattach_scale function. The first attaches the dimension scale to dimension 2 (the third dimension, counting from 0) of the dataset "/data", and the second attaches it to dimension 0 (the first dimension) of the dataset "/_nc4_non_coord_sample".
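A reader going the other way, from a scale to the datasets that use it, only needs to scan REFERENCE_LIST. The sketch below copies the two entries from the dump above into a C array, with path strings standing in for the object references (a simplification: in the file these are references, which h5dump prints along with addresses such as 546 and 1405). The type and function names are hypothetical. Nothing in the sketch assumes a dataset appears only once in the list, so the lookup collects every matching dimension index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One element of the REFERENCE_LIST compound type as h5dump prints it.
 * A path string stands in for the object reference (hypothetical simplification). */
typedef struct {
    const char *dataset;
    int         dimension;
} reference_entry;

/* Values copied from the h5dump output of the "time" scale above. */
static const reference_entry time_reference_list[] = {
    { "/data", 2 },
    { "/_nc4_non_coord_sample", 0 },
};

/* Collect the dimension indices of dataset `path` that this scale is attached
 * to, storing at most max_dims of them in `dims`. Returns the total number of
 * matches, which may exceed max_dims. */
static size_t dims_using_scale(const reference_entry *list, size_t n,
                               const char *path, int *dims, size_t max_dims)
{
    assert(path != NULL && (list != NULL || n == 0));
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].dataset, path) == 0) {
            if (found < max_dims)
                dims[found] = list[i].dimension;
            found++;
        }
    }
    return found;
}
```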

If we h5dump one of the datasets in REFERENCE_LIST, we see for example:

DATASET "_nc4_non_coord_sample" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 100, 345 ) / ( 100, 345 ) }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (DATASET 830 /time ), (DATASET 1114 /sample )
      }
   }

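This tells us that the _nc4_non_coord_sample dataset holds 32-bit integers and has shape (100, 345). It has an attribute called DIMENSION_LIST with one entry per dimension, where each entry is a variable-length list of references to other datasets. Here dimension 0 refers to the "time" dataset, which agrees with the ("/_nc4_non_coord_sample", 0) entry in the REFERENCE_LIST of "time" shown above, and dimension 1 refers to the "sample" dataset. These are none other than our dimension scales. The DIMENSION_LIST attribute, like the REFERENCE_LIST attribute, is maintained by the HDF5 dimension scale API.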
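Because DIMENSION_LIST and REFERENCE_LIST point at each other, a reader that does not trust the file can cross-check them. The sketch below is a hypothetical C consistency check, again using path strings in place of object references: every scale listed under dimension d of a dataset should have a (dataset, d) entry in its own REFERENCE_LIST. The struct layout and fixed capacities are illustrative assumptions, not HDF5 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader-side structures; path strings stand in for HDF5
 * object references, and the capacities are arbitrary. */
#define CHK_MAX_RANK    8
#define CHK_MAX_PER_DIM 4

typedef struct {
    const char *dataset;
    int         dimension;
} chk_backref;                       /* one REFERENCE_LIST element */

typedef struct {
    const char        *path;
    const chk_backref *refs;         /* the scale's REFERENCE_LIST */
    size_t             n_refs;
} chk_scale;

typedef struct {
    const char *path;
    size_t      rank;
    const char *dim_list[CHK_MAX_RANK][CHK_MAX_PER_DIM]; /* DIMENSION_LIST */
    size_t      n_per_dim[CHK_MAX_RANK];
} chk_dataset;

static const chk_scale *chk_find_scale(const chk_scale *scales, size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(scales[i].path, path) == 0)
            return &scales[i];
    return NULL;
}

/* Return 1 if every scale listed under dimension d of ds has a matching
 * (ds->path, d) entry in its REFERENCE_LIST, else 0. */
static int chk_links_consistent(const chk_dataset *ds, const chk_scale *scales, size_t n_scales)
{
    assert(ds != NULL && ds->rank <= CHK_MAX_RANK);
    for (size_t d = 0; d < ds->rank; d++) {
        assert(ds->n_per_dim[d] <= CHK_MAX_PER_DIM);
        for (size_t k = 0; k < ds->n_per_dim[d]; k++) {
            const chk_scale *s = chk_find_scale(scales, n_scales, ds->dim_list[d][k]);
            if (s == NULL)
                return 0;            /* forward reference to an unknown scale */
            int matched = 0;
            for (size_t r = 0; r < s->n_refs && !matched; r++)
                matched = strcmp(s->refs[r].dataset, ds->path) == 0 &&
                          s->refs[r].dimension == (int)d;
            if (!matched)
                return 0;            /* back link missing */
        }
    }
    return 1;
}
```

For _nc4_non_coord_sample, dimension 0 lists "time", and "time" has the matching entry, so that link checks out. The REFERENCE_LIST of "sample" is not shown in this post, so checking dimension 1 requires assuming its contents.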

In summary, the HDF5 dimension scale API lets you associate a specially marked dataset, called a dimension scale, with any dimension of any other dataset. Because the API places no restrictions on this association, we can't rely on this raw interface alone to define shared dimensions. Next time we will see how the netCDF-4 format builds on this interface to implement shared dimensions.
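To make "no restrictions" concrete: as far as the association itself is concerned, nothing described above requires a scale's length to match the dimension it is attached to, so a layer that wants shared dimensions has to add its own rules. The C sketch below checks one such hypothetical rule, that every dimension a scale is attached to has the same length as the scale. It is an illustration only, not necessarily the rule netCDF-4 uses; the names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rule a layer above HDF5 might impose before treating a
 * dimension scale as a shared dimension: every dimension the scale is
 * attached to must have the same length as the scale. Names are illustrative. */
typedef struct {
    const size_t *shape;    /* current dimension lengths of the attached dataset */
    size_t        rank;
    size_t        dim_idx;  /* which dimension the scale is attached to */
} shared_attachment;

/* Return 1 if all n attachments agree with scale_len, else 0. */
static int shared_lengths_agree(size_t scale_len, const shared_attachment *atts, size_t n)
{
    assert(atts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (atts[i].dim_idx >= atts[i].rank)
            return 0;                                   /* invalid dimension index */
        if (atts[i].shape[atts[i].dim_idx] != scale_len)
            return 0;                                   /* length mismatch */
    }
    return 1;
}
```

With the shape (100, 345) of _nc4_non_coord_sample, attaching the 100-element "time" scale to dimension 0 passes this check, while attaching it to dimension 1 would fail it.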

Next: NetCDF-4 Dimensions and HDF5 Dimension Scales

Posted by John Caron