Re: Preliminary HDF5 Dimension documents

Hi John,

> Hi Quincey, some thoughts on your proposal:
> 
> 1. A few notes on naming differences between the netCDF and HDF5 data model:
>    
>     A netCDF *Variable* is a multidimensional array of primitive 
> values, roughly corresponding to an HDF5 *Dataset*.
    Yup.

>     A netCDF *Dimension* is a named array index. They are globally 
> scoped, so can be shared. A Variable specifies its dimensionality by 
> referencing a set of Dimensions; this set corresponds to an HDF5 
> *Dataspace*. There is no exact equivalent to a Dimension, as I 
> understand it. The fact that Variables can share Dimensions adds an 
> important meaning to netCDF files.
    This document introduces dimensions as an optional method of composing
a dataspace in HDF5, so they ought to be completely analogous to netCDF
dimensions.
    One possible difference is that I wasn't planning on naming the dimensions
within a dataspace.  They were just going to be indexed by their rank within
the dataspace (i.e. the 0th dimension, the 1st dimension, etc).  Such a dimension
could reference a named dimension through an indirect dimension (see the
shareability document), but the actual dimensions in the dataspace weren't
planned to have names associated with them.
    Do you think this is an important requirement?  Does the netCDF API
require that the dimensions in a dataspace for a dataset have names, or
will having shared dimensions using the names of dimension objects in the
grouping hierarchy be sufficient?
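
    To make the question concrete, here is a toy Python model of the two
options (a sketch only: `Dimension`, `Dataspace` and their fields are made-up
names for illustration, not HDF5 or netCDF API).  Dimensions inside a dataspace
are looked up by rank; a name exists only on a shared dimension object.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dimension:
    """A 1-D extent. `name` is set only for shared dimension objects (hypothetical)."""
    size: int
    name: Optional[str] = None


@dataclass(frozen=True)
class Dataspace:
    """Holds its dimensions in rank order; lookup is by rank, not by name."""
    dims: Tuple[Dimension, ...]

    def dim(self, rank: int) -> Dimension:
        return self.dims[rank]

    def shape(self) -> Tuple[int, ...]:
        return tuple(d.size for d in self.dims)


# A shared, named dimension object, as it might live in the grouping hierarchy.
time_dim = Dimension(size=12, name="time")

# Both dataspaces reference the same dimension object in slot 0;
# slot 1 is an anonymous dimension known only by its rank.
temperature_space = Dataspace(dims=(time_dim, Dimension(size=180)))
pressure_space = Dataspace(dims=(time_dim, Dimension(size=360)))
```

    In this model a name is reachable through slot 0 of either dataspace, while
slot 1 has no name at all, which is the case I'm asking about.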

>     A netCDF *Coordinate Variable* is a 1D Variable whose name matches 
> its dimension's name, and whose values are monotonic. This corresponds 
> to your proposed *Dimension Scale*. Note that a netCDF Dimension 
> describes array indices, whereas a Coordinate Variable / Dimension Scale 
> describes coordinate values assigned to each index of the corresponding 
> Dimension.
    Yes, I designed the new HDF5 Dimension Scale model to be compatible
with netCDF Coordinate Variables (ideally, Dimension Scales will be a superset 
of Coordinate Variables).  I'm still not totally pleased with the term "scale"
and somewhat lean toward using netCDF's "coordinates" term since that more
accurately describes their true meaning, but since HDF4 used "scale", I may end
up sticking with the term... :-/
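
    As a reference point, the Coordinate Variable definition quoted above can
be written down as a small check.  This is only a sketch in Python: the
function names are hypothetical, and it assumes "monotonic" means strictly
increasing or strictly decreasing.

```python
from typing import Sequence


def is_strictly_monotonic(values: Sequence[float]) -> bool:
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_coordinate_variable(
    var_name: str, dim_names: Sequence[str], values: Sequence[float]
) -> bool:
    """A 1-D variable named after its only dimension, with monotonic values."""
    return (
        len(dim_names) == 1
        and dim_names[0] == var_name
        and is_strictly_monotonic(values)
    )
```

    A Dimension Scale that is a superset of this would presumably relax one or
more of the three conditions.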

> 2. So, generally I like your Dimension Scale proposal. The main things 
> we need are 1) shared Dimensions even when there's not a coordinate 
> variable (perhaps a Dimension Scale without the values?),
    Actually, the HDF5 Dimensions will be shareable by different
dataspaces without involving any Dimension Scales.

> 2) each Dimension Scale must have a name;
    Yes, that's the primary method of indexing them from a dimension.  I
imagine we may have an API function to get the n'th scale, but that's not
a requirement at this point.

> and 3) a Variable/Dataset can specify 
> its dimensionality/Dataspace by listing the Dimensions (or their names).
    I'm planning on adding API functions for "composing" a dataspace from
dimensions; that "composed" dataspace could then be used to create datasets.
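
    Roughly, the flow would look like the following Python sketch.  Everything
here is an assumption for illustration: `shared_dims`, `compose_dataspace` and
`create_dataset` are invented names, not functions that exist in HDF5.

```python
from typing import Dict, List, Tuple

# Hypothetical registry of shared dimensions: name -> length.
shared_dims: Dict[str, int] = {"time": 12, "lat": 180, "lon": 360}

ComposedSpace = Tuple[Tuple[str, int], ...]


def compose_dataspace(dim_names: List[str]) -> ComposedSpace:
    """Build a 'composed' dataspace as ordered (name, length) pairs."""
    missing = [n for n in dim_names if n not in shared_dims]
    if missing:
        raise KeyError(f"unknown dimension(s): {missing}")
    return tuple((n, shared_dims[n]) for n in dim_names)


def create_dataset(name: str, space: ComposedSpace) -> dict:
    """Stand-in for dataset creation: records the name, dimensions and shape."""
    return {
        "name": name,
        "dims": tuple(n for n, _ in space),
        "shape": tuple(length for _, length in space),
    }


temperature = create_dataset(
    "temperature", compose_dataspace(["time", "lat", "lon"])
)
surface = create_dataset("surface_height", compose_dataspace(["lat", "lon"]))
```

    Both datasets end up referring to the same "lat" and "lon" entries, which
is the sharing John asks for in his point 1).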

> 3. While 1D Coordinate Variables / Dimension Scales are the common case, 
> there are also datasets that need different kinds of coordinate systems, 
> including multidimensional coordinate variables. I am eager that netCDF 
> / HDF5 can support these, but I think they can be built on top of the 
> current functionality, and so we can leave them out of this discussion 
> so as to keep things from getting too complicated. (For more details on 
> those ideas, see chapter 3.1 of the java-netcdf user manual.)
    As I mentioned to Russ and Ed last week, I think that having support for
coordinate systems (I was calling them "multi-dimensional scales" at the time)
is an important feature to include.  I've printed the java-netcdf user
manual and will be using it for reference during further iterations on the
HDF5 dimension scale design to try to include this concept.  I imagine that I'll
associate them with the dataspace directly instead of hanging them off the
dimensions (since the dataspace can be multi-dimensional and the dimensions are
1-D by definition).
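
    As a rough illustration of why the dataspace seems like the right place to
attach them, here is a toy Python model (all names hypothetical, nothing here
is an existing HDF5 call) in which 2-D coordinate arrays must match the shape
of the dataspace they are attached to:

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def shape_of(grid: Grid) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular nested list."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    if any(len(row) != cols for row in grid):
        raise ValueError("ragged coordinate array")
    return (rows, cols)


class Dataspace2D:
    """A 2-D dataspace that carries its own multi-dimensional coordinates."""

    def __init__(self, shape: Tuple[int, int]) -> None:
        self.shape = shape
        self.coordinates: Dict[str, Grid] = {}

    def attach_coordinate(self, name: str, grid: Grid) -> None:
        # The coordinate array spans the whole dataspace, not a single dimension.
        if shape_of(grid) != self.shape:
            raise ValueError(f"coordinate {name!r} does not match shape {self.shape}")
        self.coordinates[name] = grid


space = Dataspace2D((2, 3))
space.attach_coordinate("lat", [[10.0, 10.1, 10.2], [11.0, 11.1, 11.2]])
space.attach_coordinate("lon", [[-105.0, -104.0, -103.0], [-105.2, -104.2, -103.2]])
```

    Neither "lat" nor "lon" here could be hung off a single 1-D dimension,
since each value depends on both indices.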

    Also, I was considering cutting the ability of dimensions to have multiple
scales associated with them (to simplify things), but glancing through the
java-netcdf information, it looks like that may be an important feature.
What's your opinion about how critical that is and how often it is used?
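
    For clarity, this is the feature in question, as a toy Python model (the
class and method names are made up for illustration and are not a proposed
API): one dimension carrying several named scales, reachable by name or by
position.

```python
from typing import Dict, List, Sequence


class Dimension:
    """A 1-D dimension that may carry several named scales (hypothetical model)."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size
        self._scales: Dict[str, List[float]] = {}

    def add_scale(self, scale_name: str, values: Sequence[float]) -> None:
        # Each scale assigns one value to every index of the dimension.
        if len(values) != self.size:
            raise ValueError("scale length must equal the dimension size")
        self._scales[scale_name] = list(values)

    def scale(self, scale_name: str) -> List[float]:
        """Primary lookup: by name."""
        return self._scales[scale_name]

    def nth_scale(self, n: int) -> List[float]:
        """Secondary lookup: the n'th scale, in the order scales were added."""
        return list(self._scales.values())[n]


time_dim = Dimension("time", 3)
time_dim.add_scale("days", [0.0, 1.0, 2.0])
time_dim.add_scale("model_step", [0.0, 24.0, 48.0])
```

    Cutting the feature would amount to allowing at most one entry in
`_scales`.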

    Quincey