Re: Preliminary HDF5 Dimension documents

Hi John,

> >>Hi Quincey, some thoughts on your proposal:
> >>
> >>1. A few notes on naming differences between the netCDF and HDF5 data model:
> >>   
> >>    A netCDF *Dimension* is a named array index. Dimensions are globally 
> >>scoped, so they can be shared. A Variable specifies its dimensionality by 
> >>referencing a set of Dimensions; this set corresponds to an HDF5 
> >>*Dataspace*. There is no exact equivalent of a Dimension in HDF5, as I 
> >>understand it. The fact that Variables can share Dimensions adds an 
> >>important meaning to netCDF files.
> >>    
> >    One possible difference is that I wasn't planning on naming the
> >dimensions within a dataspace.  They were just going to be indexed by their
> >rank within the dataspace (i.e. the 0th dimension, the 1st dimension, etc).
> >This could reference a named dimension through an indirect dimension (see
> >the shareability document), but the actual dimensions in the dataspace
> >weren't planned on having names associated with them.
> >
> only shared dimensions need be named.
    Hmm, what about the netCDF API function that needs a name to get a dimension
ID?

> >    Do you think this is an important requirement?  Does the netCDF API
> >require that the dimensions in a dataspace for a dataset have names, or
> >will having shared dimensions using the names of dimension objects in the
> >grouping hierarchy be sufficient?
> >
> netcdf only has shared dimensions, so they are always named.
    Hmm, I think this will mean that each dataspace must be able to associate
a name with a particular dimension.
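
    To make the name-lookup question concrete, here is a minimal sketch in
Python.  It is purely illustrative: `Dimension`, `Dataspace`, `dim_by_rank` and
`rank_of` are hypothetical names, not part of the HDF5 or netCDF APIs.  It
assumes dimensions are still indexed by rank, with a name optionally attached
so that a lookup by name can return the dimension's index.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dimension:
    """One axis of a dataspace; the name is optional (hypothetical model)."""
    size: int
    name: Optional[str] = None


@dataclass
class Dataspace:
    """Ordered list of dimensions, addressed by rank (0th, 1st, ...)."""
    dims: List[Dimension] = field(default_factory=list)

    def dim_by_rank(self, rank: int) -> Dimension:
        """Rank-based access, as in the original unnamed-dimension plan."""
        return self.dims[rank]

    def rank_of(self, name: str) -> int:
        """Name-based lookup: return the rank of the dimension called `name`."""
        for rank, dim in enumerate(self.dims):
            if dim.name == name:
                return rank
        raise KeyError(name)


# Two named dimensions and one that is only reachable by rank.
space = Dataspace([Dimension(10, "time"), Dimension(180, "lat"), Dimension(360)])
```

    With something like this, unnamed dimensions keep working by rank, while
a name-based API can resolve a name to an index whenever one is attached.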

> >>3. While 1D Coordinate Variables / Dimension Scales are the common case, 
> >>there are also datasets that need different kinds of coordinate systems, 
> >>including multidimensional coordinate variables. I am eager that netCDF 
> >>/ HDF5 can support these, but I think they can be built on top of the 
> >>current functionality, and so we can leave them out of this discussion 
> >>so as to keep things from getting too complicated. (for more details on 
> >>those ideas, see chapter 3.1 of the java-netcdf user manual).
> >>    
> >>
> >    As I mentioned to Russ and Ed last week, I think that having support for
> >coordinate systems (I was calling them "multi-dimensional scales" at the
> >time) is an important feature to include.  I've printed the java-netcdf user
> >manual and will be using it for reference during further iterations on the
> >HDF5 dimension scale design to try to include this concept.  I imagine that
> >I'll associate them with the dataspace directly instead of hanging them off
> >the dimensions (since the dataspace can be multi-dimensional and the
> >dimensions are 1-D by definition).
> >
> >    Also, I was considering cutting the ability of dimensions to have
> >multiple scales associated with them (to simplify things), but glancing
> >through the java-netcdf information, it looks like that may be an important
> >feature.  What's your opinion about how critical that is and how often it is
> >used?
> >
> >    Quincey
> >  
> i think there are 2 interesting examples if you try to handle coordinate 
> systems in a general way:
> 
> 1. float lat(x,y) and float lon(x,y) assign latitude and longitude 
> coordinates to points on a projection plane. this is the 
> "multidimensional case"
> 
> 2.  lat(sample), lon(sample), altitude(sample) might be a coordinate 
> system for variable O3(sample). this is the "1D trajectory" case.
> 
> So, what i came up with is that a coordinate system for a 
> variable/dataset is a collection of "coordinate axes" which can have any 
> dimensionality, but whose dimensions must all appear in the set of 
> dimensions used by the variable/dataset. Adding this info to the 
> dataspace is exactly right.
    Good. :-)
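
    As a sanity check on that rule, here is a small Python sketch.  It is an
illustration only: `Variable` and `is_valid_coordinate_system` are hypothetical
names, and the `temperature(time, x, y)` variable is made up for the example.
It checks that every dimension used by a coordinate axis also appears among
the dimensions of the variable the coordinate system is attached to.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Variable:
    """A variable/dataset identified by name and its ordered dimension names."""
    name: str
    dims: Tuple[str, ...]


def is_valid_coordinate_system(var: Variable, axes: Iterable[Variable]) -> bool:
    """True if each axis only uses dimensions that `var` itself uses."""
    var_dims = set(var.dims)
    return all(set(axis.dims) <= var_dims for axis in axes)


# Case 1: multidimensional lat/lon on a projection plane.
lat2d = Variable("lat", ("x", "y"))
lon2d = Variable("lon", ("x", "y"))
temperature = Variable("temperature", ("time", "x", "y"))  # made-up variable

# Case 2: 1D trajectory.
lat1d = Variable("lat", ("sample",))
lon1d = Variable("lon", ("sample",))
altitude = Variable("altitude", ("sample",))
o3 = Variable("O3", ("sample",))

ok_grid = is_valid_coordinate_system(temperature, [lat2d, lon2d])
ok_trajectory = is_valid_coordinate_system(o3, [lat1d, lon1d, altitude])
bad_mix = is_valid_coordinate_system(o3, [lat2d, lon2d])  # x, y not in O3
```

    Both of your cases pass the check, and mixing the 2D axes with the
trajectory variable fails, since `x` and `y` are not dimensions of `O3`.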

> Because the common case is that all or most of the variables/datasets in 
> a file use the same coordinate system, it's nice to factor this 
> information out. So if the dataspace can be shared and the coordinate 
> system can be associated with the dataspace, that would be party time 
> most excellent.
    Dataspaces, dimensions and scales can all be shared, so we should be
rockin' :-)

> BTW, a mathematical formulation behind this (a little out of date but 
> useful if you like formalisms) is at
>     http://www.unidata.ucar.edu/staff/caron/papers/CoordMath.htm
    Already have it - it's been very useful as a more formal perspective on the
issues.. ;-)

> there's still one piece that you *might* want to tackle. the above is a 
> framework for general coordinate systems. our users generally want 
> georeferencing coordinate systems. this involves identifying which of 
> the coordinate axes correspond to the x, y, z, and t coordinates. this can 
> be a big can of worms, e.g. if you've ever looked at GIS specs, they are 
> complex. We have developed a set of very simple specs that so far have 
> satisfied most of our datasets, using "attribute conventions" outside 
> any explicit library support. I can understand if you don't want to add 
> any more complications. However I will say that IMHO getting 
> georeferencing coordinate systems clearly specified (i.e. not having to use 
> attribute conventions) would be a huge win for our communities, and one 
> that's really doable.
    Can you explain the primary issues & difficulties you've encountered?  How
have you implemented them in netCDF?  Are there any documents on the web?  I'll
double-check with Mike Folk and Bob McGrath, since this may already be at the
edge of our mission, but if it's extremely valuable (and it sounds like it is),
we should definitely consider it.

    I mentioned earlier about having multiple scales for a dimension - do you
think this is useful?  Why would a user want to have "meters" and "months"
both apply to the "tick marks" for a dimension?

    Quincey