Re: CDF, netCDF and HDF Note from you attached below


> == Glenn Davies
> > == Lloyd Treinish

> > On the other hand, deleting an instance (i.e., a record in the
> > conceptual equivalent in the CDF parlance) of a variable would also
> > change the shape. Is this supported in netCDF without copying?  If so,
> > is it done by just tagging the offending element?  Is any space
> > compression/garbage collection done after repeated such operations
> > because of the potential of wasted space?
> 
> Of course, space is not allocated for data which varies according to the
> unlimited dimension. Suppose that the maximum record that has been written
> is M. (Initially M = 0.) File system space is allocated (and prefilled with
> the Fill Value) for records M to N (N > M) when record N is written for the
> first time.
> 
> Note that none of this behavior is specified by the interface.
> A "smarter" implementation could use some sort of linked list storage
> on disk (VSETs?) which only contains data that has actually been written.
> When a request for read came, it would have to do more complicated seeks
> and table manipulation, but this is definitely doable.

I'm curious, how useful do people feel this would be?  Right now, the
netCDF/HDF prototype behaves similarly to netCDF.  When record N of
a given variable is accessed, space for all of the records M..N is
set up [following Glenn's notation].  The netCDF/HDF project
departs slightly from the current netCDF in that we extend only that
variable, not all of the record variables.
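As a rough illustration of the difference (a toy in-memory model, not
code from netCDF or from the prototype), the sketch below fills records
M..N when record N is first written. The class name, the `extend_all`
flag, the fill value, and the zero-based record numbering are all
assumptions made for the example.

```python
FILL = -9999.0  # stand-in fill value; real fill values depend on the variable type


class RecordFile:
    """Toy model of record variables that share one unlimited dimension."""

    def __init__(self, names, extend_all):
        self.records = {name: [] for name in names}
        # True: every record variable grows together.
        # False: only the variable being written grows.
        self.extend_all = extend_all

    def write(self, name, n, value):
        targets = list(self.records) if self.extend_all else [name]
        for target in targets:
            rec = self.records[target]
            # Allocate and prefill records M..n, where M is the current length.
            # A non-positive count yields an empty list, so nothing shrinks.
            rec.extend([FILL] * (n + 1 - len(rec)))
        self.records[name][n] = value


classic = RecordFile(["temp", "pressure"], extend_all=True)
classic.write("temp", 3, 21.5)

prototype = RecordFile(["temp", "pressure"], extend_all=False)
prototype.write("temp", 3, 21.5)
```

With `extend_all=True`, writing record 3 of `temp` also allocates four
fill-valued records for `pressure`; with `extend_all=False`, `pressure`
stays empty.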

In addition, we have decoupled the on-disk specification of data and
meta-data.  As a result, changing the meta-data will never require
the copying of existing data.  For the time being, it will still
require rewriting all of the meta-data.  Eventually, we hope to
optimize this so that only the meta-data which has changed gets
rewritten.  Doing so would reduce the amount of Unidata code we can
reuse as-is, so we decided not to adopt this strategy for our prototype.
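One way to picture the decoupling is a store that keeps a metadata
block and a data block separately. This is only a sketch under assumed
names (`Store`, `add_variable`, `set_attr`) with JSON standing in for
the real on-disk encoding; it is not the prototype's layout.

```python
import json


class Store:
    """Toy store with separate metadata and data blocks."""

    def __init__(self):
        self.meta = {"attrs": {}, "vars": {}}
        self.meta_block = b""           # serialized metadata, rewritten as a whole
        self.data_block = bytearray()   # raw data, untouched by metadata edits

    def _flush_meta(self):
        # Current strategy in the sketch: rewrite ALL metadata on any change.
        self.meta_block = json.dumps(self.meta, sort_keys=True).encode()

    def add_variable(self, name, payload):
        offset = len(self.data_block)
        self.data_block += payload
        self.meta["vars"][name] = {"offset": offset, "length": len(payload)}
        self._flush_meta()

    def set_attr(self, key, value):
        # Metadata may grow or shrink; the data block is never moved or copied.
        self.meta["attrs"][key] = value
        self._flush_meta()

    def read(self, name):
        v = self.meta["vars"][name]
        return bytes(self.data_block[v["offset"]:v["offset"] + v["length"]])


store = Store()
store.add_variable("temp", b"\x01\x02\x03")
before = bytes(store.data_block)
store.set_attr("title", "a title long enough to change the metadata size")
after = bytes(store.data_block)
```

Because variables are located through offsets recorded in the metadata
rather than by their position after a header, `before` and `after` are
identical even though the metadata block changed size.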

> > because of the potential of wasted space?  Clearly, a netCDF copy operation
> > would take care of that, if required.  For large data sets, this could be an
> > expensive operation.
>
> You can enter (re)define mode and delete a whole variable. This generally
> involves a copy.

Is this new?  I was not aware that a variable could be deleted.

> > Generalization of CDF/netCDF arrays to non-rectilinear meshes can be
> > accomplished by conventions for attribute and variable specifications.
> > I did this myself for the original CDF implementation eons ago and
> > extended it to include simple irregular and sparse meshes.  However,
> > the underlying semantics of the netCDF/CDF data model severely limit
> > how far this can go.  Our approach has been to define a more
> > comprehensive data model than is used in netCDF/CDF.  To date, the
> > results have shown promise.

We agree that using attributes for this information is of limited
use.  We have been looking into SILO (don't know what it stands for)
which adds the idea of 'objects' and 'directories' to the netCDF model.

Directories can be used to provide generalized hierarchical structure.
For example, each directory could represent a distinct netCDF with
its own dimensions, variables, etc.

Objects allow the grouping of related data into a single
compound structure.  An object has a type, such as 'regular
mesh', and the components of the object would be variables which
define the gridpoints and the data values.

The number and names of the components are completely user-specifiable,
so objects could easily be used to model other things besides meshes.
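To make the directory and object ideas concrete, here is a minimal
sketch of how they might be modelled. It is not the SILO API; the class
names (`Variable`, `DataObject`, `Directory`), the `resolve` helper, and
the component names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    dims: tuple
    data: list


@dataclass
class DataObject:
    name: str
    type: str  # e.g. "regular mesh"
    # User-chosen component name -> name of a variable in the same directory.
    components: dict = field(default_factory=dict)


@dataclass
class Directory:
    """Acts like a self-contained netCDF: own dimensions, variables, objects."""
    name: str
    dimensions: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    objects: dict = field(default_factory=dict)
    subdirs: dict = field(default_factory=dict)

    def resolve(self, obj_name):
        """Return an object's components as {component name: Variable}."""
        obj = self.objects[obj_name]
        return {comp: self.variables[var] for comp, var in obj.components.items()}


run1 = Directory("run1", dimensions={"x": 3})
run1.variables["xcoord"] = Variable("xcoord", ("x",), [0.0, 0.5, 1.0])
run1.variables["temp"] = Variable("temp", ("x",), [280.0, 281.5, 283.0])
run1.objects["temp_mesh"] = DataObject(
    "temp_mesh", "regular mesh", {"gridpoints": "xcoord", "values": "temp"}
)

root = Directory("/", subdirs={"run1": run1})
parts = root.subdirs["run1"].resolve("temp_mesh")
```

Since the component names are just dictionary keys, the same
`DataObject` shape could describe something other than a mesh by
choosing a different `type` and a different set of components.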

Would it be possible to get a description of the data model you
have developed at IBM?

-Chris