standardizing data storage

  • Subject: standardizing data storage
  • From: shectman@xxxxxxxxxxxxx (Robert M. Shectman)
  • Date: Fri, 5 Nov 93 14:50:21 -0800
The group I work for is looking at standardizing on a data storage
format, and the top two contenders appear to be NetCDF and HDF.
In looking at the documentation, I have a question about NetCDF
(and please don't take this as a reason to start a debate
over NetCDF vs. HDF).

Consider two 'record variables', i.e. two multidimensional arrays with
one dimension being 'unlimited'.  Let's call them temp and rh, and
call the unlimited dimension time.  A program tries to write temp
and rh to a NetCDF file at each time step.

The question is: if the program alternately writes temp and
rh, i.e.:

      at time = 0
            write temp
            write rh
      at time = 5
            write temp
            write rh
      at time = 7
            write temp
            write rh

      and so on, for a number of time steps that is not known
      when the program starts,

what happens in the NetCDF file?  There are two basic models of
storage.  One is that the temp and rh arrays are each contiguous,
so that at each write the entire NetCDF file is rewritten to
add more space to each array.  The other is that the unlimited
dimension represents some pointer or record structure that allows
each write to simply be appended to the end of the file.
The user's guide gives no indication of which model is
correct.  At this point, I am not really interested in the
internals of the record/pointer model, if that is the correct model.
I understand that the documentation is written without any
references to the internal structure so that, if the structure
changes, the documentation doesn't have to.  The closest it comes
is a mention that if the size of a variable is changed,
the file may have to be rewritten to accommodate the increase in size.
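
To make the two candidate models concrete, here is a minimal Python
sketch of the order in which the per-time-step slabs of temp and rh
would sit in the file under each one.  It is only an illustration of
the two models described above, not a description of what the NetCDF
library actually does; the function names are hypothetical and the
file header is ignored.

```python
from typing import List, Tuple

Slab = Tuple[str, int]  # (variable name, time-step index)


def contiguous_layout(n_steps: int, variables=("temp", "rh")) -> List[Slab]:
    """Model 1 (assumed): each variable is stored as one contiguous array,
    so all temp slabs come first, followed by all rh slabs."""
    return [(name, step) for name in variables for step in range(n_steps)]


def record_layout(n_steps: int, variables=("temp", "rh")) -> List[Slab]:
    """Model 2 (assumed): data is grouped by time step, so each new step
    is simply placed after everything already in the file."""
    return [(name, step) for step in range(n_steps) for name in variables]


def grows_by_appending(layout, n_steps: int) -> bool:
    """True if adding one more time step leaves the existing slabs
    exactly where they were (the old file is a prefix of the new one)."""
    old = layout(n_steps)
    new = layout(n_steps + 1)
    return new[: len(old)] == old


if __name__ == "__main__":
    print("contiguous:", contiguous_layout(3))
    print("record:    ", record_layout(3))
    print("contiguous grows by appending?", grows_by_appending(contiguous_layout, 2))
    print("record grows by appending?    ", grows_by_appending(record_layout, 2))
```

Under the first model, adding a time step forces the rh slabs to move
to make room for the new temp slab; under the second, existing data
stays where it is.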

The question bears on the efficiency of NetCDF under the above
programming model.  If each write forces NetCDF to rewrite the
data file so that each record variable remains contiguous, then
the above model is going to be very inefficient.
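
The size of the concern can be estimated with a rough cost model.
The sketch below counts the bytes written under each of the two
hypothetical storage models, assuming every write of temp or rh
covers one time step of a fixed size (slab_bytes) and that a
"rewrite" writes out all data stored so far.  The names and the
cost model are assumptions made for illustration only; the header
and any real library buffering are ignored.

```python
def bytes_written_rewrite(n_steps: int, slab_bytes: int, n_vars: int = 2) -> int:
    """Assumed model 1: every write rewrites the whole data section
    so that each variable stays contiguous."""
    total = 0
    slabs_stored = 0
    for _ in range(n_steps):
        for _ in range(n_vars):
            slabs_stored += 1
            # The whole file contents are written out again.
            total += slabs_stored * slab_bytes
    return total


def bytes_written_append(n_steps: int, slab_bytes: int, n_vars: int = 2) -> int:
    """Assumed model 2: every write appends exactly one slab."""
    return n_steps * n_vars * slab_bytes


if __name__ == "__main__":
    slab = 100  # hypothetical size in bytes of one time step of one variable
    for steps in (3, 30, 300):
        print(steps,
              bytes_written_rewrite(steps, slab),
              bytes_written_append(steps, slab))
```

In this model the append case grows linearly with the number of time
steps, while the rewrite case grows quadratically, which is why the
answer matters for a long-running program.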

Please respond to my email address directly, as I am not currently
on the NetCDF distribution list.

Thank you

Robert M. Shectman
Atmospheric Release Advisory Capability
Lawrence Livermore National Laboratory