
Re: 20020617: NetCDF functionality



>From: "Stephen.G. Loch" <address@hidden>
>Subject: NetCDF functionality
>Organization: Natural Environment Research Council
>Keywords: 200206171721.g5HHLRJ00347 netCDF

Hi Steve,

> I don't understand why it is so hard to delete variables or resize the
> record dimension (lose datacycles). In the latter case a single number
> has to be changed in the NetCDF header so coding the corresponding API
> routine is trivial.
> 
> In a databanking environment these are constant and continuing
> functional requirements and the process of copying elements of one
> NetCDF file to another - to which one has to resort - seems extremely
> inefficient.  You could 'remove' variables without recopying the
> datacycles - until such time as a compaction call were issued.

Taking the second question first, resizing the record dimension is not
hard, in the sense that it grows as needed to accommodate new data, as
the data is added to the file.  By design there is no function in the
netCDF interface to change the record dimension; thus it always
correctly reflects the data in the netCDF file.  If you could change
the record dimension independently from writing the data, it would be
possible to create an undesirable inconsistency between the record
dimension and the actual number of records in the file.
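As a rough illustration of that design choice, here is a toy Python
model. It is not the netCDF API; the class, method, and variable names
are all hypothetical. The record count is derived from the data that
has been written and cannot be set directly.

```python
# Toy model only: NOT the netCDF library. All names are hypothetical.
class ToyRecordFile:
    def __init__(self, var_names):
        self.var_names = list(var_names)
        self.records = []  # one dict of values per record

    @property
    def numrecs(self):
        # Read-only: the record count is derived from the stored data,
        # so it cannot disagree with what is actually in the "file".
        return len(self.records)

    def put_record(self, index, var_name, value):
        if var_name not in self.var_names:
            raise KeyError(var_name)
        # Writing past the current end grows the record dimension;
        # skipped records are filled with a placeholder (None here).
        while len(self.records) <= index:
            self.records.append({name: None for name in self.var_names})
        self.records[index][var_name] = value


f = ToyRecordFile(["temp", "salinity"])
f.put_record(0, "temp", 11.5)
f.put_record(2, "salinity", 35.1)  # extends the file to 3 records
print(f.numrecs)  # 3
```

Because numrecs has no setter, the only way to change it in this
sketch is to write data, which is the consistency property described
above.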

I'm not sure what you mean by "lose datacycles" in the case of
resizing the record dimension.  I'll assume you want to decrease the
record dimension, deleting data.  Perhaps you also want to delete
records from a dataset that are not just at the end of the data.
(Maybe I've misinterpreted what is meant by "datacycles".  Google finds
only two documents that contain the words "datacycles" and
"databanking", both from BODC.)  

We originally considered supporting deletion of variables and records
as well as attributes, but the necessity of supporting garbage
collection added complexity that did not seem worth the perceived
benefits.  For example, when a variable is deleted, should the
dimensions on which it depends also be deleted, if not used in other
variables?  When all the records for a variable are deleted, should
the variable be deleted?  And if a variable uses the record dimension,
compacting its data (which appears in every record) is not
significantly faster than recopying all the record data to another
file.  It's true you could save time by deleting several variables and
compacting once, but the time taken would still be on the same order
as copying the file.
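To make the cost argument concrete, here is a small Python sketch. It
assumes a simplified layout in which each record holds one big-endian
8-byte double per variable, stored back to back; this is a toy model,
not the real file format, and the variable names and the
drop_variable helper are hypothetical. Removing one variable still
means reading and rewriting every record.

```python
import struct

# Toy layout (an assumption for illustration): each record stores one
# big-endian double per variable, with the variables interleaved.
VARS = ["temp", "salinity", "depth"]


def pack_records(rows):
    """Serialize rows of per-variable values into interleaved records."""
    return b"".join(struct.pack(">%dd" % len(VARS), *row) for row in rows)


def drop_variable(data, var_names, victim):
    """Compact away one variable; returns (new_bytes, records_touched)."""
    n = len(var_names)
    rec_size = 8 * n
    keep = [i for i, name in enumerate(var_names) if name != victim]
    out = bytearray()
    records_touched = 0
    for off in range(0, len(data), rec_size):
        values = struct.unpack(">%dd" % n, data[off:off + rec_size])
        # Every record must be read and rewritten without the victim.
        out += struct.pack(">%dd" % len(keep), *(values[i] for i in keep))
        records_touched += 1
    return bytes(out), records_touched


data = pack_records([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
compacted, touched = drop_variable(data, VARS, "salinity")
print(touched)                        # 2: both records were rewritten
print(struct.unpack(">4d", compacted))  # (1.0, 3.0, 4.0, 6.0)
```

In this model the compaction loop visits the same records that a
full copy to a new file would visit, which is why the two operations
take time of the same order.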

NetCDF is not intended to be a database management system, but rather
a data model for array-oriented scientific data, and scientists do not
commonly delete data from existing datasets.  Even in database
systems, operations that change the schema of a database, such as
deleting fields from existing relations, are not typically handled
efficiently.  Deleting variables from a netCDF file is analogous to
deleting a field from a database relation, in that it is a change to
the schema of the data.

If deleting variables and records is a common operation in your
application, perhaps netCDF is not well-suited to that application.  A
format and API such as HDF or a database management system might be
preferable.

> On a different topic I don't understand why there is no reference to
> Matlab support when languages are being discussed (e.g. on
> http://www.unidata.ucar.edu/packages/netcdf/). The obvious reference is
> http://woodshole.er.usgs.gov/staffpages/cdenham/public_html/MexCDF/nc4ml5

We considered MATLAB, IDL, and similar packages to be applications
rather than languages, although we realize they have associated
languages.  You may be right, that some users think of MATLAB
primarily as a higher-level programming language rather than a package
for analysis and visualization which also happens to have an
associated language.  However, we did reference the page you mention
at

  http://www.unidata.ucar.edu/packages/netcdf/software.html#NC4ML5

and we reference several other ways to access netCDF data from MATLAB at

  http://www.unidata.ucar.edu/packages/netcdf/software.html#MATLAB

> Steve Loch
> BODC Systems Coordinator

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu