
Re: 20020617: NetCDF functionality



>From: "Stephen.G. Loch" <address@hidden>
>Subject: NetCDF functionality
>Organization: BODC
>Keywords: 200206171721.g5HHLRJ00347 truncating records

Steve Loch wrote: 

> Sorry I never replied to your helpful comments. I would like to point out
> though that reducing the record dimension requires no garbage collection (for
> the system to harvest the unused disk space is something else) so one could
> introduce a routine that allowed you to change the record dimension in a
> downward direction towards one (bottom limit). You don't need to be able to do
> this in an upward direction as the mechanism for growing the file is already
> there. I appreciate that the seemingly ad hoc nature might run contrary to
> some design criteria but a), as we agree, it is easy to implement, b) it would
> be useful to us and no doubt a few others, and c) it does not compromise
> existing functionality.

You're right, it would be easy to implement, although truncating an
existing file with the POSIX ftruncate() function can still be a minor
portability problem (for example I don't think it works on FAT32 file
systems under Linux).  We also have to consider the scenario of one
process writing to a netCDF file while other processes read from the
file concurrently, which is supported by the nc_sync() call and
NC_SHARE flag, but I don't think there are any new problems there.
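
To illustrate the truncation step discussed above, here is a minimal C
sketch. It assumes a classic-format file, in which the record count is
stored as a 4-byte big-endian integer immediately after the 4-byte magic
number. The names shrink_records, rec_begin, and rec_size are
hypothetical and not part of the netCDF API; a real routine would take
the offsets from the parsed header and would also have to deal with
cases this ignores, such as concurrent readers and data buffered inside
the library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose pread, pwrite, ftruncate */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a netCDF library function): shrink a
 * classic-format file to new_numrecs records.
 *
 * Assumptions, supplied by the caller from the file's header:
 *   rec_begin  byte offset of the first record
 *   rec_size   size in bytes of one complete record
 *
 * Returns 0 on success, -1 on any error or if new_numrecs would
 * increase the record count.
 */
int shrink_records(const char *path, uint32_t new_numrecs,
                   off_t rec_begin, off_t rec_size)
{
    unsigned char buf[4];
    uint32_t old_numrecs;
    int status = -1;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    /* The record count sits at byte offset 4, after the magic number. */
    if (pread(fd, buf, 4, 4) == 4) {
        old_numrecs = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                      ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

        if (new_numrecs <= old_numrecs) {
            buf[0] = (unsigned char)(new_numrecs >> 24);
            buf[1] = (unsigned char)(new_numrecs >> 16);
            buf[2] = (unsigned char)(new_numrecs >> 8);
            buf[3] = (unsigned char)new_numrecs;

            /* Update the header first, then cut the file: if the process
             * dies in between, the file merely carries unused trailing
             * bytes instead of claiming records that no longer exist. */
            if (pwrite(fd, buf, 4, 4) == 4 &&
                ftruncate(fd, rec_begin + (off_t)new_numrecs * rec_size) == 0)
                status = 0;
        }
    }

    if (close(fd) != 0)
        status = -1;
    return status;
}
```

On a file system where ftruncate() is not supported, the call fails and
the sketch returns -1 with the header already rewritten, which leaves a
file that is still readable but has not given back any disk space.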

Yours is the first request we've had for this feature, but I'll put it
on the list of netCDF enhancements, and if it's as easy as it appears,
we may get it into the next version.  Thanks for the suggestion.

> On the other matter, leaving aside the occasional MEX file, we do use Matlab
> as a complete replacement for Fortran, including database access, and so we do
> regard it as a language.

OK, I'll add a reference to MATLAB on the netCDF home page.  Thanks
for the clarification.  But I expect to soon be hearing from users of
IDL, GrADS, R, and Tcl/Tk, asking why their language is not also
mentioned on the home page :-).

--Russ

> Regards. Steve Loch
> BODC
> 
> Russ Rew wrote:
> 
> > >From: "Stephen.G. Loch" <address@hidden>
> > >Subject: NetCDF functionality
> > >Organization: Natural Environment Research Council
> > >Keywords: 200206171721.g5HHLRJ00347 netCDF
> >
> > Hi Steve,
> >
> > > I don't understand why it is so hard to delete variables or resize the
> > > record dimension (lose datacycles). In the latter case a single number
> > > has to be changed in the NetCDF header so coding the corresponding API
> > > routine is trivial.
> > >
> > > In a databanking environment these are constant and continuing
> > > functional requirements and the process of copying elements of one
> > > NetCDF file to another - to which one has to resort - seems extremely
> > > inefficient.  You could 'remove' variables without recopying the
> > > datacycles - until such time as a compaction call were issued.
> >
> > Taking the second question first, resizing the record dimension is not
> > hard, in the sense that it grows as needed to accommodate new data, as
> > the data is added to the file.  By design there is no function in the
> > netCDF interface to change the record dimension; thus it always
> > correctly reflects the data in the netCDF file.  If you could change
> > the record dimension independently from writing the data, it would be
> > possible to create an undesirable inconsistency between the record
> > dimension and the actual number of records in the file.
> >
> > I'm not sure what you mean by "lose datacycles" in the case of
> > resizing the record dimension.  I'll assume you want to decrease the
> > record dimension, deleting data.  Perhaps you also want to delete
> > records from a dataset that are not just at the end of the data.
> > (Maybe I've misinterpreted what is meant by "datacycles".  Google finds
> > only two documents that contain the words "datacycles" and
> > "databanking", both from BODC.)
> >
> > We originally considered supporting deletion of variables and records
> > as well as attributes, but the necessity of supporting garbage
> > collection added complexity that did not seem worth the perceived
> > benefits.  For example, when a variable is deleted, should the
> > dimensions on which it depends also be deleted, if not used in other
> > variables?  When all the records for a variable are deleted, should
> > the variable be deleted?  And if a variable uses the record dimension,
> > compacting its data (which appears in every record) is not
> > significantly faster than recopying all the record data to another
> > file.  It's true you could save time by deleting several variables and
> > compacting once, but the time taken would still be on the same order
> > as copying the file.
> >
> > NetCDF is not intended to be a database management system, but rather
> > a data model for array-oriented scientific data, and scientists do not
> > commonly delete data from existing datasets.  Even in database
> > systems, operations that change the schema of a database, such as
> > deleting fields from existing relations, are not typically handled
> > efficiently.  Deleting variables from a netCDF file is analogous to
> > deleting a field from a database relation, in that it is a change to
> > the schema of the data.
> >
> > If deleting variables and records is a common operation in your
> > application, perhaps netCDF is not well-suited to that application.  A
> > format and API such as HDF or a database management system might be
> > preferable.
> >
> > > On a different topic I don't understand why there is no reference to
> > > Matlab support when languages are being discussed (e.g. on
> > > http://www.unidata.ucar.edu/packages/netcdf/). The obvious reference is
> > > http://woodshole.er.usgs.gov/staffpages/cdenham/public_html/MexCDF/nc4ml5
> >
> > We considered MATLAB, IDL, and similar packages to be applications
> > rather than languages, although we realize they have associated
> > languages.  You may be right, that some users think of MATLAB
> > primarily as a higher-level programming language rather than a package
> > for analysis and visualization which also happens to have an
> > associated language.  However, we did reference the page you mention
> > at
> >
> >   http://www.unidata.ucar.edu/packages/netcdf/software.html#NC4ML5
> >
> > and we reference several other ways to access netCDF data from MATLAB at
> >
> >   http://www.unidata.ucar.edu/packages/netcdf/software.html#MATLAB
> >
> > > Steve Loch
> > > BODC Systems Coordinator
> >
> > --Russ
> >
> > _____________________________________________________________________
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                     http://www.unidata.ucar.edu