
Re: 20040309:question regarding removing record from dataset



>To: "'Unidata Support'" <address@hidden>
>From: "Dai, Feng Wei" <address@hidden>
>Subject: RE: 20040309: question regarding removing record from dataset 
>Organization: UCAR/Unidata
>Keywords: 200403091336.i29DaTrV017557

Hi Fengwei,

>   I'm reading the "NetCDF User's Guide for C" to find out whether its
> technology would be useful for one of our potential projects. We are
> going to deal with huge volumes of time series data in the project. In our
> experience, relational databases do not handle time series data very
> well, so we are looking for an ideal way to do so. NetCDF technology seems
> to meet our requirements after reading the document. However, I didn't find
> anything related to record deletion, which is also important in our project
> since housekeeping is necessary after accumulating time series data for one
> or two years (although NetCDF can support large files, which could probably
> store data for 30 or 40 years given the time series data volume in the
> bank).

Although there is a netCDF function to delete attributes, there is no
netCDF function to delete variables or variable data.  This omission
was intentional, to keep the format and interface simple and because
deletion of data is not common in the context of scientific data
management for which netCDF was designed.  Currently, the only way to
clean up a netCDF dataset by deleting unneeded data is to create a new
netCDF dataset and copy just the desired data to the new dataset.
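
As a rough sketch of that copy-based cleanup, the following uses plain C
arrays as a stand-in for a time coordinate and one record variable, and
keeps only the records at or after a cutoff time. This is not netCDF code:
the function name copy_kept_records and the cutoff parameter are made up
for illustration. A real program would read and write the same record
ranges through the netCDF C library, for example with nc_get_vara_double
and nc_put_vara_double.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one record variable and its time
 * coordinate. Copies every record whose time is >= cutoff into the
 * output arrays, preserving order, and returns the number of records
 * kept. The output arrays must have room for nrec elements. */
size_t copy_kept_records(const double *time_in, const double *value_in,
                         size_t nrec, double cutoff,
                         double *time_out, double *value_out)
{
    assert(time_in != NULL && value_in != NULL);
    assert(time_out != NULL && value_out != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < nrec; i++) {
        if (time_in[i] >= cutoff) {
            time_out[kept] = time_in[i];
            value_out[kept] = value_in[i];
            kept++;
        }
    }
    return kept;
}
```

The returned count would become the length of the record dimension in the
new dataset.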

We originally considered supporting deletion of variables and data
records as well as attributes, but the necessity of supporting garbage
collection (recovering the space of the deleted data) added sufficient
complexity to outweigh the perceived benefits.  If a variable uses the
record dimension, compacting its record data is not significantly
faster with the netCDF format than recopying all the record data to
another file.  You could save time by deleting data from several
variables and compacting once, but the time taken would still be on
the same order as copying the file.
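
To illustrate the cost argument with a simplified model (again, not netCDF
code): record variables are stored interleaved, one record after another,
so dropping one of them means shifting nearly every remaining value. The
flat buffer of doubles and the names compact_drop_variable and copy_cost
below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: each record holds nvars values stored back to back,
 * and records follow one another in a single buffer.
 *
 * Removes variable `drop` from every record by compacting in place.
 * Returns how many values had to be moved to a new position. */
size_t compact_drop_variable(double *buf, size_t nrec, size_t nvars,
                             size_t drop)
{
    assert(buf != NULL && drop < nvars);

    size_t dst = 0, moved = 0;
    for (size_t rec = 0; rec < nrec; rec++) {
        for (size_t var = 0; var < nvars; var++) {
            size_t src = rec * nvars + var;
            if (var == drop)
                continue;            /* skip the deleted variable */
            if (dst != src) {        /* dst <= src, so this is safe */
                buf[dst] = buf[src];
                moved++;
            }
            dst++;
        }
    }
    return moved;
}

/* Number of values written when the kept variables are instead copied
 * to a fresh buffer. */
size_t copy_cost(size_t nrec, size_t nvars)
{
    assert(nvars > 0);
    return nrec * (nvars - 1);
}
```

In this model, with 4 records of 3 variables each, dropping the middle
variable moves 7 values in place, against 8 values written for a full
copy, so compacting saves almost nothing.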

NetCDF was not intended to be a database management system, but rather
a data model for array-oriented scientific data.  If deleting
variables and data records is a common operation in your application,
the current version of netCDF may not be a good choice.

We're currently implementing a new version of the netCDF interface on
top of the HDF5 format, which supports garbage collection.  That may
make it practical to add interfaces to delete data records.  We'll
consider whether adding this capability would be desirable, but it's
not in our current list of requirements.

Thanks for the interesting question!

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ