
Re: 970430: forwarded question about netcdf header files



>To: address@hidden
>From: address@hidden (Mary Haley)
>Subject: forwarded question about netcdf header files
>Organization: .
>Keywords: 199704301609.KAA10607

Hi Mary,

> A user from Sandia Labs had a question about netCDF that I can't
> answer.  Can you help me out?

Sure.  My response is below; feel free to use it.

--Russ

>  I've got a question about 'netcdf'.  Do you know anything about
>  this product?  It appears from the documentation that any header
>  modifications cause entire files to get copied.  It's deadly slow.
>  Here's an example from the web page:
>  
>  This header has no usable extra space; it is only as large as it
>  needs to be for the dimensions, variables, and attributes in each
>  netCDF file.  This has the advantage that netCDF files are compact,
>  requiring very little overhead to store the ancillary data that makes
>  the files self-describing. A potential disadvantage of this
>  organization is that any operation on a netCDF file that requires
>  expanding the header, for example adding new dimensions and new
>  variables to an existing netCDF file, will be as expensive as copying
>  the file. This expense is incurred when ncendef() is called, after a
>  call to ncredef(). If you create all necessary dimensions, variables,
>  and attributes before writing variable data, and avoid later
>  additions and renamings of netCDF components that require more space
>  in the header part of the file, you avoid the cost associated with
>  expanding the header.
>  
>  *** end example
> 
>  It doesn't make any sense to us that someone hasn't split the header
>  and data into 2 separate files by now.  Or come up with some more
>  intelligent way to do this.  Have you heard anything about this?

As explained in the User's Guide, netCDF is not a database system, and
is not intended (or appropriate) for applications that require frequent
schema changes, where by "schema" I mean the logical structure of a
dataset as captured in the dataset header.  One sort of schema change
that the netCDF interface does support efficiently is growth along the
"unlimited dimension".
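
To make the cost difference concrete, here is a toy sketch in Python.
It is not the netCDF library or its file format: the byte layout, the
function names append_record and grow_header, and the sizes are all
made up for illustration.  It only models a file whose header sits in
front of the data, and counts the bytes written in each case.

```python
import io

def append_record(f, record):
    """Append one record at the end of the file (hypothetical helper).

    Only the new record is written; existing bytes are untouched.
    Returns the number of bytes written.
    """
    f.seek(0, io.SEEK_END)
    return f.write(record)

def grow_header(f, header_len, extra):
    """Insert `extra` bytes at the end of the header (hypothetical helper).

    Everything behind the header has to move, so all of the data is
    read and written again.  Returns the number of bytes written.
    """
    f.seek(header_len)
    data = f.read()              # all data that follows the header
    f.seek(header_len)
    written = f.write(extra)     # the new header material
    written += f.write(data)     # the shifted copy of the data
    return written

# Toy file: a small header followed by 1000 bytes of existing data.
header = b"HDR:time,temp;"
f = io.BytesIO()
f.write(header)
f.write(b"\x00" * 1000)

cost_append = append_record(f, b"\x01" * 8)     # grow along the record axis
cost_grow = grow_header(f, len(header), b"pres;")  # add something to the header
print(cost_append, cost_grow)
```

In this toy model the append writes only the 8 new bytes, while adding
5 bytes to the header rewrites all 1008 bytes of data behind it.  That
is the same shape of cost the documentation describes for ncendef()
after a header-expanding ncredef().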

A typical use of netCDF is to define a schema for a data archive,
write many files that use that same schema but with different data
values, dimension sizes, and attribute values, and export the schema so
that many different applications can read the data.  If new datasets
later require additional variables, dimensions, or attributes, these
can be added with no effect on existing applications.  Those
applications can read both old and new datasets through the netCDF
interface without modification, and can be updated to read the new
variables at some later time, if that's convenient.

Another typical use is saving model output in a form that can be used by
visualization applications or as input to other models.  This usually
does not require schema changes, since the structure of the data is
known before it is written.

For these sorts of purposes, netCDF I/O performance is usually quite
adequate for applications where portable, self-describing data is
important.

The original NASA CDF format was a multiple file format that used a file
for header information and a separate file for each variable.  This has
the advantage that schema changes can be quite efficient, but it has
enough disadvantages (need to aggregate files with tar or zip, naming
conventions required for multiple files in a directory, difficulty with
copying or manipulating multi-file data, ...) that NASA ultimately
decided to provide a single-file format for CDF as well.

If you know you will be making schema changes to a single huge netCDF
file, it is possible to anticipate some sorts of changes so that copying
may not be necessary.  For example, you can initially define some
"extra" attributes, variables, and dimensions that can be renamed later
(though this is admittedly fairly hokey).
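
The placeholder idea can be sketched with another toy in Python.
Again, this is not the netCDF API or its on-disk layout: the
fixed-width name slots, the "|" separators, and the rename_in_place
helper are assumptions made purely for illustration.  The point is
only that a rename which fits in space already reserved in the header
leaves the header length, and therefore the data behind it, alone.

```python
def rename_in_place(header, old, new):
    """Rename `old` to `new` inside a toy header (hypothetical helper).

    Returns True if the new name fits in the space the old name
    occupied, in which case the header length does not change.
    Returns False if the new name is longer, which in this toy model
    stands for "the header would have to grow".
    """
    if len(new) > len(old):
        return False
    pos = header.find(old)
    if pos < 0:
        raise KeyError(old)
    # Overwrite the old name, padding with spaces to the same width.
    header[pos:pos + len(old)] = new.ljust(len(old), b" ")
    return True

# Toy header with one real name and one reserved placeholder name.
header = bytearray(b"time    |spare_var_01    |")
size_before = len(header)

fits = rename_in_place(header, b"spare_var_01", b"pressure")
too_long = rename_in_place(header, b"time", b"time_of_observation")

print(fits, too_long, len(header) == size_before)
```

Choosing placeholder names at least as long as any name you expect to
need later is what makes the first rename fit; the second one fails
because nothing was reserved for it.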

We will be making a backward-compatible format change for netCDF version
4, and are considering a way to support more efficient schema changes with
the new file format, but are not seriously considering using multiple
files.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu