Re: overcoming netcdf3 limits

On Apr 24, 2007, at 10:25 PM, ncdigest wrote:

Date: Tue, 24 Apr 2007 15:54:43 -0600
From: "Greg Sjaardema" <gdsjaar@xxxxxxxxxx>
Subject: Re: overcoming netcdf3 limits

Ed Hartnett wrote:
robl@xxxxxxxxxxx (Robert Latham) writes:


Over in Parallel-NetCDF land we're running into users who find even
the CDF-2 file format limitations, well, limiting.

If we worked up a CDF-3 file format for parallel-netcdf (off the top
of my head, maybe a 64 bit integer instead of an unsigned 32 bit
integer could be used to describe variables), would the serial netcdf
folks be interested, or are you looking to the new netcdf-4 format to
take care of these limits?
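
To put rough numbers on the 32-bit versus 64-bit suggestion above, here is a small arithmetic sketch. It assumes only that a variable's total byte size has to fit in one unsigned integer field of the stated width; the helper name max_elements and the element sizes are illustrative and not taken from any netCDF or Parallel-NetCDF source.

```python
# Sketch: how many values fit in one variable when its total byte size
# must fit in a fixed-width unsigned integer field.
# Assumption: the field widths are the two mentioned in the message;
# the element sizes below are illustrative.

def max_elements(size_field_bits, bytes_per_value):
    """Largest element count whose total byte size fits in the size field."""
    max_bytes = 2 ** size_field_bits - 1
    return max_bytes // bytes_per_value

for bits in (32, 64):
    for name, width in (("4-byte float", 4), ("8-byte double", 8)):
        count = max_elements(bits, width)
        print(f"{bits}-bit size field, {name}: {count:,} values")
```

With a 32-bit field, a variable of 8-byte doubles tops out at a little over half a billion values, which is the kind of ceiling large meshes run into.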


Howdy Rob!

Your email has generated a lot of discussion here, and we are
formulating our response.

However, another question: have you considered using netCDF-4? It does
not have the limits of the 64-bit offset format, and supports parallel
I/O, as well as a number of other features (groups, compound data
types) which might be helpful in organizing really large data sets.

Since it uses the netcdf-3 API (with some extensions) it should be
possible to easily convert code to use netCDF-4...



I have been following the netcdf-4 development very closely.  It has
some good points, especially the elimination of the dataset limits.
I've generated a 300-million element mesh with the latest release that
wouldn't be possible with the netcdf-3 format.

However, there is concern about the robustness of the underlying HDF5
format.  It is possible to corrupt the entire file if there is a crash
at the wrong time.  We cannot build our production system on a library
that has this behavior. Some of the systems we run on are not known for
their stability and if a job that has been running for a few days
crashes and loses all data, that is not acceptable.  With the netcdf-3
library, we would lose all or a portion of the last "time dump"
written, but not previous data that had been synced to disk. I was
also a little concerned with the long time that it took for
hdf5-1.8.0 to make it to the beta phase...
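
The failure mode described above (a crash costs at most the last "time dump", never earlier synced data) can be illustrated with a plain append-only file. This is only a sketch of the idea, not the netCDF-3 file format or library: the record layout, the helper names, and the simulated crash are all assumptions made for the example.

```python
import os
import struct
import tempfile

RECORD_VALUES = 4                      # values per "time dump" (illustrative)
RECORD_FMT = f"<{RECORD_VALUES}d"      # one fixed-size record of doubles
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def append_record(path, values):
    """Append one fixed-size record and force it to disk before returning."""
    with open(path, "ab") as f:
        f.write(struct.pack(RECORD_FMT, *values))
        f.flush()
        os.fsync(f.fileno())

def read_complete_records(path):
    """Return every fully written record, ignoring a torn trailing one."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // RECORD_SIZE
    return [struct.unpack_from(RECORD_FMT, data, i * RECORD_SIZE) for i in range(n)]

def demo():
    path = os.path.join(tempfile.mkdtemp(), "dumps.bin")
    for step in range(3):
        append_record(path, [float(step)] * RECORD_VALUES)
    # Simulate a crash part-way through the fourth dump: half a record lands.
    partial = struct.pack(RECORD_FMT, *([3.0] * RECORD_VALUES))[: RECORD_SIZE // 2]
    with open(path, "ab") as f:
        f.write(partial)
    return read_complete_records(path)

if __name__ == "__main__":
    print(len(demo()), "complete dumps survive the simulated crash")
```

Because earlier records are never rewritten, the torn fourth dump cannot damage the three that were already synced.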

We are definitely looking at the netcdf-4 effort, but are also looking
at other solutions...
--Greg

Hi all,
Speaking from the HDF5 side of things: we are working on a solution to
the problem of HDF5 files getting corrupted when a process crashes.
Essentially, we are adding journaling to HDF5 metadata operations
(similar to how journaling file systems operate). After an application
crash, a potentially corrupted file can then "replay" the journal and
recover all the metadata changes made up to the point of the crash. At
the same time, we are also making metadata I/O operations asynchronous
(starting with serial I/O, then working on parallel I/O), which should
speed up I/O quite a bit.
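
For readers unfamiliar with journaling, the replay idea can be sketched generically as below. This is not the HDF5 design or API: the journal format (one JSON entry per line), the function names, and the metadata keys are hypothetical, chosen only to show how replaying complete entries recovers state up to the point of a crash.

```python
import json
import os
import tempfile

def journal_append(journal_path, key, value):
    """Record a metadata change durably before it is applied anywhere else."""
    with open(journal_path, "a") as j:
        j.write(json.dumps({"key": key, "value": value}) + "\n")
        j.flush()
        os.fsync(j.fileno())

def replay(journal_path):
    """Rebuild the metadata by replaying every complete journal entry."""
    metadata = {}
    if not os.path.exists(journal_path):
        return metadata
    with open(journal_path) as j:
        for line in j:
            if not line.endswith("\n"):
                break  # torn final entry left by a crash: stop here
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break  # unreadable entry: treat it as the crash point
            metadata[entry["key"]] = entry["value"]
    return metadata

def demo():
    journal = os.path.join(tempfile.mkdtemp(), "metadata.journal")
    journal_append(journal, "dims/time", 10)
    journal_append(journal, "dims/time", 11)
    journal_append(journal, "vars/temperature", "double")
    # Simulate a crash in the middle of writing the next entry.
    with open(journal, "a") as j:
        j.write('{"key": "dims/time", "va')
    return replay(journal)

if __name__ == "__main__":
    print(demo())
```

The half-written entry is discarded on replay, so the recovered metadata reflects every change that was completely journaled before the crash.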

        Quincey Koziol
        The HDF Group
