overcoming netcdf3 limits

Greg Sjaardema gdsjaar at sandia.gov
Tue Apr 24 15:54:43 MDT 2007


Ed Hartnett wrote:
> robl at mcs.anl.gov (Robert Latham) writes:
>
>   
>> Hi
>>
>> Over in Parallel-NetCDF land we're running into users who find even
>> the CDF-2 file format limitations, well, limiting. 
>>
>> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/NetCDF-64-bit-Offset-Format-Limitations.html
>>
>> http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#Large%20File%20Support10
>>
>> If we worked up a CDF-3 file format for parallel-netcdf (off the top
>> of my head, maybe a 64 bit integer instead of an unsigned 32 bit
>> integer could be used to describe variables), would the serial netcdf
>> folks be interested, or are you looking to the new netcdf-4 format to
>> take care of these limits?
>>
>> Thanks
>> ==rob
>>
>>     
>
> Howdy Rob!
>
> Your email has generated a lot of discussion here, and we are
> formulating our response.
>
> However, another question: have you considered using netCDF-4? It does
> not have the limits of the 64-bit offset format, and supports parallel
> I/O, as well as a number of other features (groups, compound data
> types) which might be helpful in organizing really large data sets.
>
> Since it uses the netcdf-3 API (with some extensions) it should be
> possible to easily convert code to use netCDF-4...
>
> Thanks,
>
> Ed
>   
I have been following the netcdf-4 development very closely.  It has
some good points, especially the elimination of the dataset limits. 
I've generated a 300-million element mesh with the latest release that
wouldn't be possible with the netcdf-3 format. 
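
As a rough illustration of why a mesh of that size runs into the format
limits: the 64-bit offset limitations page linked above caps most
variables at 2^32 - 4 bytes. The sketch below checks one variable
against that cap. The element type (8-node hexahedra), the 4-byte
integer storage, and the idea of a single connectivity variable are
assumptions made for the example, not details given in this message,
and the sketch ignores the exception the format makes for the last
variable in a file.

```python
# Per-variable size cap in the 64-bit offset (CDF-2) format: 2^32 - 4 bytes.
CDF2_MAX_VAR_BYTES = 2**32 - 4


def variable_bytes(num_elements, nodes_per_element, bytes_per_value):
    """Bytes needed to store one connectivity-style variable."""
    return num_elements * nodes_per_element * bytes_per_value


def fits_cdf2(nbytes):
    """True if a variable of this size is within the CDF-2 cap."""
    return nbytes <= CDF2_MAX_VAR_BYTES


# Assumed layout: 300 million elements, 8 nodes each, 4-byte integers.
conn_bytes = variable_bytes(300_000_000, 8, 4)
print(conn_bytes, fits_cdf2(conn_bytes))
```

Under those assumptions the variable needs 9,600,000,000 bytes, more
than twice the cap.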

However, there is concern about the robustness of the underlying HDF5
format.  It is possible to corrupt the entire file if there is a crash
at the wrong time.  We cannot build our production system on a library
that has this behavior.  Some of the systems we run on are not known for
their stability, and if a job that has been running for a few days
crashes and loses all of its data, that is not acceptable.  With the
netcdf-3 library, we would lose all or a portion of the last "time dump"
written, but not previous data that had been synced to disk.  I was also
a little concerned about how long it took for hdf5-1.8.0 to make it to
the beta phase...
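
The netcdf-3 failure mode described above (a crash costs at most the
last "time dump") can be sketched with a plain append-only file. This
is only an analogy: it uses ordinary Python file I/O rather than the
netCDF API, and the record layout and the helper names
append_time_dump and read_complete_dumps are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: one time value plus four data values.
RECORD = struct.Struct("<d4f")


def append_time_dump(fh, time, values):
    """Append one record, then force it to disk before returning."""
    fh.write(RECORD.pack(time, *values))
    fh.flush()
    os.fsync(fh.fileno())


def read_complete_dumps(path):
    """Return every whole record in the file, ignoring a torn tail."""
    with open(path, "rb") as fh:
        data = fh.read()
    count = len(data) // RECORD.size
    return [RECORD.unpack_from(data, i * RECORD.size) for i in range(count)]


path = os.path.join(tempfile.mkdtemp(), "dumps.bin")
with open(path, "wb") as fh:
    for step in range(3):
        append_time_dump(fh, float(step), [float(step)] * 4)
    # Simulate a crash partway through the fourth dump: only half of
    # the record is written.
    fh.write(RECORD.pack(3.0, 3.0, 3.0, 3.0, 3.0)[: RECORD.size // 2])

recovered = read_complete_dumps(path)
print(len(recovered), [rec[0] for rec in recovered])
```

The three synced dumps are still readable after the simulated crash;
only the partial fourth one is lost.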

We are definitely looking at the netcdf-4 effort, but are also looking
at other solutions...
--Greg

