NetCDF/HDF5 Project After One Year

Last year, a Unidata proposal to extend netCDF and HDF5, providing an HDF5 storage layer for the popular netCDF data model and library, was awarded two years' funding under NASA's Advanced Information Systems Technology (AIST) Program. We're now about halfway through the project, and Unidata's Ed Hartnett reported on our progress at the NASA 2004 Earth Science Technology Conference, held June 22 - 24 in Palo Alto, California. After the NASA meeting, Russ Rew also presented an overview of the current project status and plans to a review panel.

NetCDF-3 and HDF5 are de facto standard data models, libraries, and formats for scientific data access. NetCDF was developed at the Unidata Program Center and first released in 1987. HDF5 was developed at the National Center for Supercomputing Applications and released in 1998. Both software packages are freely available for download.

The advantages of netCDF-3 are its popularity, simplicity, support in a wide variety of tools, and multiple implementations. Its primary uses are in climate, forecast, and ocean models, in data archives, and for some observational data.

The advantages of HDF5 are its power, efficiency in high-performance computing and storage, and extensibility. Its primary uses have been for satellite data, in computational fluid dynamics, and in parallel computing.

The project aims to create netCDF-4, providing a simple netCDF interface to HDF5 storage. This combination would ideally have desirable characteristics of netCDF-3 and HDF5, while taking advantage of their separate strengths: the widespread use and simplicity of netCDF-3 and the generality and performance of HDF5. It would make netCDF more suitable for high-performance computing, while providing a simpler high-level interface for HDF5. One concrete goal of the project is to demonstrate benefits of the combination in advanced Earth science modeling efforts. A desirable side effect would be continued collaboration between Unidata and NCSA in design, development, testing, and support.

During the first year of the project, we have implemented a netCDF-4 prototype that provides the current netCDF-3 interface over HDF5, to demonstrate backward compatibility for both programs and data. We have also determined needed HDF5 enhancements, prepared the netCDF-3 source distribution for incorporation of netCDF-4, and begun to design netCDF-4 programming interfaces to add enhancements made possible with HDF5. At the same time, NCSA has implemented some of the needed HDF5 enhancements we identified. In the next year, we intend to finish implementing the new netCDF-4 interfaces over the enhanced HDF5.

The netCDF-4 prototype that has resulted from Ed Hartnett's work during the first year comprises about 13,000 lines of C code, passes all netCDF-3 tests, and shows that the read/write times and file sizes are satisfactory when HDF5 is used as a new storage layer. It also validates an ambitious goal for backward compatibility: current netCDF programs will require no changes, and the interface will transparently access all current netCDF files as well as new HDF5 files written through the netCDF-4 interface. The first year's work has also included implementing automated multi-platform testing; converting the netCDF documentation to a more maintainable form; refactoring the documentation to create a new language-independent Users Guide and four language-dependent guides for C, C++, Fortran-77, and Fortran-90; adding new large file support by incorporating changes developed at Sandia Laboratories; and providing improved Windows and .NET support.
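To make the idea of transparent access concrete, here is a minimal sketch of how a reader might tell the two kinds of files apart by inspecting the first bytes of a file. It is an illustration only, not the prototype's actual code. The type and function names (file_format, classify_signature, classify_file) are hypothetical. The sketch assumes the HDF5 signature sits at byte offset 0, although HDF5 also permits it at certain later offsets, and it assumes the large file variant of the classic format is marked by a version byte of 2 after the "CDF" magic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format tags for this sketch; not part of any netCDF API. */
typedef enum {
    FMT_UNKNOWN = 0,
    FMT_NETCDF_CLASSIC,  /* "CDF" followed by version byte 1 */
    FMT_NETCDF_64BIT,    /* "CDF" followed by version byte 2 (large files) */
    FMT_HDF5             /* 8-byte HDF5 signature at offset 0 */
} file_format;

/* Classify a buffer holding the first bytes of a file. */
file_format classify_signature(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

    assert(buf != NULL);

    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return FMT_HDF5;

    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        if (buf[3] == 1)
            return FMT_NETCDF_CLASSIC;
        if (buf[3] == 2)
            return FMT_NETCDF_64BIT;
    }
    return FMT_UNKNOWN;
}

/* Read up to 8 bytes from the named file and classify them.
   Returns FMT_UNKNOWN if the file cannot be opened. */
file_format classify_file(const char *path)
{
    unsigned char buf[8];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return FMT_UNKNOWN;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return classify_signature(buf, n);
}
```

A library built along these lines could call something like classify_file() when a file is opened and route the request to either the classic netCDF reader or the HDF5 storage layer, so that calling programs never need to know which format is on disk.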

The project is on schedule for a July 2005 production release, with several intermediate releases before then. There are still some open design issues having to do with how much of the power and associated complexity of the HDF5 library can be made available through a simple netCDF-4 interface. Considerable work remains to be done in testing the result in models, to determine whether the provision of parallel I/O will ease some of the I/O bottlenecks currently found in models run on supercomputing platforms. Two papers, three posters, and two conference presentations have resulted from the project's first year. An independent review of the project has concluded that it is making very good progress towards its overall goals.


Last modified: Thu Jul 1 16:48:26 MDT 2004