
NetCDF Plans from 2007 through 2012

Please send comments to ed@unidata.ucar.edu

Ed Hartnett, 8/16/2007

Introduction

This document details the major anticipated developments in the netCDF (and libCF) C, Fortran, and C++ libraries (and associated utilities) over the next five years, that is, through 2012.

These are proposed features and may change without notice. User feedback is invited.

Purpose of this Document

This document serves the following purposes:

Sources of New NetCDF Features

One important source of new feature ideas for netCDF is the netCDF Java implementation, and the set of related tools which have evolved in the world of Java.

Over the next five years netCDF/libCF libraries will adopt some of the capabilities currently found only in the Java library.

Another important source of new ideas is the user community.

NetCDF development has always included strong input from the user community. Examples include the current F90 API and the 64-bit offset format, both of which resulted from contributions by netCDF users and are now part of the netCDF package.

Over the next five years, user contributions will play an even more important role. Some of the proposed features are based on well-developed user improvements to netCDF (for example, the work of the OpenDAP group).

The Climate and Forecast conventions community provides a metadata standard with which Unidata can build a new tool, libCF, to help deal with geo-referenced data.

NetCDF has always been a general-purpose package, used primarily in the Earth science community, but containing surprisingly few Earth science specific features. When netCDF was introduced, the idea of complete interoperability between different modeling groups was a dream. With netCDF, it is becoming a reality. But netCDF alone cannot provide all the functionality needed for this complex task.

The NetCDF-4.0/HDF5 format contains many features which might fit into the classic data model and file format with a little effort. In some cases (e.g., zlib compression/decompression on the fly), this has already been accomplished. In other cases (e.g., groups) an obvious, backward-compatible solution presents itself.

These sources provide an endless stream of suggested (and, in some cases, already implemented) improvements for netCDF. The challenge is to select from the list those features that will be most helpful to the user community.

Vision for NetCDF Development

To continue to support current users, and attract new scientific communities, netCDF must adapt to evolving computing environments.

Solution Architecture

The development of these capabilities will take place in three tracks:

  1. NetCDF-3 development will extend the capabilities of the classic and 64-bit offset format (and perhaps new, related formats), with minimal changes to the netCDF API.
  2. NetCDF-4 development will provide new features, based on HDF5, to meet requirements that cannot be met by the netCDF-3 classic formats.
  3. LibCF will meet the needs of the atmospheric, ocean, and climate science communities for producing and working with complex geo-referenced data sets in a standard way.

Commitment to Classic NetCDF

As netCDF is further developed we will continue to improve classic netCDF, and to bring new features to users of the classic format.

With the netCDF-4.0 release, new features are appearing for NetCDF-4/HDF5 format data files. But this does not mean that development has stopped on the classic format and library.

In the 4.1 release and beyond, new features will sometimes be developed for the classic format before being added for the expanded netCDF-4/HDF5 format and model, as with the OpenDAP work.

In other cases, features will be copied from the NetCDF-4/HDF5 format to the classic format, as with data compression or groups, if we add these features to the classic library.

Commitment to Software Compatibility

NetCDF is very well tested to ensure that new releases do not change the behavior of existing user software.

We add to the APIs, but we don't remove or change them once they have been through a full release. (We may still change new functions during the alpha or beta release phases.)

Similarly, the classic format may spawn another binary format for 64-bit machines. But this new format, when and if it is developed, will be added in a way which preserves existing functionality.

As long as you are using documented netCDF features, if a new release of netCDF changes the behavior of any code, that's a bug. Please report it to support-netcdf@unidata.ucar.edu.

Commitment to Software Correctness

It is sobering to contemplate that the netCDF C library is one piece of software used by almost every climate researcher working on the IPCC data, and thus represents a single point of failure for the entire climate science enterprise of the human race. (Only the completely independent Java implementation provides a redundant software path to check data accuracy.)

Each release of netCDF is extensively tested, and a very large number of tests are run with "make check" to ensure that netCDF works. Tests on machines of different architectures ensure the interoperability of netCDF data. Additional tests are run to check handling of very large files, parallel I/O, and more.

NetCDF must continue to expand its testing. We can never test everything, but we can certainly test a little bit more every release.

September 2007 Status of the NetCDF/LibCF C-based Libraries

NetCDF 3.6.2

NetCDF 3.6.2 was released in March 2007. It included a new build system, ready for netCDF-4, and new example programs and documentation.

It was anticipated that a 3.6.3 release in the final quarter of 2007 would be the final 3.6 series release. However, the differences between the 3.6.x series and the 4.0 series have been isolated to one file in the build, so it is easy to keep the 3.6.x branch alive. The 3.6.x branch is expected to end eventually. The netCDF 4.0 distribution, when built with default options, produces the 3.6.x build, so a separate distribution is not strictly necessary, but it offers some comfort and convenience to long-established users who don't want to get involved with HDF5 yet.

NetCDF 4.0-beta

Version 4.0 of netCDF has been out in beta version since April, 2007. When HDF5 1.8.0 becomes a stable release, netCDF 4.0 will be released. Much work remains to get ncdump and ncgen utilities working with netCDF-4, and there is also some documentation and example work to be done, as well as a few HDF5 problems to be tracked down.

The beta release has gone well, with users reporting complex tests which have succeeded with netCDF-4 (NCO, pyNetCDF), performance improvements (NASA GDMO), and good parallel I/O performance (NCSA).

LibCF 1.0-alpha

The libCF library is currently in alpha release. It contains some very basic functions relating to the CF Conventions. LibCF has only begun to scratch the surface of ways to use netCDF attributes to store geo-located data sets.

Proposed Features

The following features are proposed, not guaranteed.

Insert additional strong disclaimer here. This document exists for internal Unidata planning purposes only, and is being made available to the user community for comment, but readers are advised to proceed with caution when using this information. Check with support-netcdf@unidata.ucar.edu for the latest information.

Features are listed in three sections below, for netCDF-3, netCDF-4, and libCF development. Within each section the features are listed in decreasing order of priority.

NetCDF-3 Features

All the features listed in this section apply to the netCDF-3.x series. This is the same as the netCDF-4.x code, but with the netCDF-4 code permanently turned off. Since the netCDF-4.x releases use the same code, these features will also appear in the 4.x series. (But they may only function with classic or 64-bit offset files.)

A 3.7 release is planned for the second quarter of 2008, and will include remote access with libdap/libnc-dap. See the netCDF-3.7 requirements.

Remote Access with libdap++/libnc-dap

OpenDAP provides a modified version of the netCDF library that allows remote access to netCDF files (for netCDF-3 classic and 64-bit offset files only).

As part of a project to provide a C-based client in the netCDF library, this feature uses C++ libraries to provide the remote access. These are the C++ libraries currently used by OpenDAP.

See the SDCI proposal for more information about the remote access project.

Streaming NetCDF Data

In netCDF classic and 64-bit offset format, the file header contains the number of records in the file. But this is not strictly necessary. If the software ignored this number, or allowed it to be set to some special value (like -1), it would be easier to create new data files without knowing how many records they will contain when the header is written.

This would have no effect on netCDF-4 files.
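
The special-value idea can be sketched in a few lines of C. This is only an illustration, assuming that a classic or 64-bit offset file begins with the bytes "CDF", a version byte (1 or 2), and then the record count as a big-endian 32-bit integer. The sentinel NUMRECS_UNKNOWN (that is, -1 viewed as an unsigned value), the header_prefix struct, and the function parse_header_prefix are hypothetical names invented for this sketch; they are not part of the netCDF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel meaning "record count not known yet"
 * (the -1 idea from the text, viewed as an unsigned 32-bit value). */
#define NUMRECS_UNKNOWN 0xFFFFFFFFu

/* What a reader learns from the first 8 bytes of the header. */
typedef struct {
    int version;       /* 1 = classic, 2 = 64-bit offset */
    int streaming;     /* nonzero if the record count holds the sentinel */
    uint32_t numrecs;  /* meaningful only when streaming == 0 */
} header_prefix;

/* Parse the magic bytes and the record count.
 * Returns 0 on success, -1 if buf does not look like a classic header. */
int parse_header_prefix(const unsigned char *buf, size_t len, header_prefix *out)
{
    assert(out != NULL);
    if (buf == NULL || len < 8)
        return -1;
    if (memcmp(buf, "CDF", 3) != 0 || (buf[3] != 1 && buf[3] != 2))
        return -1;

    /* The record count is stored big-endian right after the 4-byte magic. */
    uint32_t n = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                 ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];

    out->version = buf[3];
    out->streaming = (n == NUMRECS_UNKNOWN);
    out->numrecs = out->streaming ? 0 : n;
    return 0;
}
```

A writer producing a stream could emit the sentinel up front, and a reader that sees streaming set would know not to trust the stored count.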

nc_cpy_varx Interfaces for NetCDF Performance

These interfaces would copy data without byte swapping, to achieve better performance in netCDF applications (like NCO).
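
To illustrate why skipping the byte swap can help, here is a rough sketch that assumes 32-bit values stored in big-endian external form, as in the classic format. The functions copy_via_host and copy_raw are hypothetical names for this illustration only; the actual nc_cpy_varx interfaces have not been designed, and this is not their implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one big-endian 32-bit value (external form) into a host integer. */
static uint32_t be32_decode(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode a host integer back into big-endian external form. */
static void be32_encode(uint32_t v, unsigned char *p)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Conventional path: external -> host -> external,
 * which costs two conversions per value. */
void copy_via_host(const unsigned char *src, unsigned char *dst, size_t nvals)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < nvals; i++)
        be32_encode(be32_decode(src + 4 * i), dst + 4 * i);
}

/* Raw path: the bytes are already in external form, so just move them. */
void copy_raw(const unsigned char *src, unsigned char *dst, size_t nvals)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, 4 * nvals);
}
```

Both functions produce identical output bytes. The raw path simply avoids the per-value conversions that a read followed by a write would perform.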

(Read-only) NcML Support

The use of NcML on the Java side involves two large new features:

  1. Support for an XML format of data exchange.
  2. Ability to access data through virtual files, which contain references to external data files, perhaps in different locations, plus additional metadata added in the NcML layer.

NcML will be implemented in ncdump, and also the C library. When NcML is fully supported, the user will be able to open an NcML file with nc_open (or nf_open, etc.), and get data from the virtual file that the NcML layer is constructing. This implies an opportunity for code sharing of the NcML code by ncdump and the C library.

Being able to read an NcML file as if it were a "real" file would have tremendous advantages for the user. All existing C/Fortran/C++ software would automatically have capability to use the NcML layer.

Should the C library be able to write NcML files? It is not clear that this is as useful. As XML-based files, NcML files would best be created by some XML tool, such as ncdump, rather than a C or Fortran program.

Some preliminary requirements for this have been drawn up in the netCDF 4.1 requirements.

Fortran Layer Re-factor

The Fortran 2003 standard provides for the calling of C from Fortran in a standard way. This presents the opportunity to eliminate the cFortran.h layer.

The cFortran.h layer has served netCDF well for many years, and will for several more releases, but is also a rich source of porting problems and confusion. The development of a standard way of calling C from Fortran is to be preferred.

This new Fortran layer will also be extended to apply to the netCDF-4 functions.

Larger Variables on 64-bit Machines

Another group of developers anticipates working on a new binary format (CDF3) that would allow for very large variables on 64-bit machines.

More Examples, some with CF Conventions

The examples directory, first included with version 3.6.2, has already proven useful as documentation, as a starting point for many users in writing netCDF code, and as a set of tests.

The 3.6.2 examples are all netCDF-3 examples (of course) and are all very simple. One or more additional examples, with more complex data sets, would be welcome. We also need an example of a generic utility that works on any netCDF data, for example a netCDF copy program that uses only the netCDF interface and provides a framework for plugging in user code for data transformations.

The examples currently don't conform to the CF conventions, thus can't easily be displayed in the IDV. This will be corrected.

Although there are the beginnings of some simple examples for netCDF-4, many more can be provided. NetCDF-4 could use its own set of complete and comprehensive examples, but we await the development of some real, complex, netCDF-4 files in the wild.

Multi-Threaded NetCDF

NetCDF is not thread-safe, but could be.

Groups

NetCDF-4 groups could be added to netCDF-3 by allowing the "/" character in names, and adopting the convention that it indicates groups. In this way, the names of a variable, dimension, or global attribute could define a group structure for the file. Upon opening the file, the netCDF library would construct the group structure for the file.
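
The naming convention described above can be sketched as a small helper that splits a full name at its last "/" into a group path and a leaf name. The function split_group_name is a hypothetical name for this illustration; it is not part of the netCDF library, and the real group-construction logic might well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "a/b/var" into the group path "a/b" and the leaf name "var".
 * A name with no '/' belongs to the root group (empty group path).
 * Returns 0 on success, -1 if the leaf is empty or a buffer is too small. */
int split_group_name(const char *full, char *group, size_t group_cap,
                     char *leaf, size_t leaf_cap)
{
    assert(full != NULL && group != NULL && leaf != NULL);

    const char *slash = strrchr(full, '/');
    const char *leaf_start = slash ? slash + 1 : full;
    size_t group_len = slash ? (size_t)(slash - full) : 0;
    size_t leaf_len = strlen(leaf_start);

    if (leaf_len == 0 || group_len >= group_cap || leaf_len >= leaf_cap)
        return -1;

    memcpy(group, full, group_len);
    group[group_len] = '\0';
    memcpy(leaf, leaf_start, leaf_len + 1);  /* copies the terminator too */
    return 0;
}
```

On opening a file, a library could apply such a split to every variable, dimension, and global attribute name, and create each distinct group path once.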

New Numeric Types

Another netCDF-4 feature that could be added to netCDF-3 is the set of new numeric atomic types (NC_UBYTE, NC_USHORT, NC_UINT, NC_INT64, NC_UINT64).

Compression

Compression of netCDF-3 data has been achieved by user Bill Noon at Cornell with the znetcdf library. This is available for version 3.3.1 of netCDF. This could be merged into the current netCDF autoconf/automake files.

Better Integration with pNetCDF

Currently the pNetCDF (parallel netCDF) package uses a copy of the netCDF code to build. It would be more useful if it used the built netCDF library, because it would then work with the latest netCDF distribution without any extra work. This could be accomplished fairly easily with some cooperation between Unidata and the pNetCDF people, and some modifications in their Makefiles (or ours).

Make NetCDF Classic/64-bit Offset Format a Formal Standard

Submit a formal specification of the format to a standards process, such as NASA's ESDSWG.

NetCDF-4 Features

The features in this section will only be available in the NetCDF-4.x series.

Remote Access with the OCAPI Library

OpenDAP provides a modified version of the netCDF library that allows remote access to netCDF files (for netCDF-3 classic and 64-bit offset files only).

This feature will be added to the netCDF release, so that a netCDF user can read data remotely from an OpenDAP server.

This feature will be extended to cover netCDF-4 features; this will be a significant task.

See the SDCI proposal for more information about this feature.

New C++ API (CXX4)

A new C++ API has been partially developed. The API is based on the current C++ API, but adds C++ features such as exceptions and a name space, and handles the new netCDF-4 features like groups, new and user-defined types, and the rest of the new netCDF-4 file features.

This code needs to be further developed, and many tests added.

Some preliminary requirements for this have been drawn up in the netCDF 4.1 requirements.

Fortran 90 Structures and NetCDF Compound Types

Fortran 90 provides a way of defining C-like structures, but the resulting storage varies from compiler to compiler (just as with C).

While the HDF5 team solved this problem for C, they have not solved it for Fortran 90. This would probably be a pain to support and maintain for the many different Fortran compilers in use, but would be very valuable to users.

Greater Interoperability with HDF5

Currently netCDF-4 can only read HDF5 files if they meet certain restrictions. (Files created with netCDF-4 necessarily meet these restrictions).

The most difficult restriction is the use of the new HDF5 dimension scales for every dimension in the file. This is possible in HDF5 but not the usual way things are done. If netCDF-4 would handle the case of dimensions that don't have an associated dimension scale (which we sometimes call "anonymous dimensions"), then the vast majority of HDF5 data would become readable with the netCDF-4 API. The remaining restrictions would still disqualify some data sets, but most would meet these restrictions.

Benchmarks for NetCDF-4

We need some realistic benchmarking for netCDF-4.

We need to compare the output of a realistic file written with netCDF-3 and with netCDF-4 in classic mode.

We need to compare the performance of netCDF-4 with parallel I/O and pNetCDF.

Reading GRIB

NetCDF-4 can be modified to transparently read GRIB files.

LibCF Features

The features in this section will be added to libCF, to provide some geo-referencing capabilities to netCDF users.

Scientific Data Object Support

The support of scientific data objects in libCF will include several standard ways of storing common scientific data types.

GRIDSPEC - A Grid Standard for Earth System Models

V. Balaji, from GFDL, has proposed the GRIDSPEC standard as a part of the CF Conventions. LibCF will be one of the software packages assisting in this effort, by implementing a user interface for netCDF/libCF users. See the GRIDSPEC document for more information.

cdtime

Part of the Climate Data Management System (CDMS) Version 4.0 is the cdtime Python package.

This package contains definitions of various time axes and calendar information commonly used by climate scientists. This functionality needs to be added to libCF.
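
As a rough illustration of the kind of calendar logic involved, the sketch below converts a day count into a date in a 365-day calendar with no leap years (the "noleap" calendar named in the CF Conventions). The struct noleap_date and the function noleap_from_days are hypothetical names for this sketch; they are not taken from cdtime or libCF.

```c
#include <assert.h>

/* A calendar date in a 365-day ("noleap") calendar. */
typedef struct {
    int year;
    int month;  /* 1..12 */
    int day;    /* 1..31 */
} noleap_date;

/* Convert a non-negative count of whole days since base_year-01-01
 * into a date, assuming every year has exactly 365 days. */
noleap_date noleap_from_days(int base_year, long days)
{
    static const int month_len[12] =
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    noleap_date d;
    long doy;
    int m = 0;

    assert(days >= 0);
    doy = days % 365;  /* zero-based day of the year */
    d.year = base_year + (int)(days / 365);

    /* Walk through the months until the remaining days fit in one. */
    while (doy >= month_len[m]) {
        doy -= month_len[m];
        m++;
    }
    d.month = m + 1;
    d.day = (int)doy + 1;
    return d;
}
```

For example, 59 days after January 1 of any base year is March 1 in this calendar, whereas in a Gregorian leap year it would be February 29. Differences like this are why model output needs explicit calendar handling.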

Cell Methods

Support for Cell Methods will be provided by LibCF. It's not exactly clear what this means.

Testing and Web-site Improvements

The testing system for netCDF and libCF is an important component of the development process. Daily testing and snapshot releases are automatically handled by the test system. Improvements in the test system directly impact the development process.

Dedicated Main Testing Machine

It would be useful to have shecky dedicated to testing, and to move development to a new machine. This would make the test platform more stable. As it is, developer activities have broken various build environments on shecky, which in turn breaks the tests that depend on those build environments.

Developers need the freedom to muck around with a development platform, while netCDF testing needs the stability of a dedicated machine.

Additional Compilers and Combinations

It would be nice to add more Fortran compilers to the test system. Candidates include NAG and Lahey.

Improvement of Testing Software and Output

There are improvements to be made in the testing system and its output. Currently the test programs are a bunch of ad-hoc bash scripts, one set of scripts for each of the three test situations (netCDF-3, netCDF-4, and libCF).

NetCDF-4 Tests for C, F90, C++

Testing for netCDF-4 features is barely adequate in the C layer, and totally absent from the Fortran 77 and Fortran 90 layers. The new C++ API for netCDF-4 does include some testing, but much more could be done.

Improvements to the Web Site

The netCDF web site is the primary tool for communicating with users. Although it looks good, there are improvements that could be made if there were some more time available to fuss with the web site.


This page is maintained by Ed Hartnett.
Questions or comments can be sent to <support@unidata.ucar.edu>.
 
 