NetCDF-4.0 Requirements

These requirements represent our understanding of the netCDF-4.0 library. They are subject to change without notice.

Comments are welcome, and should be sent to the netcdf-hdf mailing list: netcdf-hdf@unidata.ucar.edu
Backward Compatibility
  • The entire netCDF-3.x API is supported. It can read/write/modify netCDF-4/HDF5 data files.
  • The entire netCDF-2.x API is also fully supported.
  • The netCDF-3 Fortran 77, Fortran 90, and C++ APIs can read and write data in classic, 64-bit offset, or simple netCDF-4 formats. These APIs cannot yet access new netCDF-4 functionality.
Support for Large Files
  • The 2 GB limits of the classic file format are relaxed on systems that support large files. Such systems are detected at install time and, where required, the netCDF library is built with the _FILE_OFFSET_BITS=64 macro.
  • Users on systems that don't support large files (LFS) are warned at install time.
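The install-time detection described above can be sketched as a check on the width of off_t once _FILE_OFFSET_BITS=64 has been requested. This is only an illustration, assuming a POSIX system; the function names has_large_file_support and report_large_file_support are hypothetical and are not part of netCDF or its configure script.

```c
/* Must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns 1 if off_t can address files larger than 2 GB, 0 otherwise. */
int has_large_file_support(void)
{
    return sizeof(off_t) >= 8;
}

/* Hypothetical helper: prints the kind of message an installer might emit. */
void report_large_file_support(void)
{
    if (has_large_file_support())
        printf("large file support: yes (off_t is %u bits)\n",
               (unsigned)(sizeof(off_t) * 8));
    else
        printf("warning: no large file support, files are limited to 2 GB\n");
}
```

On a system without large file support, off_t stays 32 bits wide even with the macro defined, and the warning branch is taken.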
Use of HDF5 for Storage
  • NetCDF-4 uses HDF5 to store data.
  • NetCDF-4 provides a complete facade. No HDF5 artifacts are used in the public netCDF-4 API. No mixing of the APIs takes place.
Backward File Format Compatibility
  • NetCDF-4 uses HDF5 as its storage layer. It produces valid HDF5 files, which can be read without using the netCDF interface. HDF5 files produced with netCDF-4 should be modified using the netCDF-4 interface.
  • NetCDF-4 can also create/read/modify files created with previous versions of netCDF, using the original netCDF data format.
  • If the user opens an old netCDF file and attempts to modify it, netCDF-4 will stick with the (old) netCDF file format. New API features (like adding groups) won't work on these files.
  • Files created in netCDF-4 can be restricted to strict netCDF-3.x functionality at file creation time. Users of these files will not be allowed to use additional netCDF-4 features, like multiple unlimited dimensions, groups, new types, etc.
  • There is a way for users to copy an old netCDF file into the new HDF5 data format.
Backward API Compatibility
  • Programs written for a 3.x version of netCDF can use the netCDF-4 library by relinking.
  • NetCDF-3.x error codes remain unchanged. NetCDF-4 adds some new error codes.
  • For netCDF-4 files, nc_redef and nc_enddef are automatically called as needed.
Parallel I/O
  • Parallel reading and writing of netCDF files is supported, using the HDF5 parallel I/O features.
  • The installer specifies whether parallel netCDF is to be used at install time.
  • The parallel I/O features require that the MPI library be installed.
  • The netCDF-4 library (like the HDF5 library) documents which of its functions are collective and which are independent.
  • Collective HDF5 calls are not made during independent netCDF-4 operations.
  • Parallel I/O demonstrates performance gains (over netCDF-3) in modeling contexts on advanced architectures.
Multiple unlimited dimensions
  • Variables may use multiple unlimited dimensions.
  • Unlimited dimensions need not be shared. That is, different variables can have different unlimited dimensions.
  • The call nc_inq_unlimdim returns the first unlimited dimension. An additional function returns an array which contains the full list of unlimited dims.
  • Chunking is required in any dataset with one or more unlimited dimensions in NetCDF-4.
Variable/Dataset Creation Options
  • Chunking is required in any dataset with one or more unlimited dimensions in HDF5. NetCDF-4 supports setting chunk parameters at variable creation. The user can optionally select a chunking algorithm by setting chunkalg to NC_CHUNK_SEQ (to optimize for sequential access) or NC_CHUNK_SUB (for chunk sizes set to favor subsetting equally in any dimension).
  • When the (netCDF-3) function nc_def_var is used, a sequential chunking algorithm will be used (just as if the variable had been created with NC_CHUNK_SEQ).
  • The sequential chunking algorithm sets a chunksize of 1 for all unlimited dimensions, and all other chunksizes to the size of that dimension, unless the resulting chunksize is greater than 250 KB, in which case subsequent dimensions will be set to 1 until the chunksize is less than 250 KB (one quarter of the default chunk cache size).
  • The subsetting chunking algorithm sets the chunksize in each dimension to the nth root of (desired chunksize/product of n dimsizes).
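The two chunking rules above can be sketched in C as follows. This is an illustration under stated assumptions, not the library's implementation. "250 KB" is taken as 250 * 1024 bytes. "Subsequent dimensions" is read as "dimensions in declaration order, starting from the first". The subsetting expression is read as a common scale factor applied to every dimension length, with the desired chunk size counted in elements. All ex_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "250 KB" means 250 * 1024 bytes. */
#define EX_MAX_CHUNK_BYTES (250UL * 1024UL)

/* Sequential sketch: fills chunk[] and returns the chunk size in bytes. */
unsigned long long ex_chunk_seq(int ndims, const size_t *dimlen,
                                const int *is_unlimited, size_t type_size,
                                size_t *chunk)
{
    unsigned long long bytes = type_size;
    int i;

    assert(ndims > 0 && type_size > 0);
    for (i = 0; i < ndims; i++) {
        assert(is_unlimited[i] || dimlen[i] > 0);
        /* 1 along unlimited dimensions, the full length along the others. */
        chunk[i] = is_unlimited[i] ? 1 : dimlen[i];
        bytes *= chunk[i];
    }
    if (bytes > EX_MAX_CHUNK_BYTES) {
        /* Assumption: dimensions are collapsed to 1 in declaration order. */
        for (i = 0; i < ndims && bytes >= EX_MAX_CHUNK_BYTES; i++) {
            bytes /= chunk[i];
            chunk[i] = 1;
        }
    }
    return bytes;
}

/* n-th root of x for 0 <= x <= 1, found by bisection. */
static double ex_nth_root(double x, int n)
{
    double lo = 0.0, hi = 1.0;
    int iter, k;

    for (iter = 0; iter < 60; iter++) {
        double mid = (lo + hi) / 2.0, p = 1.0;
        for (k = 0; k < n; k++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Subsetting sketch: scales every dimension length by the same factor. */
void ex_chunk_sub(int ndims, const size_t *dimlen, double desired_elems,
                  size_t *chunk)
{
    double total = 1.0, scale;
    int i;

    assert(ndims > 0 && desired_elems > 0.0);
    for (i = 0; i < ndims; i++)
        total *= (double)dimlen[i];
    /* A variable no larger than the target fits in a single chunk. */
    scale = desired_elems >= total
                ? 1.0
                : ex_nth_root(desired_elems / total, ndims);
    for (i = 0; i < ndims; i++) {
        chunk[i] = (size_t)(scale * (double)dimlen[i]);
        if (chunk[i] < 1)
            chunk[i] = 1;
    }
}
```

For a 4-byte variable with dimensions (unlimited, 1000, 1000), the sequential sketch starts from a 4,000,000-byte chunk and collapses the first fixed dimension, giving chunk sizes (1, 1, 1000). For a 100 x 100 variable and a target of 2500 elements, the subsetting sketch scales both dimensions by 0.5, giving 50 x 50.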
Data Types
  • The following new atomic data types are supported: unsigned int8, unsigned int16, unsigned int32, signed and unsigned int64.
  • Attempting to use any of the new types with a netCDF-3 Classic or 64-bit Offset format file will return an error.
  • Enums are supported.
  • Users can define structs composed of atomic types, or previously defined types. We call these compound types.
  • Compound types have a name, which is written to the data file, and a typeid, which is assigned when the file is read, or the type is created. Unlike atomic types, the typeid for a compound type may change when the file is later reopened.
  • Users can read files with unknown compound types, and use netCDF-4 functions to learn about the unknown compound types.
  • Users can retrieve arrays of the compound type, and also arrays of any element of the compound type.
  • The usual var/var1/vara/vars/varm functions are available for compound types.
  • Automatic data conversion for compound types is not attempted.
  • Compound types may be defined in a file, even if they are not used for any variables.
  • A string data type is supported.
  • Strings are stored in UTF-8 Unicode.
  • String data is stored without being interpreted by the library, but an encoding for Unicode strings may be specified with a separate attribute (e.g. "_Encoding"). A global or group attribute could be used to specify the encoding of all strings in a file or group.
  • A variable length (vlen) type can be used to hold ragged arrays.
  • Automatic data conversion for vlens is not attempted.
  • The user can create named opaque types, with a fixed size.
  • Automatic data conversion for opaque types is not attempted.
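The rule that the new atomic types are rejected for classic and 64-bit offset files can be sketched as a simple format check. The enum values, the error code EX_EBADTYPE, and the function ex_check_type are hypothetical names chosen for this illustration; they are not the identifiers defined in netcdf.h. The sketch covers atomic types only.

```c
#include <assert.h>

/* Hypothetical identifiers for this sketch, not the names in netcdf.h. */
enum ex_format { EX_FORMAT_CLASSIC, EX_FORMAT_64BIT_OFFSET, EX_FORMAT_NETCDF4 };

enum ex_type {
    /* atomic types available in every format */
    EX_BYTE, EX_CHAR, EX_SHORT, EX_INT, EX_FLOAT, EX_DOUBLE,
    /* atomic types new in netCDF-4 */
    EX_UBYTE, EX_USHORT, EX_UINT, EX_INT64, EX_UINT64
};

#define EX_NOERR 0
#define EX_EBADTYPE (-1) /* hypothetical error code */

/* Returns EX_NOERR if the type may be used in a file of the given format. */
int ex_check_type(enum ex_format format, enum ex_type type)
{
    int is_new_type;

    assert(type <= EX_UINT64);
    is_new_type = (type >= EX_UBYTE);
    if (is_new_type && format != EX_FORMAT_NETCDF4)
        return EX_EBADTYPE;
    return EX_NOERR;
}
```

A caller would run this check at variable or attribute definition time, before anything is written to the file.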
Hierarchical Grouping of Data
  • NetCDF-4 users can further organize their data file items (i.e. variables and attributes) into groups.
  • Groups can be nested.
  • An item can belong to only one group.
  • The ncid used by netCDF functions refers to both the file and the group within the file. Two groups in the same file will have different ncids.
  • Names are unique within a group. All netCDF-4 names (including group names) have maximum length NC_MAX_NAME (not including the NUL terminator).
  • Users can inquire about objects in a group.
  • Users can iterate through the groups of multiple simultaneously open files.
  • Attributes can be attached to groups.
  • Users can create vars and dims in groups.
  • Dims are scoped such that dims in parent, grandparent, etc., groups are available to be used as dimensions.
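The dimension scoping rule can be sketched as a lookup that starts in a group and walks up through its ancestors. The structures and the function ex_find_dim below are hypothetical and exist only for this illustration. The sketch also assumes that the nearest enclosing definition wins when the same name appears at more than one level, which the requirements do not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_DIMS 8 /* arbitrary limit for this sketch */

struct ex_dim {
    const char *name;
    size_t len; /* 0 stands for an unlimited dimension here */
};

struct ex_group {
    const char *name;
    const struct ex_group *parent; /* NULL for the root group */
    struct ex_dim dims[EX_MAX_DIMS];
    int ndims;
};

/* Looks for a dimension in grp, then in its parent, grandparent, and so on. */
const struct ex_dim *ex_find_dim(const struct ex_group *grp, const char *name)
{
    assert(name != NULL);
    for (; grp != NULL; grp = grp->parent) {
        int i;
        for (i = 0; i < grp->ndims; i++)
            if (strcmp(grp->dims[i].name, name) == 0)
                return &grp->dims[i];
    }
    return NULL; /* not visible from this group */
}
```

A dimension defined in a child group is not visible from the parent, because the walk only goes upward.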
Limited Interoperability with HDF5
  • NetCDF-4 produces valid HDF5 files, with no special netCDF-4 artifacts.
  • NetCDF-4 can read and edit HDF5 files which meet the following conditions:
    1. Dimension scales must be used, and all dimensions of every dataset must have an attached dimension scale (except for the extra, private, dimension of a VLEN type).
    2. Group organization must be strictly hierarchical. No circular group structure is allowed.
    3. Only HDF5 atomic types which have a clear correspondence with a netCDF-4 type are supported.
    4. As long as they don't use a forbidden atomic type, compound, vlen, and opaque types are interoperable.
    5. Object names must be valid netCDF names (i.e. alphanumeric characters, "_", ".", "+", or "-"; no spaces).
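The naming condition above can be sketched as a character check. The function ex_is_valid_name is hypothetical, and EX_MAX_NAME is a stand-in for NC_MAX_NAME whose value of 256 is assumed here. The sketch applies only the rule stated in this document, so the actual library may accept or reject names differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_NAME 256 /* stand-in for NC_MAX_NAME, value assumed */

/* Returns 1 if name uses only alphanumerics, "_", ".", "+", or "-". */
int ex_is_valid_name(const char *name)
{
    size_t i, len;

    if (name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0 || len > EX_MAX_NAME)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && strchr("_.+-", c) == NULL)
            return 0;
    }
    return 1;
}
```

An HDF5 object named "sea level" would fail this check because of the space, while "sea_level" would pass.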
Compression and Other Filters
  • HDF5 deflate and compress filters are supported.
  • Compression can be applied on a per-variable basis.
Private Dimensions
  • A dimension can be marked private to one variable. Thereafter, it can only be used by that variable.
Documentation
  • A new version of the netCDF documentation includes updates to cover all new features.
  • Documents are available on the web (HTML), as PDF files, as DVI, and as PostScript files.
  • Each language interface is described separately. That is, the Fortran and C manuals are not mixed together.
  • Each language interface document contains examples in its native language. For example, the C manual has C examples, the C++ manual has C++ examples, etc.
  • Each manual contains a good index which allows users to quickly find any function or concept.
  • The web site contains a search engine that will allow users to search any subset of netcdf documents.
Distribution and Installation
  • The netCDF-4 library is distributed separately from HDF5, by Unidata.
  • It is possible to configure the installation so that netcdf-4 is not built into the library. In this case, only netcdf-3 code is built, and netcdf-4 format files cannot be created or opened.
  • After the release, binary distributions are supplied for tier 1 test platforms.
  • Cross-compiling is not supported.
  • An installation document, always available on the website, and distributed with netCDF, describes the installation process and lists the supported platforms.
  • Configure and build output for tier 1 platforms are available to help troubleshoot installation problems.
  • The netCDF-4 library requires that HDF5 (version 1.8.0 or greater) be installed.
  • The netCDF-4 library can coexist peacefully with the netCDF-3 library, but both cannot be used in the same program due to name clashes.
  • Unix binary users get a tarball containing the library .a file for their platform, the man pages, and binary executables of ncdump and ncgen. The library includes the Fortran interface and, for tier 1 platforms, the (currently implemented) C++ or F90 interfaces.
  • Unix users build and install netCDF-4 in one pass through the usual configure/make test/make install cycle.
  • The configure script searches the user's path for compilers, preferring multi-platform commercial compilers, then platform-specific commercial compilers, then GNU compilers. (Based on the assumption that if they've paid for a compiler, and included it in their path, they want to use it).
  • The configure script attempts to correctly set flags like CPPFLAGS, CFLAGS, etc., in accordance with the needs of the platform. If the user has set CFLAGS, FFLAGS (for F77), FCFLAGS (for Fortran), or CXXFLAGS, the configure script will not override these settings. (Note that autoconf does not always follow these conventions, and we are not going to try to make it do so.) A configure option allows the user to turn off all netCDF-4 attempts to change or set any flags.
  • If no CFLAGS are specified, -g is used. (Setting CFLAGS to null means no CFLAGS will be used.)
  • By default, configure builds F77, F90, and C++ APIs, if it can find a compiler to do so. The user can optionally disable these APIs.
  • Windows binary users download a setup.exe file from Unidata and launch the GUI installer for Windows.
  • Windows source code users download a source code winzip file, and find windows dependent files and IDE configurations under the win32 directory.
  • NetCDF is buildable from MS Visual Studio, with the two most recent releases of VC++ (versions 6.0 and 7.0, for the purposes of the netCDF-4 project).
  • Windows users can also use cygwin tools to build netcdf with gcc.
  • Not supported: building with MinGW, and using the configure script to find and use MS VC.
Upgrades to ncdump/CDL to reflect new features
  • New data types, including structs, are supported in CDL and in ncdump.
  • NcML is supported in ncdump.
Testing
  • All API functions are tested.
  • All tests can run on any supported target platform.
  • All tests are automated and can be run from makefile targets.
  • Some tests may take excessive time. "make test" may skip very lengthy tests (i.e. tests that take tens of minutes on the average Linux workstation), and the user can use "make slowtests" to run them.
  • Running "make test" or "make check" clearly indicates success or failure at the end of the output. The return code from make indicates success of all tests (0) or failure of any test (2).
  • All supported language interfaces are tested. These include C, C++, Fortran 77 and Fortran 90.
  • For release purposes we identify the following two tiers of testing:
    1. Portland Group compiler on Linux, one vendor compiler and gcc/g77/g++ on AIX, Linux, HP-UX (w/o C++,F90), MacOS, and Solaris (32-bit SPARC and i386, and 64-bit SPARC mode), and vendor compiler (i.e. VC++ 7.0, a.k.a. VC++.NET) on Windows.
    2. FreeBSD, Irix64, OSF1 vendor and GNU compilers. Cygwin.
  • We add the following for parallel programming support:
    1. AIX 5.1 and higher, Linux Cluster/MPICH
  • Test programs do not accept or require command line options.
  • Test programs provide feedback in accordance with netCDF test look and feel.
Performance
  • The performance of netCDF-4 is not more than 10% slower than netCDF-3 for large contiguous data writes. It is not more than 100% slower for any other operation.
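The performance requirement can be expressed as a check on a pair of timings. The sketch below assumes that "N% slower" means the netCDF-4 elapsed time is at most (1 + N/100) times the netCDF-3 elapsed time for the same operation. The enum and the function ex_meets_perf_requirement are hypothetical names for this illustration.

```c
#include <assert.h>

enum ex_op_kind { EX_LARGE_CONTIGUOUS_WRITE, EX_OTHER_OPERATION };

/* Returns 1 if the netCDF-4 timing is within the allowed slowdown. */
int ex_meets_perf_requirement(enum ex_op_kind kind, double nc3_seconds,
                              double nc4_seconds)
{
    /* 10% for large contiguous writes, 100% for everything else. */
    double allowed = (kind == EX_LARGE_CONTIGUOUS_WRITE) ? 0.10 : 1.00;

    assert(nc3_seconds > 0.0 && nc4_seconds >= 0.0);
    return nc4_seconds <= nc3_seconds * (1.0 + allowed);
}
```

With a 10-second netCDF-3 baseline, a 10.5-second netCDF-4 write passes and a 12-second write fails, while any other operation passes as long as it takes no more than 20 seconds.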
Examples
  • Example programs will demonstrate many of the features of netCDF.
  • Each example will consist of a single code file, which the user can easily paste from the web page into their netCDF development environment.
  • Examples include reasonable error handling.
  • Examples are as brief as good coding style permits.
  • Example code is well-commented.
  • All examples can be built and run as one make target ("make check").
  • Examples are distributed with netCDF and compile and run on all supported netCDF systems.
  • Corresponding examples will be provided in C, F77, F90, and C++.
  • Examples do not depend on each other, but may depend on a common real data file.
  • Some examples illustrate netCDF features in realistic ways as used by the Earth Science community.
  • Examples are written for programmers who are familiar with the target language. The C examples are written for C programmers, not people who don't know C. The same applies to the other languages.