
The NetCDF Users' Guide



NetCDF Users Guide

This guide describes the netCDF object model. This document applies to netCDF version 4.0, and was last updated on 27 June 2008.

Interface guides are available for C (see The NetCDF C Interface Guide), C++ (see The NetCDF C++ Interface Guide), Fortran 77 (see The NetCDF Fortran 77 Interface Guide), and Fortran 90 (see The NetCDF Fortran 90 Interface Guide).

Separate documentation for the netCDF Java library can be found at the netCDF-Java website, http://www.unidata.ucar.edu/software/netcdf-java.

For installation and porting information, see The NetCDF Installation and Porting Guide.

--- The Detailed Node Listing ---

Introduction

Components of a NetCDF Dataset

Data

Forms of Data Access

File Structure and Performance

NetCDF Utilities

File Format Specification

The NetCDF Classic Format Specification



Foreword

Unidata (http://www.unidata.ucar.edu) is a National Science Foundation-sponsored program empowering U.S. universities, through innovative applications of computers and networks, to make the best use of atmospheric and related data for enhancing education and research. For analyzing and displaying such data, the Unidata Program Center offers universities several supported software packages developed by other organizations. Underlying these is a Unidata-developed system for acquiring and managing data in real time, making practical the Unidata principle that each university should acquire and manage its own data holdings as local requirements dictate. It is significant that the Unidata program has no data center; the management of data is a "distributed" function.

The Network Common Data Form (netCDF) software described in this guide was originally intended to provide a common data access method for the various Unidata applications. These deal with a variety of data types that encompass single-point observations, time series, regularly-spaced grids, and satellite or radar images.

The netCDF software functions as an I/O library, callable from C, FORTRAN, C++, Perl, or any other language for which a netCDF library is available. The library stores and retrieves data in self-describing, machine-independent datasets. Each netCDF dataset can contain multidimensional, named variables (with differing types that include integers, reals, characters, bytes, etc.), and each variable may be accompanied by ancillary data, such as units of measure or descriptive text. The interface includes a method for appending data to existing netCDF datasets in prescribed ways, functionality that is not unlike a (fixed length) record structure. However, the netCDF library also allows direct-access storage and retrieval of data by variable name and index and therefore is useful only for disk-resident (or memory-resident) datasets.

NetCDF access has been implemented in about half of Unidata's software, so far, and it is planned that such commonality will extend across all Unidata applications in order to:

A measure of success has been achieved. NetCDF is now in use on computing platforms that range from personal computers to supercomputers and include most UNIX-based workstations. It can be used to create a complex dataset on one computer (say in FORTRAN) and retrieve that same self-describing dataset on another computer (say in C) without intermediate translations. NetCDF datasets can be transferred across a network, or they can be accessed remotely using a suitable network file system or remote access protocols.

Because we believe that the use of netCDF access in non-Unidata software will benefit Unidata's primary constituency (such use may result in more options for analyzing and displaying Unidata information), the netCDF library is distributed without licensing or other significant restrictions, and current versions can be obtained via anonymous FTP. Apparently the software has been well received by a wide range of institutions beyond the atmospheric science community, and a substantial number of public domain and commercial data analysis systems can now accept netCDF datasets as input.

Several organizations have adopted netCDF as a data access standard, and there is an effort underway at the National Center for Supercomputer Applications (NCSA, which is associated with the University of Illinois at Urbana-Champaign) to support the netCDF programming interfaces as a means to store and retrieve data in "HDF files," i.e., in the format used by the popular NCSA tools. We have encouraged and cooperated with these efforts.

Questions occasionally arise about the level of support provided for the netCDF software. Unidata's formal position, stated in the copyright notice which accompanies the netCDF library, is that the software is provided "as is". In practice, the software is updated from time to time, and Unidata intends to continue making improvements for the foreseeable future. Because Unidata's mission is to serve geoscientists at U.S. universities, problems reported by that community necessarily receive the greatest attention.

We hope the reader will find the software useful and will give us feedback on its application as well as suggestions for its improvement.

David Fulker, 1996

Unidata Program Center Director, University Corporation for Atmospheric Research



Summary

The purpose of the Network Common Data Form (netCDF) interface is to allow you to create, access, and share array-oriented data in a form that is self-describing and portable. "Self-describing" means that a dataset includes information defining the data it contains. "Portable" means that the data in a dataset is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers. Using the netCDF interface for creating new datasets makes the data portable. Using the netCDF interface in software for data access, management, analysis, and display can make the software more generally useful.

The netCDF software includes C, Fortran 77, Fortran 90, and C++ interfaces for accessing netCDF data. These libraries are available for many common computing platforms.

The community of netCDF users has contributed ports of the software to additional platforms and interfaces for other programming languages as well. Source code for netCDF software libraries is freely available to encourage the sharing of both array-oriented data and the software that makes the data useful.

This User's Guide presents the netCDF data model. It explains how the netCDF data model uses dimensions, variables, and attributes to store data. Language-specific programming guides are available for C (see The NetCDF C Interface Guide), C++ (see The NetCDF C++ Interface Guide), Fortran 77 (see The NetCDF Fortran 77 Interface Guide), and Fortran 90 (see The NetCDF Fortran 90 Interface Guide).

Reference documentation for UNIX systems, in the form of UNIX 'man' pages for the C and FORTRAN interfaces, is also available at the netCDF web site (http://www.unidata.ucar.edu/software/netcdf) and with the netCDF distribution.

The latest version of this document, and the language-specific guides, can be found at the netCDF web site, http://www.unidata.ucar.edu/software/netcdf/docs, along with extensive additional information about netCDF, including pointers to other software that works with netCDF data.

Separate documentation of the Java netCDF library can be found at http://www.unidata.ucar.edu/software/netcdf-java.

For installation and porting information, see The NetCDF Installation and Porting Guide.



1 Introduction



1.1 The NetCDF Interface

The Network Common Data Form, or netCDF, is an interface to a library of data access functions for storing and retrieving data in the form of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...) rectangular structure containing items which all have the same data type (e.g., 8-bit character, 32-bit integer). A scalar (simple single value) is a 0-dimensional array.

NetCDF is an abstraction that supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface. Array values may be accessed directly, without knowing details of how the data are stored. Auxiliary information about the data, such as what units are used, may be stored with the data. Generic utilities and application programs can access netCDF datasets and transform, combine, analyze, or display specified fields of the data. The development of such applications has led to improved accessibility of data and improved re-usability of software for array-oriented data management, analysis, and display.

The netCDF software implements an abstract data type, which means that all operations to access and manipulate data in a netCDF dataset must use only the set of functions provided by the interface. The representation of the data is hidden from applications that use the interface, so that how the data are stored could be changed without affecting existing programs. The physical representation of netCDF data is designed to be independent of the computer on which the data were written.

Unidata supports the netCDF interfaces for C (see The NetCDF C Interface Guide), FORTRAN 77 (see The NetCDF Fortran 77 Interface Guide), FORTRAN 90 (see The NetCDF Fortran 90 Interface Guide), and C++ (see The NetCDF C++ Interface Guide).

The netCDF library is supported for various UNIX operating systems. An MS Windows port is also available. The software is also ported and tested on a few other operating systems, with assistance from users with access to these systems, before each major release. Unidata's netCDF software is freely available via FTP (ftp://ftp.unidata.ucar.edu/pub/netcdf) to encourage its widespread use.

For detailed installation instructions, see The NetCDF Installation and Porting Guide.



1.2 NetCDF Is Not a Database Management System

Why not use an existing database management system for storing array-oriented data? Relational database software is not suitable for the kinds of data access supported by the netCDF interface.

First, existing database systems that support the relational model do not support multidimensional objects (arrays) as a basic unit of data access. Representing arrays as relations makes some useful kinds of data access awkward and provides little support for the abstractions of multidimensional data and coordinate systems. A quite different data model is needed for array-oriented data to facilitate its retrieval, modification, mathematical manipulation and visualization.

Related to this is a second problem with general-purpose database systems: their poor performance on large arrays. Collections of satellite images, scientific model outputs and long-term global weather observations are beyond the capabilities of most database systems to organize and index for efficient retrieval.

Finally, general-purpose database systems provide, at significant cost in terms of both resources and access performance, many facilities that are not needed in the analysis, management, and display of array-oriented data. For example, elaborate update facilities, audit trails, report formatting, and mechanisms designed for transaction-processing are unnecessary for most scientific applications.



1.3 The netCDF File Format

Until version 3.6.0, all versions of netCDF employed only one binary data format, now referred to as netCDF classic format. NetCDF classic is the default format for all versions of netCDF.

In version 3.6.0 a new binary format was introduced, 64-bit offset format. Nearly identical to netCDF classic format, it uses 64-bit offsets (hence the name), and allows users to create far larger datasets.

In version 4.0.0 a third binary format was introduced: the HDF5 format. Starting with this version, the netCDF library can use HDF5 files as its base format. (Only HDF5 files created with netCDF-4 can be understood by netCDF-4.)

By default, netCDF uses the classic format. To use the 64-bit offset or netCDF-4/HDF5 format, set the appropriate constant when creating the file.

To achieve network-transparency (machine-independence), netCDF classic and 64-bit offset formats are implemented in terms of an external representation much like XDR (eXternal Data Representation, see http://www.ietf.org/rfc/rfc1832.txt), a standard for describing and encoding data. This representation provides encoding of data into machine-independent sequences of bits. It has been implemented on a wide variety of computers, by assuming only that eight-bit bytes can be encoded and decoded in a consistent way. The IEEE 754 floating-point standard is used for floating-point data representation.
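As a rough illustration of what a machine-independent encoding involves, the following C sketch stores a 32-bit integer as four bytes, most significant byte first (the byte order XDR uses), and rebuilds it. The function names are hypothetical and are not part of the netCDF library, which performs the equivalent conversion internally.

```c
#include <stdint.h>

/* Hypothetical helper (not a netCDF function): store a 32-bit value as
 * four bytes, most significant byte first, whatever the host byte order. */
void encode_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}

/* Hypothetical helper: rebuild the value from its four-byte encoding. */
uint32_t decode_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the code works on byte values through shifts rather than by copying memory, it gives the same bytes on little-endian and big-endian hosts.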

Descriptions of the overall structure of netCDF classic and 64-bit offset files are provided later in this manual. See Structure.

The details of the classic and 64-bit offset formats are described in an appendix. See File Format. However, users are discouraged from using the format specification to develop independent low-level software for reading and writing netCDF files, because this could lead to compatibility problems if the format is ever modified.



1.4 How to Select the Format

With three different base formats, care must be taken in creating data files to choose the correct base format.

The format of a netCDF file is determined at create time.

When opening an existing netCDF file the netCDF library will transparently detect its format and adjust accordingly. However, netCDF library versions earlier than 3.6.0 cannot read 64-bit offset format files, and library versions before 4.0 can't read netCDF-4/HDF5 files. NetCDF classic format files (even if created by version 3.6.0 or later) remain compatible with older versions of the netCDF library.

Users are encouraged to use netCDF classic format to distribute data, for maximum portability.

To select 64-bit offset or netCDF-4 format files, C programmers should use flag NC_64BIT_OFFSET or NC_NETCDF4 in function nc_create. See nc_create (The NetCDF C Interface Guide).

In Fortran, use flag nf_64bit_offset or nf_netcdf4 in function NF_CREATE. See NF_CREATE (The NetCDF Fortran 77 Interface Guide).

It is also possible to change the default creation format, to convert a large body of code without changing every create call. C programmers, see nc_set_default_format (The NetCDF C Interface Guide). Fortran programmers, see NF_SET_DEFAULT_FORMAT (The NetCDF Fortran 77 Interface Guide).

1.4.1 NetCDF Classic Format

The original netCDF format is identified using four bytes in the file header. All files in this format have “CDF\001” at the beginning of the file. In this documentation this format is referred to as “netCDF classic format.”

NetCDF classic format is identical to the format used by every previous version of netCDF. It has maximum portability, and is still the default netCDF format.

For some users, the various 2 GiB format limitations of the classic format become a problem (see Classic Limitations).

1.4.2 NetCDF 64-bit Offset Format

For these users, 64-bit offset format is a natural choice. It greatly eases the size restrictions of netCDF classic files (see 64 bit Offset Limitations).

Files with the 64-bit offsets are identified with a “CDF\002” at the beginning of the file. In this documentation this format is called “64-bit offset format.”

Since 64-bit offset format was introduced in version 3.6.0, earlier versions of the netCDF library can't read 64-bit offset files.

1.4.3 NetCDF-4 Format

In version 4.0, netCDF included another new underlying format: HDF5.

NetCDF-4 format files offer new features such as groups, compound types, variable length arrays, new unsigned integer types, parallel I/O access, etc. None of these new features can be used with classic or 64-bit offset files.

NetCDF-4 files can't be created at all unless the netCDF configure script is run with --enable-netcdf-4. This also requires version 1.8.0 of HDF5.

For the netCDF-4.0 release, netCDF-4 features are only available from the C and Fortran interfaces. We plan to bring netCDF-4 features to the C++ API in a future release of netCDF.

NetCDF-4 files can't be read by any version of the netCDF library previous to 4.0. (But they can be read by HDF5, version 1.8.0 or later.)
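The identifying bytes described above are enough to tell the three formats apart. The C sketch below classifies a buffer holding the first bytes of a file. The enum and function names are hypothetical, and the check for netCDF-4 files assumes the standard eight-byte HDF5 signature at the very start of the file. It is a simplification: it does not look for an HDF5 signature at later offsets, and it cannot tell whether an HDF5 file was actually written by netCDF-4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for this sketch; these are not netCDF constants. */
enum file_kind { KIND_UNKNOWN, KIND_CLASSIC, KIND_64BIT_OFFSET, KIND_HDF5 };

/* Classify a file from its first `len` bytes, held in `buf`. */
enum file_kind classify_header(const unsigned char *buf, size_t len)
{
    /* Assumption: the standard 8-byte HDF5 signature at offset 0. */
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n' };

    /* "CDF" followed by a version byte of 1 or 2. */
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        if (buf[3] == 1) return KIND_CLASSIC;
        if (buf[3] == 2) return KIND_64BIT_OFFSET;
    }
    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return KIND_HDF5;
    return KIND_UNKNOWN;
}
```

In practice there is no need to do this by hand, since the library detects the format when it opens a file; the sketch only makes the header bytes concrete.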

For more discussion of format issues, see The NetCDF Tutorial.



1.5 What about Performance?

One of the goals of netCDF is to support efficient access to small subsets of large datasets. To support this goal, netCDF uses direct access rather than sequential access. This can be much more efficient when the order in which data is read is different from the order in which it was written, or when it must be read in different orders for different applications.

The amount of overhead for a portable external representation depends on many factors, including the data type, the type of computer, the granularity of data access, and how well the implementation has been tuned to the computer on which it is run. This overhead is typically small in comparison to the overall resources used by an application. In any case, the overhead of the external representation layer is usually a reasonable price to pay for portable data access.

Although efficiency of data access has been an important concern in designing and implementing netCDF, it is still possible to use the netCDF interface to access data in inefficient ways: for example, by requesting a slice of data that requires a single value from each record. Advice on how to use the interface efficiently is provided in Structure.
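To see why a slice that takes one value from each record is costly, consider a simplified layout in which fixed-size records are stored one after another. The C sketch below computes where such values would sit. The structure and function names are hypothetical, and the layout is an assumption made for illustration rather than a statement of the actual file format.

```c
#include <stddef.h>

/* Hypothetical layout for illustration only: fixed-size records stored
 * back to back, starting at byte `first_record`. */
struct record_layout {
    size_t first_record;  /* byte offset of record 0 */
    size_t record_size;   /* bytes per record */
};

/* Byte offset of the value located `within` bytes into record `rec`. */
size_t value_offset(const struct record_layout *lay, size_t rec, size_t within)
{
    return lay->first_record + rec * lay->record_size + within;
}

/* Distance in bytes between the same value in consecutive records. */
size_t stride_between_records(const struct record_layout *lay)
{
    return value_offset(lay, 1, 0) - value_offset(lay, 0, 0);
}
```

Under these assumptions, reading one 4-byte value from each of 1000 records of 4096 bytes touches 1000 locations that are 4096 bytes apart, whereas reading one whole record is a single contiguous read of 4096 bytes.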

The use of HDF5 as a data format adds significant overhead in metadata operations, less so in data access operations. We continue to study the challenge of implementing netCDF-4/HDF5 format without compromising performance.



1.6 Is NetCDF a Good Archive Format?

NetCDF classic or 64-bit offset formats can be used as a general-purpose archive format for storing arrays. Compression of data is possible with netCDF (e.g., using arrays of eight-bit or 16-bit integers to encode low-resolution floating-point numbers instead of arrays of 32-bit numbers), or the resulting data file may be compressed before storage (but must be uncompressed before it is read). Hence, using these netCDF formats may require more space than special-purpose archive formats that exploit knowledge of particular characteristics of specific datasets.
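The integer encoding mentioned above can be sketched in C as a linear mapping between a known value range and the range of a 16-bit integer. The structure and function names are hypothetical. The field names follow the scale_factor and add_offset attribute convention, which is an assumption here since this section does not name it. The sketch assumes max is greater than min and that every value lies within that range; it does no range checking.

```c
#include <stdint.h>

/* Hypothetical packing parameters; the names mirror the scale_factor
 * and add_offset attribute convention but are plain C fields here. */
struct packing {
    double scale_factor;
    double add_offset;
};

/* Choose parameters so that [min, max] maps onto the full int16 range.
 * Assumes max > min. */
struct packing packing_for_range(double min, double max)
{
    struct packing p;
    p.scale_factor = (max - min) / 65535.0;
    p.add_offset = min + 32768.0 * p.scale_factor;
    return p;
}

/* Encode a value in [min, max] as a 16-bit integer, rounding to nearest. */
int16_t pack_value(const struct packing *p, double value)
{
    double q = (value - p->add_offset) / p->scale_factor;
    return (int16_t)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* Recover an approximation of the original value. */
double unpack_value(const struct packing *p, int16_t packed)
{
    return packed * p->scale_factor + p->add_offset;
}
```

Each value then occupies two bytes instead of four, at the cost of a rounding error of at most half of scale_factor.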

With netCDF-4/HDF5 format, the zlib library can provide compression on a per-variable basis. That is, some variables may be compressed, others not. In this case the compression and decompression of data happen transparently to the user, and the data may be stored, read, and written compressed.



1.7 Creating Self-Describing Data conforming to Conventions

The mere use of netCDF is not sufficient to make data "self-describing" and meaningful to both humans and machines. The names of variables and dimensions should be meaningful and conform to any relevant conventions. Dimensions should have corresponding coordinate variables where sensible.

Attributes play a vital role in providing ancillary information. It is important to use all the relevant standard attributes using the relevant conventions. For a description of reserved attributes (used by the netCDF library) and attribute conventions for generic application software, see Attribute Conventions.

A number of groups have defined their own additional conventions and styles for netCDF data. Descriptions of these conventions, as well as examples incorporating them, can be accessed from the netCDF Conventions site, http://www.unidata.ucar.edu/software/netcdf/conventions.html.

These conventions should be used where suitable. Additional conventions are often needed for local use. These should be contributed to the above netCDF conventions site if likely to interest other users in similar areas.



1.8 Background and Evolution of the NetCDF Interface

The development of the netCDF interface began with a modest goal related to Unidata's needs: to provide a common interface between Unidata applications and real-time meteorological data. Since Unidata software was intended to run on multiple hardware platforms with access from both C and FORTRAN, achieving Unidata's goals had the potential for providing a package that was useful in a broader context. By making the package widely available and collaborating with other organizations with similar needs, we hoped to improve the then current situation in which software for scientific data access was only rarely reused by others in the same discipline and almost never reused between disciplines (Fulker, 1988).

Important concepts employed in the netCDF software originated in a paper (Treinish and Gough, 1987) that described data-access software developed at the NASA Goddard National Space Science Data Center (NSSDC). The interface provided by this software was called the Common Data Format (CDF). The NASA CDF was originally developed as a platform-specific FORTRAN library to support an abstraction for storing arrays.

The NASA CDF package had been used for many different kinds of data in an extensive collection of applications. It had the virtues of simplicity (only 13 subroutines), independence from storage format, generality, ability to support logical user views of data, and support for generic applications.

Unidata held a workshop on CDF in Boulder in August 1987. We proposed exploring the possibility of collaborating with NASA to extend the CDF FORTRAN interface, to define a C interface, and to permit the access of data aggregates with a single call, while maintaining compatibility with the existing NASA interface.

Independently, Dave Raymond at the New Mexico Institute of Mining and Technology had developed a package of C software for UNIX that supported sequential access to self-describing array-oriented data and a "pipes and filters" (or "data flow") approach to processing, analyzing, and displaying the data. This package also used the "Common Data Format" name, later changed to C-Based Analysis and Display System (CANDIS). Unidata learned of Raymond's work (Raymond, 1988), and incorporated some of his ideas, such as the use of named dimensions and variables with differing shapes in a single data object, into the Unidata netCDF interface.

In early 1988, Glenn Davis of Unidata developed a prototype netCDF package in C that was layered on XDR. This prototype proved that a single-file, XDR-based implementation of the CDF interface could be achieved at acceptable cost and that the resulting programs could be implemented on both UNIX and VMS systems. However, it also demonstrated that providing a small, portable, and NASA CDF-compatible FORTRAN interface with the desired generality was not practical. NASA's CDF and Unidata's netCDF have since evolved separately, but recent CDF versions share many characteristics with netCDF.

In early 1988, Joe Fahle of SeaSpace, Inc. (a commercial software development firm in San Diego, California), a participant in the 1987 Unidata CDF workshop, independently developed a CDF package in C that extended the NASA CDF interface in several important ways (Fahle, 1989). Like Raymond's package, the SeaSpace CDF software permitted variables with unrelated shapes to be included in the same data object and permitted a general form of access to multidimensional arrays. Fahle's implementation was used at SeaSpace as the intermediate form of storage for a variety of steps in their image-processing system. This interface and format have subsequently evolved into the Terascan data format.

After studying Fahle's interface, we concluded that it solved many of the problems we had identified in trying to stretch the NASA interface to our purposes. In August 1988, we convened a small workshop to agree on a Unidata netCDF interface, and to resolve remaining open issues. Attending were Joe Fahle of SeaSpace, Michael Gough of Apple (an author of the NASA CDF software), Angel Li of the University of Miami (who had implemented our prototype netCDF software on VMS and was a potential user), and Unidata systems development staff. Consensus was reached at the workshop after some further simplifications were discovered. A document incorporating the results of the workshop into a proposed Unidata netCDF interface specification was distributed widely for comments before Glenn Davis and Russ Rew implemented the first version of the software. Comparison with other data-access interfaces and experience using netCDF are discussed in Rew and Davis (1990a), Rew and Davis (1990b), Jenter and Signell (1992), and Brown, Folk, Goucher, and Rew (1993).

In October 1991, we announced version 2.0 of the netCDF software distribution. Slight modifications to the C interface (declaring dimension lengths to be long rather than int) improved the usability of netCDF on inexpensive platforms such as MS-DOS computers, without requiring recompilation on other platforms. This change to the interface required no changes to the associated file format.

Release of netCDF version 2.3 in June 1993 preserved the same file format but added single call access to records, optimizations for accessing cross-sections involving non-contiguous data, subsampling along specified dimensions (using 'strides'), accessing non-contiguous data (using 'mapped array sections'), improvements to the ncdump and ncgen utilities, and an experimental C++ interface.

In version 2.4, released in February 1996, support was added for new platforms and for the C++ interface, significant optimizations were implemented for supercomputer architectures, and the file format was formally specified in an appendix to the User's Guide.

FAN (File Array Notation), software providing a high-level interface to netCDF data, was made available in May 1996. The capabilities of the FAN utilities include extracting and manipulating array data from netCDF datasets, printing selected data from netCDF arrays, copying ASCII data into netCDF arrays, and performing various operations (sum, mean, max, min, product, and others) on netCDF arrays.

In 1996 and 1997, Joe Sirott implemented and made available the first implementation of a read-only netCDF interface for Java, Bill Noon made a Python module available for netCDF, and Konrad Hinsen contributed another netCDF interface for Python.

In May 1997, Version 3.3 of netCDF was released. This included a new type-safe interface for C and Fortran, as well as many other improvements. A month later, Charlie Zender released version 1.0 of the NCO (netCDF Operators) package, providing command-line utilities for general purpose operations on netCDF data.

Version 3.4 of Unidata's netCDF software, released in March 1998, included initial large file support, performance enhancements, and improved Cray platform support. Later in 1998, Dan Schmitt provided a Tcl/Tk interface, and Glenn Davis provided version 1.0 of netCDF for Java.

In May 1999, Glenn Davis, who was instrumental in creating and developing netCDF, died in a small plane crash during a thunderstorm. The memory of Glenn's passions and intellect continues to inspire those of us who worked with him.

In February 2000, an experimental Fortran 90 interface developed by Robert Pincus was released.

John Caron released netCDF for Java, version 2.0 in February 2001. This version incorporated a new high-performance package for multidimensional arrays, simplified the interface, and included OpenDAP (known previously as DODS) remote access, as well as remote netCDF access via HTTP contributed by Don Denbo.

In March 2001, NetCDF 3.5.0 was released. This release fully integrated the new Fortran 90 interface, enhanced portability, improved the C++ interface, and added a few new tuning functions.

Also in 2001, Takeshi Horinouchi and colleagues made a netCDF interface for Ruby available, as did David Pierce for the R language for statistical computing and graphics. Charles Denham released WetCDF, an independent implementation of the netCDF interface for Matlab, as well as updates to the popular netCDF Toolbox for Matlab.

In 2002, Unidata and collaborators developed NcML, an XML representation for netCDF data useful for cataloging data holdings, aggregation of data from multiple datasets, augmenting metadata in existing datasets, and support for alternative views of data. The Java interface currently provides access to netCDF data through NcML.

Additional developments in 2002 included translation of C and Fortran User Guides into Japanese by Masato Shiotani and colleagues, creation of a “Best Practices” guide for writing netCDF files, and provision of an Ada-95 interface by Alexandru Corlan.

In July 2003 a group of researchers at Northwestern University and Argonne National Laboratory (Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham) contributed a new parallel interface for writing and reading netCDF data, tailored for use on high performance platforms with parallel I/O. The implementation built on the MPI-IO interface, providing portability to many platforms.

In October 2003, Greg Sjaardema contributed support for an alternative format with 64-bit offsets, to provide more complete support for very large files. These changes, with slight modifications at Unidata, were incorporated into version 3.6.0, released in December, 2004.

In 2004, thanks to a NASA grant, Unidata and NCSA began a collaboration to increase the interoperability of netCDF and HDF5, and bring some advanced HDF5 features to netCDF users.

In February, 2006, release 3.6.1 fixed some minor bugs.

In March, 2007, release 3.6.2 introduced an improved build system that used automake and libtool, and an upgrade to the most recent autoconf release, to support shared libraries and the netcdf-4 builds. This release also introduced the NetCDF Tutorial and example programs.

The first beta release of netCDF-4.0 was celebrated with a giant party at Unidata in April, 2007. Over 2000 people danced 'til dawn at the NCAR Mesa Lab, listening to the Flaming Lips and the Denver Gilbert & Sullivan repertory company.

In June, 2008, netCDF-4.0 was released. Version 3.6.3, the same code but with netcdf-4 features turned off, was released at the same time. The 4.0 release uses HDF5 1.8.1 as the data storage layer for netcdf, and introduces many new features including groups and user-defined types. The 3.6.3/4.0 releases also introduced handling of UTF8 names.



1.9 What's New Since the Previous Release?

This Guide documents the 4.0 release of netCDF, which introduces a new storage format, netCDF-4/HDF5, while maintaining full backward compatibility.

New features available with netCDF-4/HDF5 files include:

More information about netCDF-4 can be found at the netCDF-4 web page http://www.unidata.ucar.edu/software/netcdf/netcdf-4.



1.10 Limitations of NetCDF

The netCDF data model is widely applicable to data that can be organized into a collection of named array variables with named attributes, but there are some important limitations to the model and its implementation in software. Some of these limitations have been removed or relaxed in netCDF-4 files, but still apply to netCDF classic and netCDF 64-bit offset files.

Currently, netCDF classic and 64-bit offset formats offer a limited number of external numeric data types: 8-, 16-, 32-bit integers, or 32- or 64-bit floating-point numbers. (The netCDF-4 format adds 64-bit integer types and unsigned integer types.) This limited set of sizes may use file space inefficiently compared to packing data in bit fields. For example, arrays of 9-bit values must be stored in 16-bit short integers. Storing arrays of 1- or 2-bit values in 8-bit values is even less optimal.
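The storage cost described above can be estimated with a little arithmetic. The following Python sketch is illustrative only; the function name `wasted_fraction` is an assumption made for this example and is not part of any netCDF interface.

```python
def wasted_fraction(value_bits: int, storage_bits: int) -> float:
    """Fraction of each stored element that carries no information."""
    if value_bits > storage_bits:
        raise ValueError("value does not fit in the storage type")
    return (storage_bits - value_bits) / storage_bits

# 9-bit values must be stored in 16-bit shorts: 7 of every 16 bits are unused.
nine_in_short = wasted_fraction(9, 16)

# 1- and 2-bit values stored in 8-bit bytes leave an even larger share unused.
one_in_byte = wasted_fraction(1, 8)
two_in_byte = wasted_fraction(2, 8)
```

Under these assumptions, a 9-bit value in a short leaves a little under half of the space unused, while a 1-bit value in a byte leaves seven eighths unused.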

With the netCDF-4/HDF5 format, new unsigned integers (of various sizes), 64-bit integers, and the string type allow greater expression of scientific data. The new VLEN and COMPOUND types allow users to organize data in new ways.

With the classic netCDF file format, there are constraints that limit how a dataset is structured to store more than 2 GiBytes of data in a single netCDF dataset (see Classic Limitations). (A GiByte is 2^30 or 1,073,741,824 bytes, as compared to a Gbyte, which is 1,000,000,000 bytes.) This limitation is a result of 32-bit offsets used for storing relative offsets within a classic netCDF format file. Since one of the goals of netCDF is portable data and some computing platforms still can't deal with files larger than 2 GiB, it is best to keep files that must be portable below this limit. Nevertheless, it is possible to create and access netCDF files larger than 2 GiB on platforms that provide support for such files (see Large File Support).
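The units used in this discussion can be checked directly. This Python sketch assumes that the 32-bit offsets are signed, which would explain why the limit falls at 2 GiB rather than 4 GiB; that assumption is made for illustration and is not stated in the text above.

```python
GIB = 2**30      # one GiByte (binary unit): 1,073,741,824 bytes
GBYTE = 10**9    # one Gbyte (decimal unit): 1,000,000,000 bytes

two_gib = 2 * GIB

# Assumption: offsets are held in a signed 32-bit integer, so the largest
# representable offset is one byte short of 2 GiB.
max_signed_32 = 2**31 - 1

# A GiByte is about 7% larger than a Gbyte.
difference = GIB - GBYTE
```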

The new 64-bit offset format allows large files, and makes it easy to create fixed variables of about 4 GiB, and record variables of about 4 GiB per record (see 64 bit Offset Limitations). However, old netCDF applications will not be able to read the 64-bit offset files until they are upgraded to at least version 3.6.0 of netCDF (i.e. the version in which 64-bit offset format was introduced).

With the netCDF-4/HDF5 format, size limitations are further relaxed, and files can be as large as the underlying file system supports. NetCDF-4/HDF5 files cannot be read by versions of the netCDF library before 4.0.

Another limitation of the classic (and 64-bit offset) model is that only one unlimited (changeable) dimension is permitted for each netCDF data set. Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the classic netCDF model does not permit variables with several unlimited dimensions or the use of multiple unlimited dimensions in different variables within the same dataset. Variables that have non-rectangular shapes (for example, ragged arrays) cannot be represented conveniently.

In netCDF-4/HDF5 files, multiple unlimited dimensions are fully supported. Any variable can be defined with any combination of limited and unlimited dimensions.

The extent to which data can be completely self-describing is limited: there is always some assumed context without which sharing and archiving data would be impractical. NetCDF permits storing meaningful names for variables, dimensions, and attributes; units of measure in a form that can be used in computations; text strings for attribute values that apply to an entire data set; and simple kinds of coordinate system information. But for more complex kinds of metadata (for example, the information necessary to provide accurate georeferencing of data on unusual grids or from satellite images), it is often necessary to develop conventions.

Specific additions to the netCDF data model might make some of these conventions unnecessary or allow some forms of metadata to be represented in a uniform and compact way. For example, adding explicit georeferencing to the netCDF data model would simplify elaborate georeferencing conventions at the cost of complicating the model. The problem is finding an appropriate trade-off between the richness of the model and its generality (i.e., its ability to encompass many kinds of data). A data model tailored to capture the shared context among researchers within one discipline may not be appropriate for sharing or combining data from multiple disciplines.

The classic netCDF data model does not support nested data structures such as trees, nested arrays, or other recursive structures. (This limitation also applies to 64-bit offset files.) Through use of indirection and conventions it is possible to represent some kinds of nested structures, but the result may fall short of the netCDF goal of self-describing data.

In netCDF-4/HDF5 format files, the introduction of the compound type allows the creation of complex data types, involving any combination of types. The VLEN type allows efficient storage of ragged arrays, and the introduction of hierarchical groups allows users to organize data.

Finally, for classic and 64-bit offset files, concurrent access to a netCDF dataset is limited. One writer and multiple readers may access data in a single dataset simultaneously, but there is no support for multiple concurrent writers.

NetCDF-4 supports parallel read/write access to netCDF-4/HDF5 files, using the underlying HDF5 library.

For more information about HDF5, see the HDF5 web site: http://hdfgroup.org/HDF5/.



1.11 Plans for NetCDF

Future versions of NetCDF will include the following features:

  1. Remote access of netCDF data via OpenDAP servers.
  2. Extensions of netCDF-4 features to C++ API and to tools ncgen/ncdump.
  3. Better documentation and more examples.



1.12 References

  1. Brown, S. A, M. Folk, G. Goucher, and R. Rew, "Software for Portable Scientific Data Management," Computers in Physics, American Institute of Physics, Vol. 7, No. 3, May/June 1993.
  2. Davies, H. L., "FAN - An array-oriented query language," Second Workshop on Database Issues for Data Visualization (Visualization 1995), Atlanta, Georgia, IEEE, October 1995.
  3. Fahle, J., TeraScan Applications Programming Interface, SeaSpace, San Diego, California, 1989.
  4. Fulker, D. W., "The netCDF: Self-Describing, Portable Files—a Basis for 'Plug-Compatible' Software Modules Connectable by Networks," ICSU Workshop on Geophysical Informatics, Moscow, USSR, August 1988.
  5. Fulker, D. W., "Unidata Strawman for Storing Earth-Referencing Data," Seventh International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, New Orleans, La., American Meteorology Society, January 1991.
  6. Gough, M. L., NSSDC CDF Implementer's Guide (DEC VAX/VMS) Version 1.1, National Space Science Data Center, 88-17, NASA/Goddard Space Flight Center, 1988.
  7. Jenter, H. L. and R. P. Signell, "NetCDF: A Freely-Available Software-Solution to Data-Access Problems for Numerical Modelers," Proceedings of the American Society of Civil Engineers Conference on Estuarine and Coastal Modeling, Tampa, Florida, 1992.
  8. Raymond, D. J., "A C Language-Based Modular System for Analyzing and Displaying Gridded Numerical Data," Journal of Atmospheric and Oceanic Technology, 5, 501-511, 1988.
  9. Rew, R. K. and G. P. Davis, "The Unidata netCDF: Software for Scientific Data Access," Sixth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, American Meteorology Society, February 1990.
  10. Rew, R. K. and G. P. Davis, "NetCDF: An Interface for Scientific Data Access," Computer Graphics and Applications, IEEE, pp. 76-82, July 1990.
  11. Rew, R. K. and G. P. Davis, "Unidata's netCDF Interface for Data Access: Status and Plans," Thirteenth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, American Meteorology Society, February 1997.
  12. Treinish, L. A. and M. L. Gough, "A Software Package for the Data Independent Management of Multi-Dimensional Data," EOS Transactions, American Geophysical Union, 68, 633-635, 1987.



2 Components of a NetCDF Dataset



2.1 The NetCDF Data Model

A netCDF dataset contains dimensions, variables, and attributes, which all have both a name and an ID number by which they are identified. These components can be used together to capture the meaning of data and relations among data fields in an array-oriented dataset. The netCDF library allows simultaneous access to multiple netCDF datasets which are identified by dataset ID numbers, in addition to ordinary file names.

2.1.1 Expanded Model in NetCDF-4 Files

Files created with the netCDF-4 format have access to an expanded data model, which includes named groups. Groups, like directories in a Unix file system, are hierarchically organized, to arbitrary depth. They can be used to organize large numbers of variables.

Each group acts as an entire netCDF dataset in the classic model. That is, each group may have attributes, dimensions, and variables, as well as other groups.

The default group is the root group, which allows the classic netCDF data model to fit neatly into the new model.

Dimensions are scoped such that they can be seen in all descendant groups. That is, dimensions can be shared between variables in different groups, if they are defined in a parent group.
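The scoping rule can be pictured as a lookup that walks from a group up through its ancestors. The Python sketch below is a toy model of that idea; the `Group` class, its `find_dimension` method, and the group name `model_run` are hypothetical and do not correspond to the netCDF library's interface.

```python
class Group:
    """Hypothetical stand-in for a netCDF-4 group (not the library's API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dimensions = {}   # dimension name -> length

    def find_dimension(self, dim_name):
        """Search this group first, then each ancestor up to the root."""
        group = self
        while group is not None:
            if dim_name in group.dimensions:
                return group.dimensions[dim_name]
            group = group.parent
        raise KeyError(dim_name)

root = Group("/")
root.dimensions["time"] = 24

model = Group("model_run", parent=root)
model.dimensions["level"] = 4
```

In this model, `model.find_dimension("time")` succeeds because `time` is defined in the parent group, whereas `root.find_dimension("level")` fails because a dimension defined in a child group is not visible from its parent.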

In netCDF-4 files, the user may also define a type. For example, a compound type may hold information from an array of C structures, and a variable length array type allows the user to read and write arrays of variable length arrays.

Variables, groups, and types share a namespace. Within the same group, variables, groups, and types must have unique names. (That is, a type and variable may not have the same name within the same group, and similarly for sub-groups of that group.)

Groups and user defined types are only available in files created in the NetCDF-4/HDF5 format. They are not available for classic or 64-bit offset format files.

2.1.2 Naming Conventions

The names of dimensions, variables and attributes (and, in netCDF-4 files, groups, user-defined types, compound member names, and enumeration symbols) consist of arbitrary sequences of alphanumeric characters, underscore '_', period '.', plus '+', hyphen '-', or at sign '@', but beginning with a letter or underscore. However, names commencing with underscore are reserved for system use. Case is significant in netCDF names. A zero-length name is not allowed. Some widely used conventions restrict names to only alphanumeric characters or underscores. NetCDF-4 permits UTF-8 encoded Unicode characters in names, as well as other special characters.
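As an illustration, the ASCII part of this rule can be written as a regular expression. The Python sketch below is one reading of the rule as stated; it does not cover the UTF-8 and other special characters that netCDF-4 permits, and the function names `is_valid_name` and `is_reserved` are invented for this example.

```python
import re

# ASCII-only reading of the naming rule: a letter or underscore, followed by
# any number of alphanumerics, '_', '.', '+', '@', or '-'.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.+@-]*")

def is_valid_name(name: str) -> bool:
    """True if the whole name matches the ASCII rule (zero length is rejected)."""
    return _NAME_RE.fullmatch(name) is not None

def is_reserved(name: str) -> bool:
    """Names that begin with an underscore are reserved for system use."""
    return name.startswith("_")
```

Because case is significant, `temp` and `Temp` both pass this check but are distinct names.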

2.1.3 Network Common Data Form Language (CDL)

We will use a small netCDF example to illustrate the concepts of the netCDF data model. This includes dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data form Language), which provides a convenient way of describing netCDF datasets. The netCDF system includes utilities for producing human-oriented CDL text files from binary netCDF datasets (ncdump) and vice versa (ncgen). (The ncdump utility has recently been enhanced to accommodate netCDF-4 features in the CDL output, but the example here is restricted to netCDF-3 CDL.)

     netcdf example_1 {  // example of CDL notation for a netCDF dataset
     
     dimensions:         // dimension names and lengths are declared first
             lat = 5, lon = 10, level = 4, time = unlimited;
     
     variables:          // variable types, names, shapes, attributes
             float   temp(time,level,lat,lon);
                         temp:long_name     = "temperature";
                         temp:units         = "celsius";
             float   rh(time,lat,lon);
                         rh:long_name = "relative humidity";
                         rh:valid_range = 0.0, 1.0;      // min and max
             int     lat(lat), lon(lon), level(level);
                         lat:units       = "degrees_north";
                         lon:units       = "degrees_east";
                         level:units     = "millibars";
             short   time(time);
                         time:units      = "hours since 1996-1-1";
             // global attributes
                         :source = "Fictional Model Output";
     
     data:                // optional data assignments
             level   = 1000, 850, 700, 500;
             lat     = 20, 30, 40, 50, 60;
             lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
             time    = 12;
             rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                      .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                      .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                      .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                       0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
     }

The CDL notation for a netCDF dataset can be generated automatically by using ncdump, a utility program described later (see ncdump). Another netCDF utility, ncgen, generates a netCDF dataset (or optionally C or FORTRAN source code containing calls needed to produce a netCDF dataset) from CDL input (see ncgen).

The CDL notation is simple and largely self-explanatory. It will be explained more fully as we describe the components of a netCDF dataset. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters '//' on any line. A CDL description of a netCDF dataset takes the form

       netcdf name {
         dimensions: ...
         variables: ...
         data: ...
       }

where the name is used only as a default in constructing file names by the ncgen utility. The CDL description consists of three optional parts, introduced by the keywords dimensions, variables, and data. NetCDF dimension declarations appear after the dimensions keyword, netCDF variables and attributes are defined after the variables keyword, and variable data assignments appear after the data keyword.

The ncgen utility provides a command line option which indicates the desired output format. Limitations are enforced for the selected format - that is, some CDL files may be expressible only in 64-bit offset or NetCDF-4 format.

For example, trying to create a file with very large variables in classic format may result in an error because size limits are violated.



2.2 Dimensions

A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height. A dimension might also be used to index other quantities, for example station or model-run-number.

A netCDF dimension has both a name and a length.

A dimension length is an arbitrary positive integer, except that one dimension in a classic or 64-bit offset netCDF dataset can have the length UNLIMITED. In a netCDF-4 dataset, any number of unlimited dimensions can be used.

Such a dimension is called the unlimited dimension or the record dimension. A variable with an unlimited dimension can grow to any length along that dimension. The unlimited dimension index is like a record number in conventional record-oriented files.

A netCDF classic or 64-bit offset dataset can have at most one unlimited dimension, but need not have any. If a variable has an unlimited dimension, that dimension must be the most significant (slowest changing) one. Thus any unlimited dimension must be the first dimension in a CDL shape and the first dimension in corresponding C array declarations.

A netCDF-4 dataset may have multiple unlimited dimensions, and there are no restrictions on their order in the list of a variable's dimensions.

To grow variables along an unlimited dimension, write the data using any of the netCDF data writing functions, setting the index along the unlimited dimension to the desired record number. The netCDF library will write however many records are needed, filling any intervening records with the fill value unless that feature is turned off.
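The effect of writing past the current end of a record variable can be modelled in a few lines. The Python sketch below is a toy simulation, not a call into the netCDF library; the class `RecordVariable` and the fill value `-9999` are assumptions chosen for illustration.

```python
FILL = -9999  # hypothetical fill value, for illustration only

class RecordVariable:
    """Toy model of a one-value-per-record variable on an unlimited dimension."""

    def __init__(self):
        self.records = []

    def write(self, record_index, value):
        # Pad with fill values until the requested record exists,
        # then store the value in that record.
        while len(self.records) <= record_index:
            self.records.append(FILL)
        self.records[record_index] = value

var = RecordVariable()
var.write(0, 12)   # first record
var.write(3, 48)   # skips records 1 and 2, which receive the fill value
```

After the second write, the variable holds four records, and the two that were never written explicitly contain the fill value.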

CDL dimension declarations may appear on one or more lines following the CDL keyword dimensions. Multiple dimension declarations on the same line may be separated by commas. Each declaration is of the form name = length. Use the “/” character to include group information (netCDF-4 output only).

There are four dimensions in the above example: lat, lon, level, and time (see Data Model). The first three are assigned fixed lengths; time is assigned the length UNLIMITED, which means it is the unlimited dimension.

The basic unit of named data in a netCDF dataset is a variable. When a variable is defined, its shape is specified as a list of dimensions. These dimensions must already exist. The number of dimensions is called the rank (a.k.a. dimensionality). A scalar variable has rank 0, a vector has rank 1 and a matrix has rank 2.

It is possible (since version 3.1 of netCDF) to use the same dimension more than once in specifying a variable shape. For example, correlation(instrument, instrument) could be a matrix giving correlations between measurements using different instruments. But data whose dimensions correspond to those of physical space/time should have a shape comprising different dimensions, even if some of these have the same length.



2.3 Variables

Variables are used to store the bulk of the data in a netCDF dataset. A variable represents an array of values of the same type. A scalar value is treated as a 0-dimensional array. A variable has a name, a data type, and a shape described by its list of dimensions specified when the variable is created. A variable may also have associated attributes, which may be added, deleted or changed after the variable is created.

A variable's external data type is one of a small set of netCDF types. In classic and 64-bit offset files, only the original six types are available (byte, character, short, int, float, and double). Variables in netCDF-4 files may also use unsigned short, unsigned int, 64-bit int, unsigned 64-bit int, or string. Alternatively, the user may define a type: an opaque blob of bytes, an array of variable length arrays, or a compound type, which acts like a C struct.

For more information on types for the C interface, see Variable Types (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.

For more information on types for the Fortran interface, see Variable Types (The NetCDF Fortran 77 Interface Guide) in The NetCDF Fortran 77 Interface Guide.

In the CDL notation, only classic and 64-bit offset types can be used. They are given the simpler names byte, char, short, int, float, and double. real may be used as a synonym for float in the CDL notation. long is a deprecated synonym for int. For the exact meaning of each of the types see External Types.

CDL variable declarations appear after the variables keyword in a CDL unit. They have the form

          type variable_name ( dim_name_1, dim_name_2, ... );

for variables with dimensions, or

          type variable_name;

for scalar variables.

In the above CDL example there are six variables. As discussed below, four of these are coordinate variables. The remaining variables (sometimes called primary variables), temp and rh, contain what is usually thought of as the data. Each of these variables has the unlimited dimension time as its first dimension, so they are called record variables. A variable that is not a record variable has a fixed length (number of data values) given by the product of its dimension lengths. The length of a record variable is also the product of its dimension lengths, but in this case the product is variable because it involves the length of the unlimited dimension, which can vary. The length of the unlimited dimension is the number of records.
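The lengths described here follow directly from the dimension lengths in the CDL example. The Python sketch below computes them; the helper `variable_length` is a name invented for this illustration.

```python
from math import prod

# Fixed dimension lengths from the CDL example; "time" is the unlimited dimension.
dims = {"lat": 5, "lon": 10, "level": 4}

def variable_length(shape, num_records):
    """Number of data values: the product of the variable's dimension lengths."""
    return prod(num_records if d == "time" else dims[d] for d in shape)

# With one record written (time = 12 in the example data):
temp_len = variable_length(("time", "level", "lat", "lon"), 1)
rh_len = variable_length(("time", "lat", "lon"), 1)

# A fixed-size variable does not depend on the number of records.
lat_len = variable_length(("lat",), 1)

# Adding records increases the length of record variables only.
temp_len_3 = variable_length(("time", "level", "lat", "lon"), 3)
```

The value computed for rh with a single record matches the 50 values listed for rh in the data section of the example.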

2.3.1 Coordinate Variables

It is legal for a variable to have the same name as a dimension. Such variables have no special meaning to the netCDF library. However, there is a convention that such variables should be treated in a special way by software using this library.

A variable with the same name as a dimension is called a coordinate variable. It typically defines a physical coordinate corresponding to that dimension. The above CDL example includes the coordinate variables lat, lon, level and time, defined as follows:

             int     lat(lat), lon(lon), level(level);
             short   time(time);
     ...
     data:
             level   = 1000, 850, 700, 500;
             lat     = 20, 30, 40, 50, 60;
             lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
             time    = 12;

These define the latitudes, longitudes, barometric pressures and times corresponding to positions along these dimensions. Thus there is data at altitudes corresponding to 1000, 850, 700 and 500 millibars; and at latitudes 20, 30, 40, 50 and 60 degrees north. Note that each coordinate variable is a vector and has a shape consisting of just the dimension with the same name.

A position along a dimension can be specified using an index. This is an integer with a minimum value of 0 for C programs, 1 in Fortran programs. Thus the 700 millibar level would have an index value of 2 in the example above in a C program, and 3 in a Fortran program.

If a dimension has a corresponding coordinate variable, then this provides an alternative, and often more convenient, means of specifying position along it. Current application packages that make use of coordinate variables commonly assume they are numeric vectors and strictly monotonic (all values are different and either increasing or decreasing).
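Both ways of specifying a position, and the monotonicity assumption, can be illustrated with the coordinate values from the example. The Python sketch below uses helper names (`is_strictly_monotonic`, `index_of`) that are invented for this illustration and are not part of any netCDF interface.

```python
level = [1000, 850, 700, 500]   # millibars, from the CDL example
lat = [20, 30, 40, 50, 60]      # degrees north, from the CDL example

def is_strictly_monotonic(values):
    """True if values only increase or only decrease, with no repeats."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def index_of(coordinate, value, base=0):
    """Index of a coordinate value; base=0 as in C, base=1 as in Fortran."""
    return coordinate.index(value) + base

c_index = index_of(level, 700)
fortran_index = index_of(level, 700, base=1)
```

Here level is strictly decreasing and lat is strictly increasing, so both satisfy the assumption that application packages commonly make.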



2.4 Attributes

NetCDF attributes are used to store data about the data (ancillary data or metadata), similar in many ways to the information stored in data dictionaries and schema in conventional database systems. Most attributes provide information about a specific variable. These are identified by the name (or ID) of that variable, together with the name of the attribute.

Some attributes provide information about the dataset as a whole and are called global attributes. These are identified by the attribute name together with a blank variable name (in CDL) or a special null "global variable" ID (in C or Fortran).

In netCDF-4 files, attributes can also be added at the group level.

An attribute has an associated variable (the null "global variable" for a global or group-level attribute), a name, a data type, a length, and a value. The current version treats all attributes as vectors; scalar values are treated as single-element vectors.

Conventional attribute names should be used where applicable. New names should be as meaningful as possible.

The external type of an attribute is specified when it is created. The types permitted for attributes are the same as the netCDF external data types for variables. Attributes with the same name for different variables should sometimes be of different types. For example, the attribute valid_max specifying the maximum valid data value for a variable of type int should be of type int, whereas the attribute valid_max for a variable of type double should instead be of type double.

Attributes are more dynamic than variables or dimensions; they can be deleted and have their type, length, and values changed after they are created, whereas the netCDF interface provides no way to delete a variable or to change its type or shape.

The CDL notation for defining an attribute is

         variable_name:attribute_name = list_of_values;

for a variable attribute, or

         :attribute_name = list_of_values;

for a global attribute. For a group level attribute (netCDF-4 files only):

         :group_name/subgroup_name/attribute_name = list_of_values;

Groups will be created as needed to store the attributes.

The type and length of each attribute are not explicitly declared in CDL; they are derived from the values assigned to the attribute. All values of an attribute must be of the same type. The notation used for constant values of the various netCDF types is discussed later (see CDL Constants).

In the netCDF example (see Data Model), units is an attribute for the variable lat that has a 13-character array value 'degrees_north'. And valid_range is an attribute for the variable rh that has length 2 and values '0.0' and '1.0'.
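The idea that an attribute's kind and length come from its assigned values can be sketched as follows. This Python fragment is a simplified illustration, not the rule set used by ncgen; the function `describe` and its labels "text", "integer", and "floating point" are assumptions made for this example.

```python
def describe(values):
    """Return (kind, length) for an attribute value, derived from the value itself."""
    if isinstance(values, str):
        return ("text", len(values))          # length is the character count
    if all(isinstance(v, int) for v in values):
        return ("integer", len(values))       # length is the element count
    return ("floating point", len(values))

lat_units = describe("degrees_north")
rh_valid_range = describe([0.0, 1.0])
```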

One global attribute, called “source”, is defined for the example netCDF dataset. This is a character array intended for documenting the data. Actual netCDF datasets might have more global attributes to document the origin, history, conventions, and other characteristics of the dataset as a whole.

Most generic applications that process netCDF datasets assume standard attribute conventions and it is strongly recommended that these be followed unless there are good reasons for not doing so. For information about units, long_name, valid_min, valid_max, valid_range, scale_factor, add_offset, _FillValue, and other conventional attributes, see Attribute Conventions.

Attributes may be added to a netCDF dataset long after it is first defined, so you don't have to anticipate all potentially useful attributes. However adding new attributes to an existing classic or 64-bit offset format dataset can incur the same expense as copying the dataset. For a more extensive discussion see Structure.



2.5 Differences between Attributes and Variables

In contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data. The total amount of ancillary data associated with a netCDF object, and stored in its attributes, is typically small enough to be memory-resident. However variables are often too large to entirely fit in memory and must be split into sections for processing.

Another difference between attributes and variables is that variables may be multidimensional. Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension).

Variables are created with a name, type, and shape before they are assigned data values, so a variable may exist with no values. The value of an attribute is specified when it is created, unless it is a zero-length attribute.

A variable may have attributes, but an attribute cannot have attributes. Attributes assigned to variables may have the same units as the variable (for example, valid_range) or have no units (for example, scale_factor). If you want to store data that requires units different from those of the associated variable, it is better to use a variable than an attribute. More generally, if data require ancillary data to describe them, are multidimensional, require any of the defined netCDF dimensions to index their values, or require a significant amount of storage, that data should be represented using variables rather than attributes.



3 Data

This chapter discusses the primitive netCDF external data types, the kinds of data access supported by the netCDF interface, and how data structures other than arrays may be implemented in a netCDF dataset.



3.1 NetCDF External Data Types

The atomic external types supported by the netCDF interface are:

     C name                Fortran name   storage

     NC_BYTE               nf_byte        8-bit signed integer
     NC_CHAR               nf_char        8-bit unsigned integer
     NC_SHORT              nf_short       16-bit signed integer
     NC_USHORT             nf_ushort      16-bit unsigned integer *
     NC_INT (or NC_LONG)   nf_int         32-bit signed integer
     NC_UINT               nf_uint        32-bit unsigned integer *
     NC_INT64              nf_int64       64-bit signed integer *
     NC_UINT64             nf_uint64      64-bit unsigned integer *
     NC_FLOAT              nf_float       32-bit floating point
     NC_DOUBLE             nf_double      64-bit floating point
     NC_STRING             nf_string      variable length character string *
     NC_BOOL               nf_bool        (8-bit) Boolean *

* These types are available only for netCDF-4 format files: all the unsigned integer types (except NC_CHAR), the 64-bit integer types, and the string and bool types.

These types were chosen to provide a reasonably wide range of trade-offs between data precision and number of bits required for each value. These external data types are independent from whatever internal data types are supported by a particular machine and language combination.

These types are called "external", because they correspond to the portable external representation for netCDF data. When a program reads external netCDF data into an internal variable, the data is converted, if necessary, into the specified internal type. Similarly, if you write internal data into a netCDF variable, this may cause it to be converted to a different external type, if the external type for the netCDF variable differs from the internal type.

The separation of external and internal types and automatic type conversion have several advantages. You need not be aware of the external type of numeric variables, since automatic conversion to or from any desired numeric type is available. You can use this feature to simplify code, by making it independent of external types, using a sufficiently wide internal type, e.g., double precision, for numeric netCDF data of several different external types. Programs need not be changed to accommodate a change to the external type of a variable.

If conversion to or from an external numeric type is necessary, it is handled by the library.

Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an internal short integer type may not be able to hold data stored externally as an integer. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly.

Note that mere loss of precision in type conversion does not return an error. Thus, if you read double precision values into a single-precision floating-point variable, for example, no error results unless the magnitude of the double precision value exceeds the representable range of single-precision floating point numbers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has adequate precision.
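The distinction between a range error and a mere loss of precision can be demonstrated without the netCDF library. The Python sketch below uses the standard struct module as a stand-in for the library's conversions; it is an analogy under that assumption, and the helper names `to_short` and `to_float32` are invented for this example.

```python
import struct

def to_short(value: int) -> int:
    """Convert to a 16-bit signed integer, raising if the value is out of range."""
    try:
        return struct.unpack("<h", struct.pack("<h", value))[0]
    except struct.error as exc:
        raise OverflowError(f"{value} does not fit in a 16-bit short") from exc

def to_float32(value: float) -> float:
    """Round a double-precision value to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# 2**24 + 1 needs 25 significant bits, one more than a single-precision
# mantissa can hold, so the conversion silently rounds it.
big = 2**24 + 1
rounded = to_float32(float(big))
```

In this sketch, `to_short(40000)` raises an error because the value is outside the range of a short, whereas `to_float32` returns a nearby value without any error, which mirrors the behavior described above.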

The names for the primitive external data types (byte, char, short, ushort, int, uint, int64, uint64, float or real, double, bool, string) are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names.

It is possible to interpret byte data as either signed (-128 to 127) or unsigned (0 to 255). However, when reading byte data to be converted into other numeric types, it is interpreted as signed.
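The two interpretations of the same stored bytes can be compared directly. This Python sketch uses arbitrary example byte values chosen for illustration.

```python
raw = bytes([0xFF, 0x80, 0x7F])   # three example byte values

# Signed interpretation: range -128 to 127.
as_signed = [int.from_bytes(raw[i:i + 1], "big", signed=True)
             for i in range(len(raw))]

# Unsigned interpretation: range 0 to 255.
as_unsigned = list(raw)
```

Only values with the high bit set (0x80 and above) differ between the two interpretations.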

For the correspondence between netCDF external data types and the data types of a language see Variables.



3.2 Data Structures in Classic and 64-bit Offset Files

The only kind of data structure directly supported by the netCDF classic (and 64-bit offset) abstraction is a collection of named arrays with attached vector attributes. NetCDF is not particularly well-suited for storing linked lists, trees, sparse matrices, ragged arrays or other kinds of data structures requiring pointers.

It is possible to build other kinds of data structures in netCDF classic or 64-bit offset formats from sets of arrays, by adopting various conventions regarding the use of data in one array as pointers into another array. The netCDF library won't provide much help or hindrance with constructing such data structures, but netCDF provides the mechanisms with which such conventions can be designed.

The following netCDF classic example stores a ragged array ragged_mat using an attribute row_index to name an associated index variable giving the index of the start of each row. In this example, the first row contains 12 elements, the second row contains 7 elements (19 - 12), and so on. (NetCDF-4 includes native support for variable length arrays. See below.)

             float   ragged_mat(max_elements);
                     ragged_mat:row_index = "row_start";
             int     row_start(max_rows);
     data:
             row_start   = 0, 12, 19, ...

As another example, netCDF variables may be grouped within a netCDF classic or 64-bit offset dataset by defining attributes that list the names of the variables in each group, separated by a conventional delimiter such as a space or comma. Using a naming convention for attribute names for such groupings permits any number of named groups of variables. A particular conventional attribute for each variable might list the names of the groups of which it is a member. Use of attributes, or variables that refer to other attributes or variables, provides a flexible mechanism for representing some kinds of complex structures in netCDF datasets.



3.3 NetCDF-4 User Defined Data Types

NetCDF supported six data types through version 3.6.0 (char, byte, short, int, float, and double). Starting with version 4.0, many new data types are supported (unsigned int types, strings, compound types, variable length arrays, enums, opaque).

In addition to the new atomic types, with netCDF-4/HDF5 files, the user may define types.

Types are defined in define mode, and must be fully defined before they are used. New types may be added to a file by re-entering define mode.

Once defined, the type may be used to create a variable or attribute.

Types may be nested in complex ways: for example, a compound type may contain an array of VLEN types, each of which contains variable length arrays of some other compound type, and so on. Users are cautioned to keep types simple. Reading data of complex types can be challenging for Fortran users.

Types may be defined in any group in the data file, but they are always available globally in the file.

Types cannot have attributes (but variables of the type may have attributes).

User defined data types are not available in the netCDF classic model, so they can't be used with classic or 64-bit offset format files, or with netCDF-4 files created with the NC_CLASSIC_MODEL mode flag.

3.3.1 Compound Types

Compound types allow the user to combine atomic and user-defined types into C-like structs. Since user-defined types may be used within a compound type, compound types can contain nested compound types.

Users define a compound type, and (in their C code) a corresponding C struct. They can then use the nc_put_var[1asm] calls to write multi-dimensional arrays of these structs, and nc_get_var[1asm] calls to read them. (For example, the nc_put_varm function will write mapped arrays of these structs.)

While structs, in general, are not portable from platform to platform, the HDF5 layer (when installed) performs the magic required to figure out your platform's idiosyncrasies, and adjust to them. The end result is that HDF5 compound types (and therefore, netCDF-4 compound types), are portable.

For more information on creating and using compound types, see Compound Types (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.

3.3.2 VLEN Types

Variable length arrays can be used to create a ragged array of data, in which one of the dimensions varies in size from point to point.

An example of VLEN use would be to store a 1-D array of dropsonde data, in which the data at each drop point is of variable length.

There is no special restriction on the dimensionality of VLEN variables. It's possible to have 2D, 3D, 4D, etc. data, in which each point contains a VLEN.

A VLEN has a base type (that is, the type that it is a VLEN of). This may be one of the atomic types (forming, for example, a variable length array of NC_INT), or it can be another user defined type, like a compound type.

With VLEN data, special memory allocation and deallocation procedures must be followed, or memory leaks may occur.

For more information on creating and using variable length arrays, see Variable Length Arrays (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.

3.3.3 Opaque Types

Opaque types allow the user to store arrays of data blobs of a fixed size.

For more information on creating and using opaque types, see Opaque Type (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.

3.3.4 Enum Types

Enum types allow the user to specify an enumeration.

For more information on creating and using enum types, see Enum Type (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.

3.3.5 Groups

Although not a type of data, groups can help organize data within a dataset. Like a directory structure on a Unix file-system, the grouping feature allows users to organize variables and dimensions into distinct, named, hierarchical areas, called groups. For more information on groups, see Groups (The NetCDF C Interface Guide) in The NetCDF C Interface Guide.



3.4 Data Access

To access (read or write) netCDF data you specify an open netCDF dataset, a netCDF variable, and information (e.g., indices) identifying elements of the variable. The name of the access function corresponds to the internal type of the data. If the internal type has a different representation from the external type of the variable, a conversion between the internal type and external type will take place when the data is read or written.

Access to data in classic and 64-bit offset format is direct. Access to netCDF-4 data is buffered by the HDF5 layer. In either case you can access a small subset of data from a large dataset efficiently, without first accessing all the data that precedes it.

Reading and writing data by specifying a variable, instead of a position in a file, makes data access independent of how many other variables are in the dataset, making programs immune to data format changes that involve adding more variables to the data.

In the C and FORTRAN interfaces, datasets are not specified by name every time you want to access data, but instead by a small integer called a dataset ID, obtained when the dataset is first created or opened.

Similarly, a variable is not specified by name for every data access either, but by a variable ID, a small integer used to identify each variable in a netCDF dataset.

3.4.1 Forms of Data Access

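The netCDF interface supports several forms of direct access to data values in an open netCDF dataset. In order of increasing generality, these are: access to all elements; access to individual elements, specified with an index vector; access to array sections, specified with an index vector and a count vector; access to subsampled array sections, specified with an index vector, a count vector, and a stride vector; and access to mapped array sections, specified with an index vector, a count vector, a stride vector, and an index mapping vector.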

The four types of vector (index vector, count vector, stride vector and index mapping vector) each have one element for each dimension of the variable. Thus, for an n-dimensional variable (rank = n), n-element vectors are needed. If the variable is a scalar (no dimensions), these vectors are ignored.

An array section is a "slab" or contiguous rectangular block that is specified by two vectors. The index vector gives the indices of the element in the corner closest to the origin. The count vector gives the lengths of the edges of the slab along each of the variable's dimensions, in order. The number of values accessed is the product of these edge lengths.

A subsampled array section is similar to an array section, except that an additional stride vector is used to specify sampling. This vector has an element for each dimension giving the length of the strides to be taken along that dimension. For example, a stride of 4 means every fourth value along the corresponding dimension. The total number of values accessed is again the product of the elements of the count vector.

A mapped array section is similar to a subsampled array section except that an additional index mapping vector allows one to specify how data values associated with the netCDF variable are arranged in memory. The offset of each value from the reference location is given by the sum of the products of each index (of the imaginary internal array which would be used if there were no mapping) by the corresponding element of the index mapping vector. The number of values accessed is the same as for a subsampled array section.

The use of mapped array sections is discussed more fully below, but first we present an example of the more commonly used array-section access.



3.4.2 A C Example of Array-Section Access

Assume that in our earlier example of a netCDF dataset (see Network Common Data Form Language (CDL)), we wish to read a cross-section of all the data for the temp variable at one level (say, the second), and assume that there are currently three records (time values) in the netCDF dataset. Recall that the dimensions are defined as

       lat = 5, lon = 10, level = 4, time = unlimited;

and the variable temp is declared as

       float   temp(time, level, lat, lon);

in the CDL notation.

A corresponding C variable that holds data for only one level might be declared as

     #define LATS  5
     #define LONS 10
     #define LEVELS 1
     #define TIMES 3                 /* currently */
         ...
     float   temp[TIMES*LEVELS*LATS*LONS];
     
to keep the data in a one-dimensional array, or
     
         ...
     float   temp[TIMES][LEVELS][LATS][LONS];

using a multidimensional array declaration.

To specify the block of data that represents just the second level, all times, all latitudes, and all longitudes, we need to provide a start index and some edge lengths. The start index should be (0, 1, 0, 0) in C, because we want to start at the beginning of each of the time, lat, and lon dimensions, but we want to begin at the second value of the level dimension. The edge lengths should be (3, 1, 5, 10) in C, since we want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. We should expect to get a total of 150 floating-point values returned (3 * 1 * 5 * 10), and should provide enough space in our array for this many. The order in which the data will be returned, written here in terms of the indices of the netCDF variable, is with the last dimension, lon, varying fastest:

          temp[0][1][0][0]
          temp[0][1][0][1]
          temp[0][1][0][2]
          temp[0][1][0][3]
     
                ...
     
          temp[2][1][4][7]
          temp[2][1][4][8]
          temp[2][1][4][9]

Different dimension orders for the C, FORTRAN, or other language interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the languages. In general, it does not matter whether a netCDF dataset is written using the C, FORTRAN, or another language interface; netCDF datasets written from any supported language may be read by programs written in other supported languages.

3.4.3 More on General Array Section Access for C

The use of mapped array sections allows non-trivial relationships between the disk addresses of variable elements and the addresses where they are stored in memory. For example, a matrix in memory could be the transpose of that on disk, giving a quite different order of elements. In a regular array section, the mapping between the disk and memory addresses is trivial: the structure of the in-memory values (i.e., the dimensional lengths and their order) is identical to that of the array section. In a mapped array section, however, an index mapping vector is used to define the mapping between indices of netCDF variable elements and their memory addresses.

With mapped array access, the offset (number of array elements) from the origin of a memory-resident array to a particular point is given by the inner product[1] of the index mapping vector with the point's coordinate offset vector. A point's coordinate offset vector gives, for each dimension, the offset from the origin of the containing array to the point. In C, a point's coordinate offset vector is the same as its coordinate vector.

The index mapping vector for a regular array section would have–in order from most rapidly varying dimension to most slowly–a constant 1, the product of that value with the edge length of the most rapidly varying dimension of the array section, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a mapped array, however, the correspondence between netCDF variable disk locations and memory locations can be different.

For example, the following C definitions

     struct vel {
         int flags;
         float u;
         float v;
     } vel[NX][NY];
     ptrdiff_t imap[2] = {
         sizeof(struct vel),
         sizeof(struct vel)*NY
     };

where imap is the index mapping vector, can be used to access the memory-resident values of the netCDF variable, vel(NY,NX), even though the dimensions are transposed and the data is contained in a 2-D array of structures rather than a 2-D array of floating-point values.

A detailed example of mapped array access is presented in the description of the interfaces for mapped array access. See Write a Mapped Array of Values - nc_put_varm_ type (The NetCDF C Interface Guide).

Note that, although the netCDF abstraction allows the use of subsampled or mapped array-section access, their use is not required. If you do not need these more general forms of access, you may ignore these capabilities and use single value access or regular array section access instead.



3.4.4 A Fortran Example of Array-Section Access

Assume that in our earlier example of a netCDF dataset (see Data Model), we wish to read a cross-section of all the data for the temp variable at one level (say, the second), and assume that there are currently three records (time values) in the netCDF dataset. Recall that the dimensions are defined as

       lat = 5, lon = 10, level = 4, time = unlimited;

and the variable temp is declared as

       float   temp(time, level, lat, lon);

in the CDL notation.

In FORTRAN, the dimensions are reversed from the CDL declaration with the first dimension varying fastest and the record dimension as the last dimension of a record variable. Thus a FORTRAN declaration for a variable that holds data for only one level is

     INTEGER LATS, LONS, LEVELS, TIMES
     PARAMETER (LATS=5, LONS=10, LEVELS=1, TIMES=3)
        ...
     REAL TEMP(LONS, LATS, LEVELS, TIMES)

To specify the block of data that represents just the second level, all times, all latitudes, and all longitudes, we need to provide a start index and some edge lengths. The start index should be (1, 1, 2, 1) in FORTRAN, because we want to start at the beginning of each of the time, lon, and lat dimensions, but we want to begin at the second value of the level dimension. The edge lengths should be (10, 5, 1, 3) in FORTRAN, since we want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. We should expect to get a total of 150 floating-point values returned (3 * 1 * 5 * 10), and should provide enough space in our array for this many. The order in which the data will be returned, written here in terms of the indices of the netCDF variable, is with the first dimension, LON, varying fastest:

          TEMP( 1, 1, 2, 1)
          TEMP( 2, 1, 2, 1)
          TEMP( 3, 1, 2, 1)
          TEMP( 4, 1, 2, 1)
     
                ...
     
          TEMP( 8, 5, 2, 3)
          TEMP( 9, 5, 2, 3)
          TEMP(10, 5, 2, 3)

Different dimension orders for the C, FORTRAN, or other language interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the languages. In general, it does not matter whether a netCDF dataset is written using the C, FORTRAN, or another language interface; netCDF datasets written from any supported language may be read by programs written in other supported languages.

3.4.5 More on General Array Section Access for Fortran

The use of mapped array sections allows non-trivial relationships between the disk addresses of variable elements and the addresses where they are stored in memory. For example, a matrix in memory could be the transpose of that on disk, giving a quite different order of elements. In a regular array section, the mapping between the disk and memory addresses is trivial: the structure of the in-memory values (i.e., the dimensional lengths and their order) is identical to that of the array section. In a mapped array section, however, an index mapping vector is used to define the mapping between indices of netCDF variable elements and their memory addresses.

With mapped array access, the offset (number of array elements) from the origin of a memory-resident array to a particular point is given by the inner product[1] of the index mapping vector with the point's coordinate offset vector. A point's coordinate offset vector gives, for each dimension, the offset from the origin of the containing array to the point. In FORTRAN, the values of a point's coordinate offset vector are one less than the corresponding values of the point's coordinate vector, e.g., the array element A(3,5) has coordinate offset vector [2, 4].

The index mapping vector for a regular array section would have–in order from most rapidly varying dimension to most slowly–a constant 1, the product of that value with the edge length of the most rapidly varying dimension of the array section, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a mapped array, however, the correspondence between netCDF variable disk locations and memory locations can be different.

A detailed example of mapped array access is presented in the description of the interfaces for mapped array access. See nf_put_varm_ type (The NetCDF Fortran 77 Interface Guide).

Note that, although the netCDF abstraction allows the use of subsampled or mapped array-section access, their use is not required. If you do not need these more general forms of access, you may ignore these capabilities and use single value access or regular array section access instead.



3.5 Type Conversion

Each netCDF variable has an external type, specified when the variable is first defined. This external type determines whether the data is intended for text or numeric values, and if numeric, the range and precision of numeric values.

If the netCDF external type for a variable is char, only character data representing text strings can be written to or read from the variable. No automatic conversion of text data to a different representation is supported.

If the type is numeric, however, the netCDF library allows you to access the variable data as a different type and provides automatic conversion between the numeric data in memory and the data in the netCDF variable. For example, if you write a program that deals with all numeric data as double-precision floating point values, you can read netCDF data into double-precision arrays without knowing or caring what the external type of the netCDF variables are. On reading netCDF data, integers of various sizes and single-precision floating-point values will all be converted to double-precision, if you use the data access interface for double-precision values. Of course, you can avoid automatic numeric conversion by using the netCDF interface for a value type that corresponds to the external data type of each netCDF variable, where such value types exist.

The automatic numeric conversions performed by netCDF are easy to understand, because they behave just like assignment of data of one type to a variable of a different type. For example, if you read floating-point netCDF data as integers, the result is truncated towards zero, just as it would be if you assigned a floating-point value to an integer variable. Such truncation is an example of the loss of precision that can occur in numeric conversions.

Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an integer may not be able to hold data stored externally as an IEEE floating-point number. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly.

Note that mere loss of precision in type conversion does not result in an error. For example, if you read double precision values into an integer, no error results unless the magnitude of the double precision value exceeds the representable range of integers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has a compatible precision.

Whether a range error occurs in writing a large floating-point value near the boundary of representable values may depend on the platform. The largest floating-point value you can write to a netCDF float variable is the largest floating-point number representable on your system that is less than 2 to the 128th power. The largest double precision value you can write to a double variable is the largest double-precision number representable on your system that is less than 2 to the 1024th power.



4 File Structure and Performance

This chapter describes the file structure of a netCDF classic or 64-bit offset dataset in enough detail to aid in understanding netCDF performance issues.

NetCDF is a data abstraction for array-oriented data access and a software library that provides a concrete implementation of the interfaces that support that abstraction. The implementation provides a machine-independent format for representing arrays. Although the netCDF file format is hidden below the interfaces, some understanding of the current implementation and associated file structure may help to make clear why some netCDF operations are more expensive than others.

Knowledge of the format is not needed for reading and writing netCDF data or understanding most efficiency issues. Programs that use only the documented interfaces and that make no assumptions about the format will continue to work even if the netCDF format is changed in the future, because any such change will be made below the documented interfaces and will support earlier versions of the netCDF file format.



4.1 Parts of a NetCDF Classic File

A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:

a header, containing all the information about dimensions, attributes, and variables except for the variable data;

a data part, comprising fixed-size data, containing the data for variables that don't have an unlimited dimension; and variable-size data, containing the data for variables that have an unlimited dimension.

Both the header and data parts are represented in a machine-independent form. This form is very similar to XDR (eXternal Data Representation), extended to support efficient storage of arrays of non-byte data.

The header at the beginning of the file contains information about the dimensions, variables, and attributes in the file, including their names, types, and other characteristics. The information about each variable includes the offset to the beginning of the variable's data for fixed-size variables or the relative offset of other variables within a record. The header also contains dimension lengths and information needed to map multidimensional indices for each variable to the appropriate offsets.

By default, this header has little usable extra space; it is only as large as it needs to be for the dimensions, variables, and attributes (including all the attribute values) in the netCDF dataset, with a small amount of extra space from rounding up to the nearest disk block size. This has the advantage that netCDF files are compact, requiring very little overhead to store the ancillary data that makes the datasets self-describing. A disadvantage of this organization is that any operation on a netCDF dataset that requires the header to grow (or, less likely, to shrink), for example adding new dimensions or new variables, requires moving the data by copying it. This expense is incurred when the enddef function is called: nc_enddef in C (see nc_enddef (The NetCDF C Interface Guide)), NF_ENDDEF in Fortran (see NF_ENDDEF (The NetCDF Fortran 77 Interface Guide)), after a previous call to the redef function: nc_redef in C (see nc_redef (The NetCDF C Interface Guide)) or NF_REDEF in Fortran (see NF_REDEF (The NetCDF Fortran 77 Interface Guide)). If you create all necessary dimensions, variables, and attributes before writing data, and avoid later additions and renamings of netCDF components that require more space in the header part of the file, you avoid the cost associated with later changing the header.

Alternatively, you can use a version of the enddef function with two underscore characters instead of one to explicitly reserve extra space in the file header when the file is created: in C nc__enddef (see nc__enddef (The NetCDF C Interface Guide)), in Fortran NF__ENDDEF (see NF__ENDDEF (The NetCDF Fortran 77 Interface Guide)), after a previous call to the redef function. This avoids the expense of moving all the data later by reserving enough extra space in the header to accommodate anticipated changes, such as the addition of new attributes or the extension of existing string attributes to hold longer strings.

When the size of the header is changed, data in the file is moved, and the location of data values in the file changes. If another program is reading the netCDF dataset during redefinition, its view of the file will be based on old, probably incorrect indexes. If netCDF datasets are shared across redefinition, some mechanism external to the netCDF library must be provided that prevents access by readers during redefinition, and causes the readers to call nc_sync/NF_SYNC before any subsequent access.

The fixed-size data part that follows the header contains all the variable data for variables that do not employ an unlimited dimension. The data for each variable is stored contiguously in this part of the file. If there is no unlimited dimension, this is the last part of the netCDF file.

The record-data part that follows the fixed-size data consists of a variable number of fixed-size records, each of which contains data for all the record variables. The record data for e