NetCDF User's Guide for C ---------------------------------------------------------------------------- An Access Interface for Self-Describing, Portable Data Version 3 June 1997 Russ Rew, Glenn Davis, Steve Emmerson, and Harvey Davies Unidata Program Center Copyright © 1997 University Corporation for Atmospheric Research, Boulder, Colorado. Permission is granted to make and distribute verbatim copies of this manual provided that the copyright notice and these paragraphs are preserved on all copies. The software and any accompanying written materials are provided "as is" without warranty of any kind. UCAR expressly disclaims all warranties of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The Unidata Program Center is managed by the University Corporation for Atmospheric Research and sponsored by the National Science Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Mention of any commercial company or product in this document does not constitute an endorsement by the Unidata Program Center. Unidata does not authorize any use of information from this publication for advertising or publicity purposes. ---------------------------------------------------------------------------- Foreword ---------------------------------------------------------------------------- Unidata (http://www.unidata.ucar.edu) is a National Science Foundation-sponsored program empowering U.S. universities, through innovative applications of computers and networks, to make the best use of atmospheric and related data for enhancing education and research. 
For analyzing and displaying such data, the Unidata Program Center offers universities several supported software packages developed by other organizations, including the University of Wisconsin, Purdue University, NASA, and the National Weather Service. Underlying these is a Unidata-developed system for acquiring and managing data in real time, making practical the Unidata principle that each university should acquire and manage its own data holdings as local requirements dictate. It is significant that the Unidata program has no data center--the management of data is a "distributed" function. The Network Common Data Form (netCDF) software described in this guide was originally intended to provide a common data access method for the various Unidata applications. These deal with a variety of data types that encompass single-point observations, time series, regularly-spaced grids, and satellite or radar images. The netCDF software functions as an I/O library, callable from C, FORTRAN, C++, Perl, or other language for which a netCDF library is available. The library stores and retrieves data in self-describing, machine-independent datasets. Each netCDF dataset can contain multidimensional, named variables (with differing types that include integers, reals, characters, bytes, etc.), and each variable may be accompanied by ancillary data, such as units of measure or descriptive text. The interface includes a method for appending data to existing netCDF datasets in prescribed ways, functionality that is not unlike a (fixed length) record structure. However, the netCDF library also allows direct-access storage and retrieval of data by variable name and index and therefore is useful only for disk-resident (or memory-resident) datasets. 
NetCDF access has been implemented in about half of Unidata's software, so far, and it is planned that such commonality will extend across all Unidata applications in order to: * Facilitate the use of common datasets by distinct applications. * Permit datasets to be transported between or shared by dissimilar computers transparently, i.e., without translation. * Reduce the programming effort usually spent interpreting formats. * Reduce errors arising from misinterpreting data and ancillary data. * Facilitate using output from one application as input to another. * Establish an interface standard which simplifies the inclusion of new software into the Unidata system. A measure of success has been achieved. NetCDF is now in use on computing platforms that range from CRAYs to personal computers and include most UNIX-based workstations. It can be used to create a complex dataset on one computer (say in FORTRAN) and retrieve that same self-describing dataset on another computer (say in C) without intermediate translations--netCDF datasets can be transferred across a network, or they can be accessed remotely using a suitable network file system. Because we believe that the use of netCDF access in non-Unidata software will benefit Unidata's primary constituency--such use may result in more options for analyzing and displaying Unidata information--the netCDF library is distributed without licensing or other significant restrictions, and current versions can be obtained via anonymous FTP. Apparently the software has been well received by a wide range of institutions beyond the atmospheric science community, and a substantial number of public domain and commercial data analysis systems can now accept netCDF datasets as input. 
Several organizations have adopted netCDF as a data access standard, and there is an effort underway at the National Center for Supercomputer Applications (NCSA, which is associated with the University of Illinois at Urbana-Champaign) to support the netCDF programming interfaces as a means to store and retrieve data in "HDF files," i.e., in the format used by the popular NCSA tools. We have encouraged and cooperated with these efforts. Questions occasionally arise about the level of support provided for the netCDF software. Unidata's formal position, stated in the copyright notice which accompanies the netCDF library, is that the software is provided "as is". In practice, the software is updated from time to time, and Unidata intends to continue making improvements for the foreseeable future. Because Unidata's mission is to serve geoscientists at U.S. universities, problems reported by that community necessarily receive the greatest attention. We hope the reader will find the software useful and will give us feedback on its application as well as suggestions for its improvement. David Fulker Unidata Program Center Director University Corporation for Atmospheric Research Summary ---------------------------------------------------------------------------- The purpose of the Network Common Data Form (netCDF) interface is to allow you to create, access, and share array-oriented data in a form that is self-describing and portable. "Self-describing" means that a dataset includes information defining the data it contains. "Portable" means that the data in a dataset is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers. Using the netCDF interface for creating new datasets makes the data portable. Using the netCDF interface in software for data access, management, analysis, and display can make the software more generally useful. 
The netCDF software includes C and FORTRAN interfaces for accessing netCDF data. These libraries are available for many common computing platforms. C++ and Perl interfaces for netCDF data access are also available from Unidata. The community of netCDF users has contributed ports of the software to additional platforms and interfaces for other programming languages as well. Source code for netCDF software libraries is freely available to encourage the sharing of both array-oriented data and the software that makes the data useful. This User's Guide presents the netCDF data model, but documents only the C interface. Separate documents are available for the other language interfaces; also see the netCDF World Wide Web site, http://www.unidata.ucar.edu/packages/netcdf/ for links to on-line versions of the C, FORTRAN, C++ and Perl documentation. Reference documentation for UNIX systems, in the form of UNIX 'man' pages for the C and FORTRAN interfaces is also available there. Extensive additional information about netCDF, including pointers to other software that works with netCDF data, is available from the netCDF World Wide Web site. 1 Introduction ---------------------------------------------------------------------------- 1.1 The NetCDF Interface ---------------------------------------------------------------------------- The Network Common Data Form, or netCDF, is an interface to a library of data access functions for storing and retrieving data in the form of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...) rectangular structure containing items which all have the same data type (e.g., 8-bit character, 32-bit integer). A scalar (simple single value) is a 0-dimensional array. NetCDF is an abstraction that supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface. Array values may be accessed directly, without knowing details of how the data are stored. 
Auxiliary information about the data, such as what units are used, may be stored with the data. Generic utilities and application programs can access netCDF datasets and transform, combine, analyze, or display specified fields of the data. The development of such applications may lead to improved accessibility of data and improved reusability of software for array-oriented data management, analysis, and display. The netCDF software implements an abstract data type, which means that all operations to access and manipulate data in a netCDF dataset must use only the set of functions provided by the interface. The representation of the data is hidden from applications that use the interface, so that how the data are stored could be changed without affecting existing programs. The physical representation of netCDF data is designed to be independent of the computer on which the data were written. Unidata supports the netCDF interfaces for C, FORTRAN, C++, and Perl and for various UNIX operating systems. The software is also ported and tested on a few other operating systems, with assistance from users with access to these systems, before each major release. Unidata's netCDF software is freely available via FTP to encourage its widespread use. 1.2 NetCDF Is Not a Database Management System ---------------------------------------------------------------------------- Why not use an existing database management system for storing array-oriented data? Relational database software is not suitable for the kinds of data access supported by the netCDF interface. First, existing database systems that support the relational model do not support multidimensional objects (arrays) as a basic unit of data access. Representing arrays as relations makes some useful kinds of data access awkward and provides little support for the abstractions of multidimensional data and coordinate systems. 
A quite different data model is needed for array-oriented data to facilitate its retrieval, modification, mathematical manipulation and visualization. Related to this is a second problem with general-purpose database systems: their poor performance on large arrays. Collections of satellite images, scientific model outputs and long-term global weather observations are beyond the capabilities of most database systems to organize and index for efficient retrieval. Finally, general-purpose database systems provide, at significant cost in terms of both resources and access performance, many facilities that are not needed in the analysis, management, and display of array-oriented data. For example, elaborate update facilities, audit trails, report formatting, and mechanisms designed for transaction-processing are unnecessary for most scientific applications. 1.3 File Format ---------------------------------------------------------------------------- To achieve network-transparency (machine-independence), netCDF is implemented in terms of an external representation much like XDR (eXternal Data Representation, see ftp://ds.internic.net/rfc/rfc1832.txt), a standard for describing and encoding data. This representation provides encoding of data into machine-independent sequences of bits. It has been implemented on a wide variety of computers, by assuming only that eight-bit bytes can be encoded and decoded in a consistent way. The IEEE 754 floating-point standard is used for floating-point data representation. The overall structure of netCDF files is described in Chapter 9 "NetCDF File Structure and Performance," page 95. The details of the format are described in Appendix B "File Format Specification," page 115. However, users are discouraged from using the format specification to develop independent low-level software for reading and writing netCDF files, because this could lead to compatibility problems if the format is ever modified. 1.4 What about Performance? 
---------------------------------------------------------------------------- One of the goals of netCDF is to support efficient access to small subsets of large datasets. To support this goal, netCDF uses direct access rather than sequential access. This can be much more efficient when the order in which data is read is different from the order in which it was written, or when it must be read in different orders for different applications. The amount of overhead for a portable external representation depends on many factors, including the data type, the type of computer, the granularity of data access, and how well the implementation has been tuned to the computer on which it is run. This overhead is typically small in comparison to the overall resources used by an application. In any case, the overhead of the external representation layer is usually a reasonable price to pay for portable data access. Although efficiency of data access has been an important concern in designing and implementing netCDF, it is still possible to use the netCDF interface to access data in inefficient ways: for example, by requesting a slice of data that requires a single value from each record. Advice on how to use the interface efficiently is provided in Chapter 9 "NetCDF File Structure and Performance," page 95. 1.5 Is NetCDF a Good Archive Format? ---------------------------------------------------------------------------- NetCDF can be used as a general-purpose archive format for storing arrays. Compression of data is possible with netCDF (e.g., using arrays of eight-bit or 16-bit integers to encode low-resolution floating-point numbers instead of arrays of 32-bit numbers), but the current version of netCDF was not designed to achieve optimal compression of data. Hence, using netCDF may require more space than special-purpose archive formats that exploit knowledge of particular characteristics of specific datasets. 
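The packing idea mentioned above (storing low-resolution floating-point values as 16-bit integers) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the type packing and the functions packing_for_range, pack_value, and unpack_value are hypothetical helpers, not part of the netCDF library. The field names follow the scale_factor and add_offset attribute convention, and the rounding and clamping policy shown here is one reasonable choice among several.

```c
#include <stdint.h>

/* Hypothetical helper type (not part of the netCDF library): parameters of a
 * linear packing, named after the scale_factor/add_offset convention. */
typedef struct {
    double scale_factor;
    double add_offset;
} packing;

/* Choose parameters so that the interval [min, max] maps onto the full
 * range of a signed 16-bit integer. */
packing packing_for_range(double min, double max)
{
    packing p;
    p.scale_factor = (max - min) / 65535.0;      /* 2^16 - 1 steps */
    if (p.scale_factor == 0.0)
        p.scale_factor = 1.0;                    /* degenerate range */
    p.add_offset = min + 32768.0 * p.scale_factor;
    return p;
}

/* Pack one value: scale, round to nearest, and clamp to the int16 range. */
int16_t pack_value(packing p, double value)
{
    double q = (value - p.add_offset) / p.scale_factor;
    long r = (long)(q < 0.0 ? q - 0.5 : q + 0.5);
    if (r < INT16_MIN) r = INT16_MIN;
    if (r > INT16_MAX) r = INT16_MAX;
    return (int16_t)r;
}

/* Recover an approximation of the original value. */
double unpack_value(packing p, int16_t packed)
{
    return packed * p.scale_factor + p.add_offset;
}
```

With parameters chosen by packing_for_range(-50.0, 50.0), a value passed through pack_value and then unpack_value comes back to within half a quantization step (about 0.0008 here), using half the space of a 32-bit float. Values outside the chosen range are clamped, so the range must be known before the data are written.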
1.6 Creating Self-Describing Data conforming to Conventions ---------------------------------------------------------------------------- The mere use of netCDF is not sufficient to make data "self-describing" and meaningful to both humans and machines. The names of variables and dimensions should be meaningful and conform to any relevant conventions. Dimensions should have corresponding coordinate variables where sensible. Attributes play a vital role in providing ancillary information. It is important to use all the relevant standard attributes using the relevant conventions. Section 8.1 "Attribute Conventions," page 81, describes reserved attributes (used by the netCDF library) and attribute conventions for generic application software. A number of groups have defined their own additional conventions and styles for netCDF data. Descriptions of these conventions, as well as examples incorporating them can be accessed from the netCDF Conventions site, http://www.unidata.ucar.edu/packages/netcdf/conventions.html. These conventions should be used where suitable. Additional conventions are often needed for local use. These should be contributed to the above netCDF conventions site if likely to interest other users in similar areas. 1.7 Background and Evolution of the NetCDF Interface ---------------------------------------------------------------------------- The development of the netCDF interface began with a modest goal related to Unidata's needs: to provide a common interface between Unidata applications and real-time meteorological data. Since Unidata software was intended to run on multiple hardware platforms with access from both C and FORTRAN, achieving Unidata's goals had the potential for providing a package that was useful in a broader context. 
By making the package widely available and collaborating with other organizations with similar needs, we hoped to improve the then current situation in which software for scientific data access was only rarely reused by others in the same discipline and almost never reused between disciplines (Fulker, 1988). Important concepts employed in the netCDF software originated in a paper (Treinish and Gough, 1987) that described data-access software developed at the NASA Goddard National Space Science Data Center (NSSDC). The interface provided by this software was called the Common Data Format (CDF). The NASA CDF was originally developed as a platform-specific FORTRAN library to support an abstraction for storing arrays. The NASA CDF package had been used for many different kinds of data in an extensive collection of applications. It had the virtues of simplicity (only 13 subroutines), independence from storage format, generality, ability to support logical user views of data, and support for generic applications. Unidata held a workshop on CDF in Boulder in August 1987. We proposed exploring the possibility of collaborating with NASA to extend the CDF FORTRAN interface, to define a C interface, and to permit the access of data aggregates with a single call, while maintaining compatibility with the existing NASA interface. Independently, Dave Raymond at the New Mexico Institute of Mining and Technology had developed a package of C software for UNIX that supported sequential access to self-describing array-oriented data and a "pipes and filters" (or "data flow") approach to processing, analyzing, and displaying the data. This package also used the "Common Data Format" name, later changed to C-Based Analysis and Display System (CANDIS). Unidata learned of Raymond's work (Raymond, 1988), and incorporated some of his ideas, such as the use of named dimensions and variables with differing shapes in a single data object, into the Unidata netCDF interface. 
In early 1988, Glenn Davis of Unidata developed a prototype netCDF package in C that was layered on XDR. This prototype proved that a single-file, XDR-based implementation of the CDF interface could be achieved at acceptable cost and that the resulting programs could be implemented on both UNIX and VMS systems. However, it also demonstrated that providing a small, portable, and NASA CDF-compatible FORTRAN interface with the desired generality was not practical. NASA's CDF and Unidata's netCDF have since evolved separately, but recent CDF versions share many characteristics with netCDF. In early 1988, Joe Fahle of SeaSpace, Inc. (a commercial software development firm in San Diego, California), a participant in the 1987 Unidata CDF workshop, independently developed a CDF package in C that extended the NASA CDF interface in several important ways (Fahle, 1989). Like Raymond's package, the SeaSpace CDF software permitted variables with unrelated shapes to be included in the same data object and permitted a general form of access to multidimensional arrays. Fahle's implementation was used at SeaSpace as the intermediate form of storage for a variety of steps in their image-processing system. This interface and format have subsequently evolved into the Terascan data format. After studying Fahle's interface, we concluded that it solved many of the problems we had identified in trying to stretch the NASA interface to our purposes. In August 1988, we convened a small workshop to agree on a Unidata netCDF interface, and to resolve remaining open issues. Attending were Joe Fahle of SeaSpace, Michael Gough of Apple (an author of the NASA CDF software), Angel Li of the University of Miami (who had implemented our prototype netCDF software on VMS and was a potential user), and Unidata systems development staff. Consensus was reached at the workshop after some further simplifications were discovered. 
A document incorporating the results of the workshop into a proposed Unidata netCDF interface specification was distributed widely for comments before Glenn Davis and Russ Rew implemented the first version of the software. Comparison with other data-access interfaces and experience using netCDF are discussed in Rew and Davis (1990a), Rew and Davis (1990b), Jenter and Signell (1992), and Brown, Folk, Goucher, and Rew (1993). In October 1991, we announced version 2.0 of the netCDF software distribution. Slight modifications to the C interface (declaring dimension lengths to be long rather than int) improved the usability of netCDF on inexpensive platforms such as MS-DOS computers, without requiring recompilation on other platforms. This change to the interface required no changes to the associated file format. Release of netCDF version 2.3 in June 1993 preserved the same file format but added single call access to records, optimizations for accessing cross-sections involving non-contiguous data, subsampling along specified dimensions (using 'strides'), accessing non-contiguous data (using 'mapped array sections'), improvements to the ncdump and ncgen utilities, and an experimental C++ interface. In version 2.4, released in February 1996, support was added for new platforms and for the C++ interface, and significant optimizations were implemented for supercomputer architectures. FAN (File Array Notation), software providing a high-level interface to netCDF data, was made available in May 1996. The capabilities of the FAN utilities include extracting and manipulating array data from netCDF datasets, printing selected data from netCDF arrays, copying ASCII data into netCDF arrays, and performing various operations (sum, mean, max, min, product,...) on netCDF arrays. More information about FAN is available from the FAN Utilities document, http://www.unidata.ucar.edu/packages/netcdf/fan_utils.html. 1.8 What's New Since the Previous Release? 
----------------------------------------------------------------------------

This Guide documents the January 1997 release of netCDF 3, which preserves the same file format as earlier versions but includes some major changes from version 2.4:

* complete rewrite of the netCDF library in ANSI C;
* new type-safe C and FORTRAN interfaces;
* automatic type conversion facilities;
* significant changes in the internal architecture, resulting in higher performance and easier optimization on new platforms;
* support for all netCDF 2 function interfaces, global variables, and behavior, for backward compatibility;
* revised documentation; and
* fixes for reported bugs.

1.9 Limitations of NetCDF
----------------------------------------------------------------------------

The netCDF data model is widely applicable to data that can be organized into a collection of named array variables with named attributes, but there are some important limitations to the model and its implementation in software. Some of these limitations are inherent in the trade-offs among conflicting requirements that netCDF embodies, but we plan to address other limitations in the next version of the software.

Currently, netCDF offers a limited number of external numeric data types: 8-, 16-, or 32-bit integers, and 32- or 64-bit floating-point numbers. This limited set of sizes may use file space inefficiently compared to packing data in bit fields. For example, arrays of 9-bit values must be stored in 16-bit short integers. Storing arrays of 1- or 2-bit values in 8-bit values is even less efficient.

With the current netCDF file format, no more than 2 gigabytes of data can be stored in a single netCDF dataset. This limitation is a result of the 32-bit offsets currently used for storing positions within a file.

Another limitation of the current model is that only one unlimited (changeable) dimension is permitted for each netCDF data set.
Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the netCDF model does not permit variables with several unlimited dimensions or the use of multiple unlimited dimensions in different variables within the same dataset. Hence variables that have non-rectangular shapes (for example, ragged arrays) cannot be represented conveniently. The extent to which data can be completely self-describing is limited: there is always some assumed context without which sharing and archiving data would be impractical. NetCDF permits storing meaningful names for variables, dimensions, and attributes; units of measure in a form that can be used in computations; text strings for attribute values that apply to an entire data set; and simple kinds of coordinate system information. But for more complex kinds of metadata (for example, the information necessary to provide accurate georeferencing of data on unusual grids or from satellite images), it is often necessary to develop conventions. Specific additions to the netCDF data model might make some of these conventions unnecessary or allow some forms of metadata to be represented in a uniform and compact way. For example, adding explicit georeferencing to the netCDF data model would simplify elaborate georeferencing conventions at the cost of complicating the model. The problem is finding an appropriate trade-off between the richness of the model and its generality (i.e., its ability to encompass many kinds of data). A data model tailored to capture the shared context among researchers within one discipline may not be appropriate for sharing or combining data from multiple disciplines. The netCDF data model does not support nested data structures such as trees, nested arrays, or other recursive structures, primarily because the current FORTRAN interface must be able to read and write any netCDF data set. 
Through use of indirection and conventions it is possible to represent some kinds of nested structures, but the result may fall short of the netCDF goal of self-describing data.

Finally, the current implementation limits concurrent access to a netCDF dataset. One writer and multiple readers may access data in a single dataset simultaneously, but there is no support for multiple concurrent writers.

1.10 Future Plans for NetCDF
----------------------------------------------------------------------------

Current plans are to add transparent data packing, improved concurrency support, and the ability to access datasets larger than 2 gigabytes. Other desirable extensions that may be added, if practical, include access to data by key or coordinate value, support for efficient structure changes (e.g., new variables and attributes), support for pointers to data cross-sections in other datasets, nested arrays (allowing representation of ragged arrays, trees and other recursive data structures), and multiple unlimited dimensions.

References

1. Brown, S. A., M. Folk, G. Goucher, and R. Rew, "Software for Portable Scientific Data Management," Computers in Physics, American Institute of Physics, Vol. 7, No. 3, May/June 1993.
2. Davies, H. L., "FAN - An array-oriented query language," Second Workshop on Database Issues for Data Visualization (Visualization 1995), Atlanta, Georgia, IEEE, October 1995.
3. Fahle, J., TeraScan Applications Programming Interface, SeaSpace, San Diego, California, 1989.
4. Fulker, D. W., "The netCDF: Self-Describing, Portable Files---a Basis for 'Plug-Compatible' Software Modules Connectable by Networks," ICSU Workshop on Geophysical Informatics, Moscow, USSR, August 1988.
5. Fulker, D. W., "Unidata Strawman for Storing Earth-Referencing Data," Seventh International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, New Orleans, La., American Meteorological Society, January 1991.
6. Gough, M. L., NSSDC CDF Implementer's Guide (DEC VAX/VMS) Version 1.1, National Space Science Data Center, 88-17, NASA/Goddard Space Flight Center, 1988.
7. Jenter, H. L. and R. P. Signell, "NetCDF: A Freely-Available Software-Solution to Data-Access Problems for Numerical Modelers," Proceedings of the American Society of Civil Engineers Conference on Estuarine and Coastal Modeling, Tampa, Florida, 1992.
8. Raymond, D. J., "A C Language-Based Modular System for Analyzing and Displaying Gridded Numerical Data," Journal of Atmospheric and Oceanic Technology, 5, 501-511, 1988.
9. Rew, R. K. and G. P. Davis, "The Unidata netCDF: Software for Scientific Data Access," Sixth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, American Meteorological Society, February 1990.
10. Rew, R. K. and G. P. Davis, "NetCDF: An Interface for Scientific Data Access," Computer Graphics and Applications, IEEE, pp. 76-82, July 1990.
11. Rew, R. K. and G. P. Davis, "Unidata's netCDF Interface for Data Access: Status and Plans," Thirteenth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, American Meteorological Society, February 1997.
12. Treinish, L. A. and M. L. Gough, "A Software Package for the Data Independent Management of Multi-Dimensional Data," EOS Transactions, American Geophysical Union, 68, 633-635, 1987.

2 Components of a NetCDF Dataset
----------------------------------------------------------------------------

2.1 The NetCDF Data Model
----------------------------------------------------------------------------

A netCDF dataset contains dimensions, variables, and attributes, which all have both a name and an ID number by which they are identified. These components can be used together to capture the meaning of data and relations among data fields in an array-oriented dataset.
The netCDF library allows simultaneous access to multiple netCDF datasets, which are identified by dataset ID numbers in addition to ordinary file names.

A netCDF dataset contains a symbol table for variables containing their name, data type, rank (number of dimensions), dimensions, and starting disk address. Each element is stored at a disk address which is a linear function of the array indices (subscripts) by which it is identified. Hence, these indices need not be stored separately (as in a relational database). This provides a fast and compact storage method.

2.1.1 Naming Conventions

The names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '_' and hyphen '-'), beginning with a letter or underscore. (However, names commencing with underscore are reserved for system use.) Case is significant in netCDF names.

2.1.2 network Common Data Form Language (CDL)

We will use a small netCDF example to illustrate the concepts of the netCDF data model. This includes dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data form Language), which provides a convenient way of describing netCDF datasets. The netCDF system includes utilities for producing human-oriented CDL text files from binary netCDF datasets and vice versa.

    netcdf example_1 {  // example of CDL notation for a netCDF dataset

    dimensions:         // dimension names and lengths are declared first
            lat = 5, lon = 10, level = 4, time = unlimited;

    variables:          // variable types, names, shapes, attributes
            float   temp(time,level,lat,lon);
                        temp:long_name  = "temperature";
                        temp:units      = "celsius";
            float   rh(time,lat,lon);
                        rh:long_name    = "relative humidity";
                        rh:valid_range  = 0.0, 1.0;      // min and max
            int     lat(lat), lon(lon), level(level);
                        lat:units       = "degrees_north";
                        lon:units       = "degrees_east";
                        level:units     = "millibars";
            short   time(time);
                        time:units      = "hours since 1996-1-1";
            // global attributes
                        :source = "Fictional Model Output";

    data:               // optional data assignments
            level   = 1000, 850, 700, 500;
            lat     = 20, 30, 40, 50, 60;
            lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
            time    = 12;
            rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                     .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                     .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                     .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                      0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
    }

The CDL notation for a netCDF dataset can be generated automatically by using ncdump, a utility program described later (see Section 10.5 "ncdump," page 104). Another netCDF utility, ncgen, generates a netCDF dataset (or optionally C or FORTRAN source code containing calls needed to produce a netCDF dataset) from CDL input (see Section 10.4 "ncgen," page 103).

The CDL notation is simple and largely self-explanatory. It will be explained more fully as we describe the components of a netCDF dataset. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters '//' on any line. A CDL description of a netCDF dataset takes the form

    netcdf name {
      dimensions: ...
      variables: ...
      data: ...
    }

where the name is used only as a default in constructing file names by the ncgen utility. The CDL description consists of three optional parts, introduced by the keywords dimensions, variables, and data.
NetCDF dimension declarations appear after the dimensions keyword, netCDF variables and attributes are defined after the variables keyword, and variable data assignments appear after the data keyword. 2.2 Dimensions ---------------------------------------------------------------------------- A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height. A dimension might also be used to index other quantities, for example station or model-run-number. A netCDF dimension has both a name and a length. A dimension length is an arbitrary positive integer, except that one dimension in a netCDF dataset can have the length UNLIMITED. Such a dimension is called the unlimited dimension or the record dimension. A variable with an unlimited dimension can grow to any length along that dimension. The unlimited dimension index is like a record number in conventional record-oriented files. A netCDF dataset can have at most one unlimited dimension, but need not have any. If a variable has an unlimited dimension, that dimension must be the most significant (slowest changing) one. Thus any unlimited dimension must be the first dimension in a CDL shape and the first dimension in corresponding C array declarations. CDL dimension declarations may appear on one or more lines following the CDL keyword dimensions. Multiple dimension declarations on the same line may be separated by commas. Each declaration is of the form name = length. There are four dimensions in the above example: lat, lon, level, and time. The first three are assigned fixed lengths; time is assigned the length UNLIMITED, which means it is the unlimited dimension. The basic unit of named data in a netCDF dataset is a variable. When a variable is defined, its shape is specified as a list of dimensions. These dimensions must already exist. The number of dimensions is called the rank (a.k.a. dimensionality). 
A scalar variable has rank 0, a vector has rank 1 and a matrix has rank 2.

It is possible to use the same dimension more than once in specifying a variable shape (but this was not possible in previous netCDF versions). For example, correlation(instrument, instrument) could be a matrix giving correlations between measurements using different instruments. But data whose dimensions correspond to those of physical space/time should have a shape comprising different dimensions, even if some of these have the same length.

2.3 Variables
----------------------------------------------------------------------------

Variables are used to store the bulk of the data in a netCDF dataset. A variable represents an array of values of the same type. A scalar value is treated as a 0-dimensional array. A variable has a name, a data type, and a shape described by its list of dimensions specified when the variable is created. A variable may also have associated attributes, which may be added, deleted or changed after the variable is created.

A variable's external data type is one of a small set of netCDF types that have the names NC_BYTE, NC_CHAR, NC_SHORT, NC_INT, NC_FLOAT, and NC_DOUBLE in the C interface. NC_LONG is a deprecated synonym for NC_INT in the C interface.

In the CDL notation, these types are given the simpler names byte, char, short, int, float, and double. real may be used as a synonym for float in the CDL notation. long is a deprecated synonym for int. The exact meaning of each of the types is discussed in Section 3.1 "netCDF external data types," page 15.

CDL variable declarations appear after the variables keyword in a CDL unit. They have the form

     type variable_name ( dim_name_1, dim_name_2, ... );

for variables with dimensions, or

     type variable_name;

for scalar variables.

In the above CDL example there are six variables. As discussed below, four of these are coordinate variables.
The remaining variables (sometimes called primary variables), temp and rh, contain what is usually thought of as the data. Each of these variables has the unlimited dimension time as its first dimension, so they are called record variables. A variable that is not a record variable has a fixed length (number of data values) given by the product of its dimension lengths. The length of a record variable is also the product of its dimension lengths, but in this case the product is variable because it involves the length of the unlimited dimension, which can vary. The length of the unlimited dimension is the number of records. 2.3.1 Coordinate Variables It is legal for a variable to have the same name as a dimension. Such variables have no special meaning to the netCDF library. However there is a convention that such variables should be treated in a special way by software using this library. A variable with the same name as a dimension is called a coordinate variable. It typically defines a physical coordinate corresponding to that dimension. The above CDL example includes the coordinate variables lat, lon, level and time, defined as follows: int lat(lat), lon(lon), level(level); short time(time); ... data: level = 1000, 850, 700, 500; lat = 20, 30, 40, 50, 60; lon = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15; time = 12; These define the latitudes, longitudes, barometric pressures and times corresponding to positions along these dimensions. Thus there is data at altitudes corresponding to 1000, 850, 700 and 500 millibars; and at latitudes 20, 30, 40, 50 and 60 degrees north. Note that each coordinate variable is a vector and has a shape consisting of just the dimension with the same name. A position along a dimension can be specified using an index. This is an integer with a minimum value of 0 for C programs. Thus the 700 millibar level would have an index value of 2 in the example above. 
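The correspondence between a coordinate value and its index can be sketched in plain C. The helper coord_index below is a hypothetical name used only for illustration and is not part of the netCDF library; it assumes the coordinate values have already been read into memory, and it looks for an exact match.

```c
#include <stddef.h>

/* Coordinate values of the level dimension in the CDL example (millibars). */
const int level_coords[4] = { 1000, 850, 700, 500 };

/*
 * Hypothetical helper (not a netCDF call): return the zero-based index of
 * `value` in a coordinate vector of length `n`, or -1 if it is not present.
 * A linear search is used, so the vector need not be monotonic.
 */
long coord_index(const int *coords, size_t n, int value)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (coords[i] == value)
            return (long)i;
    }
    return -1;
}
```

With these definitions, coord_index(level_coords, 4, 700) evaluates to 2, which is the index of the 700 millibar level given above.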
If a dimension has a corresponding coordinate variable, then this provides an alternative, and often more convenient, means of specifying position along it. Current application packages that make use of coordinate variables commonly assume they are numeric vectors and strictly monotonic (all values are different and either increasing or decreasing). 2.4 Attributes ---------------------------------------------------------------------------- NetCDF attributes are used to store data about the data (ancillary data or metadata), similar in many ways to the information stored in data dictionaries and schema in conventional database systems. Most attributes provide information about a specific variable. These are identified by the name (or ID) of that variable, together with the name of the attribute. Some attributes provide information about the dataset as a whole and are called global attributes. These are identified by the attribute name together with a blank variable name (in CDL) or a special null "global variable" ID (in C or Fortran). An attribute has an associated variable (the null "global variable" for a global attribute), a name, a data type, a length, and a value. The current version treats all attributes as vectors; scalar values are treated as single-element vectors. Conventional attribute names should be used where applicable. New names should be as meaningful as possible. The external type of an attribute is specified when it is created. The types permitted for attributes are the same as the netCDF external data types for variables. Attributes with the same name for different variables should sometimes be of different types. For example, the attribute valid_max specifying the maximum valid data value for a variable of type int should be of type int, whereas the attribute valid_max for a variable of type double should instead be of type double. 
Attributes are more dynamic than variables or dimensions; they can be deleted and have their type, length, and values changed after they are created, whereas the netCDF interface provides no way to delete a variable or to change its type or shape. The CDL notation for defining an attribute is variable_name:attribute_name = list_of_values; for a variable attribute, or :attribute_name = list_of_values; for a global attribute. The type and length of each attribute are not explicitly declared in CDL; they are derived from the values assigned to the attribute. All values of an attribute must be of the same type. The notation used for constant values of the various netCDF types is discussed later (see Section 10.3 "CDL Notation for Data Constants," page 102). In the netCDF example (see Section 2.1.2 "network Common Data Form Language (CDL)," page 9), units is an attribute for the variable lat that has a 13-character array value 'degrees_north'. And valid_range is an attribute for the variable rh that has length 2 and values '0.0' and '1.0'. One global attribute---source---is defined for the example netCDF dataset. This is a character array intended for documenting the data. Actual netCDF datasets might have more global attributes to document the origin, history, conventions, and other characteristics of the dataset as a whole. Most generic applications that process netCDF datasets assume standard attribute conventions and it is strongly recommended that these be followed unless there are good reasons for not doing so. See Section 8.1 "Attribute Conventions," page 81, for information about units, long_name, valid_min, valid_max, valid_range, scale_factor, add_offset, _FillValue, and other conventional attributes. Attributes may be added to a netCDF dataset long after it is first defined, so you don't have to anticipate all potentially useful attributes. However adding new attributes to an existing dataset can incur the same expense as copying the dataset. 
See Chapter 9 "NetCDF File Structure and Performance," page 95, for a more extensive discussion. 2.5 Differences between Attributes and Variables ---------------------------------------------------------------------------- In contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data. The total amount of ancillary data associated with a netCDF object, and stored in its attributes, is typically small enough to be memory-resident. However variables are often too large to entirely fit in memory and must be split into sections for processing. Another difference between attributes and variables is that variables may be multidimensional. Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension). Variables are created with a name, type, and shape before they are assigned data values, so a variable may exist with no values. The value of an attribute must be specified when it is created, so no attribute ever exists without a value. A variable may have attributes, but an attribute cannot have attributes. Attributes assigned to variables may have the same units as the variable (for example, valid_range) or have no units (for example, scale_factor). If you want to store data that requires units different from those of the associated variable, it is better to use a variable than an attribute. More generally, if data require ancillary data to describe them, are multidimensional, require any of the defined netCDF dimensions to index their values, or require a significant amount of storage, that data should be represented using variables rather than attributes. 3 Data ---------------------------------------------------------------------------- This chapter discusses the six primitive netCDF external data types, the kinds of data access supported by the netCDF interface, and how data structures other than arrays may be implemented in a netCDF dataset. 
3.1 netCDF external data types ---------------------------------------------------------------------------- The external types supported by the netCDF interface are: char 8-bit characters intended for representing text. byte 8-bit signed or unsigned integers (see discussion below). short 16-bit signed integers. int 32-bit signed integers. float or real 32-bit IEEE floating-point. double 64-bit IEEE floating-point. These types were chosen to provide a reasonably wide range of trade-offs between data precision and number of bits required for each value. These external data types are independent from whatever internal data types are supported by a particular machine and language combination. These types are called "external", because they correspond to the portable external representation for netCDF data. When a program reads external netCDF data into an internal variable, the data is converted, if necessary, into the specified internal type. Similarly, if you write internal data into a netCDF variable, this may cause it to be converted to a different external type, if the external type for the netCDF variable differs from the internal type. The separation of external and internal types and automatic type conversion have several advantages. You need not be aware of the external type of numeric variables, since automatic conversion to or from any desired numeric type is available. You can use this feature to simplify code, by making it independent of external types, using a sufficiently wide internal type, e.g., double precision, for numeric netCDF data of several different external types. Programs need not be changed to accommodate a change to the external type of a variable. If conversion to or from an external numeric type is necessary, it is handled by the library. 
This automatic conversion and separation of external data representation from internal data types will become even more important in a future version of netCDF, when new external types will be added for packed data for which there may be no natural corresponding internal type, for example, packed arrays of 11-bit values. Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an internal short integer type may not be able to hold data stored externally as an integer. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly. Note that mere loss of precision in type conversion does not return an error. Thus, if you read double precision values into a single-precision floating-point variable, for example, no error results unless the magnitude of the double precision value exceeds the representable range of single-precision floating point numbers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has adequate precision. The names for the primitive external data types (byte, char, short, int, float or real, and double) are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names. It is possible to interpret byte data as either signed (-128 to 127) or unsigned (0 to 255). However, when reading byte data to be converted into other numeric types, it is interpreted as signed. See Section 2.3 "Variables," page 11, for the correspondence between netCDF external data types and the data types of a language. 
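The difference between a range error and a mere loss of precision, described earlier in this section, can be illustrated in plain C. The three helpers below are hypothetical names, not netCDF calls, and they do not reproduce the library's internal checks; they only show, under the assumption of a 32-bit int and IEEE single precision, which conversions fall outside the target range and which merely lose low-order bits.

```c
#include <float.h>
#include <limits.h>

/* Hypothetical helper: nonzero if `value` is representable as a short,
 * that is, converting it would not be out of range. */
int fits_in_short(int value)
{
    return value >= SHRT_MIN && value <= SHRT_MAX;
}

/* Hypothetical helper: nonzero if the magnitude of `value` lies within
 * the finite range of a single-precision float. */
int fits_in_float(double value)
{
    return value >= -FLT_MAX && value <= FLT_MAX;
}

/* Hypothetical helper: nonzero if storing `value` in a float changes it.
 * This is a loss of precision, not a range error. */
int loses_precision_as_float(int value)
{
    volatile float f = (float)value;   /* force rounding to float */

    return (double)f != (double)value;
}
```

For instance, 40000 does not fit in a 16-bit short, so that conversion is out of range, whereas 16777217 is within the range of a float but cannot be represented exactly in it.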
3.2 Data Access
----------------------------------------------------------------------------

To access (read or write) netCDF data you specify an open netCDF dataset, a netCDF variable, and information (e.g., indices) identifying elements of the variable. The name of the access function corresponds to the internal type of the data. If the internal type has a different representation from the external type of the variable, a conversion between the internal type and external type will take place when the data is read or written.

Access to data is direct, which means you can access a small subset of data from a large dataset efficiently, without first accessing all the data that precedes it. Reading and writing data by specifying a variable, instead of a position in a file, makes data access independent of how many other variables are in the dataset, making programs immune to data format changes that involve adding more variables to the data.

In the C and FORTRAN interfaces, datasets are not specified by name every time you want to access data, but instead by a small integer called a dataset ID, obtained when the dataset is first created or opened.

Similarly, a variable is not specified by name for every data access either, but by a variable ID, a small integer used to identify each variable in a netCDF dataset.

3.2.1 Forms of Data Access

The netCDF interface supports several forms of direct access to data values in an open netCDF dataset. We describe each of these forms of access in order of increasing generality:

   * access to all elements;
   * access to individual elements, specified with an index vector;
   * access to array sections, specified with an index vector, and count vector;
   * access to subsampled array sections, specified with an index vector, count vector, and stride vector; and
   * access to mapped array sections, specified with an index vector, count vector, stride vector, and an index mapping vector.
The four types of vector (index vector, count vector, stride vector and index mapping vector) each have one element for each dimension of the variable. Thus, for an n-dimensional variable (rank = n), n-element vectors are needed. If the variable is a scalar (no dimensions), these vectors are ignored.

An array section is a "slab" or contiguous rectangular block that is specified by two vectors. The index vector gives the indices of the element in the corner closest to the origin. The count vector gives the lengths of the edges of the slab along each of the variable's dimensions, in order. The number of values accessed is the product of these edge lengths.

A subsampled array section is similar to an array section, except that an additional stride vector is used to specify sampling. This vector has an element for each dimension giving the length of the strides to be taken along that dimension. For example, a stride of 4 means every fourth value along the corresponding dimension. The total number of values accessed is again the product of the elements of the count vector.

A mapped array section is similar to a subsampled array section except that an additional index mapping vector allows one to specify how data values associated with the netCDF variable are arranged in memory. The offset of each value from the reference location is given by the sum of the products of each index (of the imaginary internal array which would be used if there were no mapping) by the corresponding element of the index mapping vector. The number of values accessed is the same as for a subsampled array section.

The use of mapped array sections is discussed more fully below, but first we present an example of the more commonly used array-section access.
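Before turning to that example, the counting rules for array sections and subsampled array sections can be checked with a few lines of C. This is a minimal sketch: section_size, sampled_index, and demo_count are hypothetical names introduced here, not netCDF calls, and strides are assumed to be positive.

```c
#include <stddef.h>

/* An illustrative count vector for a rank-3 variable (assumed values). */
const size_t demo_count[3] = { 2, 3, 4 };

/* Number of values accessed: the product of the elements of the count
 * vector. For a scalar (rank 0) the result is 1. */
size_t section_size(const size_t *count, size_t rank)
{
    size_t n = 1;
    size_t d;

    for (d = 0; d < rank; d++)
        n *= count[d];
    return n;
}

/* Index along one dimension of the k-th sampled element (k = 0, 1, ...),
 * given the start index and a positive stride for that dimension. */
size_t sampled_index(size_t start, ptrdiff_t stride, size_t k)
{
    return start + k * (size_t)stride;
}
```

Here section_size(demo_count, 3) is 24, and with a start index of 2 and a stride of 4 the sampled indices along a dimension are 2, 6, 10, 14, and so on.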
3.2.2 An Example of Array-Section Access

Assume that in our earlier example of a netCDF dataset (see Section 2.1.2 "network Common Data Form Language (CDL)," page 9), we wish to read a cross-section of all the data for the temp variable at one level (say, the second), and assume that there are currently three records (time values) in the netCDF dataset. Recall that the dimensions are defined as

     lat = 5, lon = 10, level = 4, time = unlimited;

and the variable temp is declared as

     float   temp(time, level, lat, lon);

in the CDL notation.

A corresponding C variable that holds data for only one level might be declared as

     #define LATS   5
     #define LONS  10
     #define LEVELS 1
     #define TIMES  3                 /* currently */
         ...
     float temp[TIMES*LEVELS*LATS*LONS];

to keep the data in a one-dimensional array, or

         ...
     float temp[TIMES][LEVELS][LATS][LONS];

using a multidimensional array declaration.

To specify the block of data that represents just the second level, all times, all latitudes, and all longitudes, we need to provide a start index and some edge lengths. The start index should be (0, 1, 0, 0) in C, because we want to start at the beginning of each of the time, lon, and lat dimensions, but we want to begin at the second value of the level dimension. The edge lengths should be (3, 1, 5, 10) in C, since we want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. We should expect to get a total of 150 floating-point values returned (3 * 1 * 5 * 10), and should provide enough space in our array for this many. The order in which the data will be returned is with the last dimension, lon, varying fastest:

     temp[0][1][0][0]
     temp[0][1][0][1]
     temp[0][1][0][2]
     temp[0][1][0][3]
         ...
temp[2][1][4][7] temp[2][1][4][8] temp[2][1][4][9] Different dimension orders for the C, FORTRAN, or other language interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the languages. In general, it does not matter whether a netCDF dataset is written using the C, FORTRAN, or another language interface; netCDF datasets written from any supported language may be read by programs written in other supported languages. 3.2.3 More on General Array Section Access The use of mapped array sections allows non-trivial relationships between the disk addresses of variable elements and the addresses where they are stored in memory. For example, a matrix in memory could be the transpose of that on disk, giving a quite different order of elements. In a regular array section, the mapping between the disk and memory addresses is trivial: the structure of the in-memory values (i.e., the dimensional lengths and their order) is identical to that of the array section. In a mapped array section, however, an index mapping vector is used to define the mapping between indices of netCDF variable elements and their memory addresses. With mapped array access, the offset (number of array elements) from the origin of a memory-resident array to a particular point is given by the inner product[1] of the index mapping vector with the point's coordinate offset vector. A point's coordinate offset vector gives, for each dimension, the offset from the origin of the containing array to the point. In C, a point's coordinate offset vector is the same as its coordinate vector. 
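The inner-product rule can be sketched directly in C, counting offsets in array elements as stated above. The function mapped_offset and the small 2 by 3 example are hypothetical and serve only to illustrate the arithmetic; they are not part of the netCDF interface.

```c
#include <stddef.h>

/* Index mapping vectors for a 2 x 3 variable v(i,j) (assumed example).
 * imap_regular:    memory laid out as m[2][3], so v(i,j) is at i*3 + j.
 * imap_transposed: memory laid out as m[3][2], so v(i,j) is at j*2 + i. */
const ptrdiff_t imap_regular[2]    = { 3, 1 };
const ptrdiff_t imap_transposed[2] = { 1, 2 };

/* Coordinate offset vector of the point v(1,0). */
const size_t point[2] = { 1, 0 };

/* Offset, in array elements, of a point from the origin of the
 * memory-resident array: the inner product of the index mapping vector
 * with the point's coordinate offset vector. */
ptrdiff_t mapped_offset(const ptrdiff_t *imap, const size_t *coord,
                        size_t rank)
{
    ptrdiff_t offset = 0;
    size_t d;

    for (d = 0; d < rank; d++)
        offset += imap[d] * (ptrdiff_t)coord[d];
    return offset;
}
```

The point v(1,0) lies 3 elements from the origin under the regular mapping and 1 element from the origin when the in-memory array is the transpose.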
The index mapping vector for a regular array section would have--in order from most rapidly varying dimension to most slowly--a constant 1, the product of that value with the edge length of the most rapidly varying dimension of the array section, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a mapped array, however, the correspondence between netCDF variable disk locations and memory locations can be different.

For example, the following C definitions

     struct vel {
         int flags;
         float u;
         float v;
     } vel[NX][NY];
     ptrdiff_t imap[2] = {
         sizeof(struct vel),
         sizeof(struct vel)*NY
     };

where imap is the index mapping vector, can be used to access the memory-resident values of the netCDF variable, vel(NY,NX), even though the dimensions are transposed and the data is contained in a 2-D array of structures rather than a 2-D array of floating-point values.

A detailed example of mapped array access is presented in the description of the interfaces for mapped array access. See Section 7.9 "Write a Mapped Array of Values: nc_put_varm_type," page 62.

Note that, although the netCDF abstraction allows the use of subsampled or mapped array-section access, their use is not required. If you do not need these more general forms of access, you may ignore these capabilities and use single value access or regular array section access instead.

3.3 Type Conversion
----------------------------------------------------------------------------

Each netCDF variable has an external type, specified when the variable is first defined. This external type determines whether the data is intended for text or numeric values, and if numeric, the range and precision of numeric values.

If the netCDF external type for a variable is char, only character data representing text strings can be written to or read from the variable. No automatic conversion of text data to a different representation is supported.
If the type is numeric, however, the netCDF library allows you to access the variable data as a different type and provides automatic conversion between the numeric data in memory and the data in the netCDF variable. For example, if you write a program that deals with all numeric data as double-precision floating point values, you can read netCDF data into double-precision arrays without knowing or caring what the external types of the netCDF variables are. On reading netCDF data, integers of various sizes and single-precision floating-point values will all be converted to double-precision, if you use the data access interface for double-precision values. Of course, you can avoid automatic numeric conversion by using the netCDF interface for a value type that corresponds to the external data type of each netCDF variable, where such value types exist.

The automatic numeric conversions performed by netCDF are easy to understand, because they behave just like assignment of data of one type to a variable of a different type. For example, if you read floating-point netCDF data as integers, the result is truncated towards zero, just as it would be if you assigned a floating-point value to an integer variable. Such truncation is an example of the loss of precision that can occur in numeric conversions.

Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an integer may not be able to hold data stored externally as an IEEE floating-point number. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly.

Note that mere loss of precision in type conversion does not result in an error.
For example, if you read double precision values into an integer, no error results unless the magnitude of the double precision value exceeds the representable range of integers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has a compatible precision.

Whether a range error occurs in writing a large floating-point value near the boundary of representable values may depend on the platform. The largest floating-point value you can write to a netCDF float variable is the largest floating-point number representable on your system that is less than 2 to the 128th power. The largest double precision value you can write to a double variable is the largest double-precision number representable on your system that is less than 2 to the 1024th power.

This automatic conversion and separation of external data representation from internal data types will become even more important in a future version of netCDF, when new external types will be added for packed data for which there is no natural corresponding internal type, for example, arrays of 11-bit values.

3.4 Data Structures
----------------------------------------------------------------------------

The only kind of data structure directly supported by the netCDF abstraction is a collection of named arrays with attached vector attributes. NetCDF is not particularly well-suited for storing linked lists, trees, sparse matrices, ragged arrays or other kinds of data structures requiring pointers.

It is possible to build other kinds of data structures from sets of arrays by adopting various conventions regarding the use of data in one array as pointers into another array.
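As a rough illustration of using one array as pointers into another, the C sketch below treats a flat array of values as a ragged array whose rows are located through a second array of starting indices. All names and data here (demo_values, demo_row_start, ragged_row_length, ragged_row) are hypothetical and chosen only for this sketch; the convention itself is one of many that an application could adopt.

```c
#include <stddef.h>

/* Nine values stored contiguously, forming three rows of unequal length. */
const float demo_values[9] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

/* Index in demo_values at which each row starts. */
const int demo_row_start[3] = { 0, 3, 5 };

/* Length of row r: the distance to the start of the next row, or to the
 * end of the value array for the last row. */
size_t ragged_row_length(const int *row_start, size_t nrows,
                         size_t nelems, size_t r)
{
    size_t end = (r + 1 < nrows) ? (size_t)row_start[r + 1] : nelems;

    return end - (size_t)row_start[r];
}

/* Pointer to the first element of row r. */
const float *ragged_row(const float *values, const int *row_start, size_t r)
{
    return values + row_start[r];
}
```

In this sketch the rows have lengths 3, 2, and 4, and the second row begins at the value 3.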
The netCDF library won't provide much help or hindrance with constructing such data structures, but netCDF provides the mechanisms with which such conventions can be designed.

The following example stores a ragged array ragged_mat using an attribute row_index to name an associated index variable giving the index of the start of each row. In this example, the first row contains 12 elements, the second row contains 7 elements (19 - 12), and so on.

     float ragged_mat(max_elements);
             ragged_mat:row_index = "row_start";
     int row_start(max_rows);
     data:
             row_start = 0, 12, 19, ...

As another example, netCDF variables may be grouped within a netCDF dataset by defining attributes that list the names of the variables in each group, separated by a conventional delimiter such as a space or comma. Using a naming convention for attribute names for such groupings permits any number of named groups of variables. A particular conventional attribute for each variable might list the names of the groups of which it is a member. Use of attributes, or variables that refer to other attributes or variables, provides a flexible mechanism for representing some kinds of complex structures in netCDF datasets.

4 Use of the NetCDF Library
----------------------------------------------------------------------------

You can use the netCDF library without knowing about all of the netCDF interface. If you are creating a netCDF dataset, only a handful of routines are required to define the necessary dimensions, variables, and attributes, and to write the data to the netCDF dataset. (Even fewer are needed if you use the ncgen utility to create the dataset before running a program using netCDF library calls to write data.) Similarly, if you are writing software to access data stored in a particular netCDF object, only a small subset of the netCDF library is required to open the netCDF dataset and access the data.
Authors of generic applications that access arbitrary netCDF datasets need to be familiar with more of the netCDF library.

In this chapter we provide templates of common sequences of netCDF calls needed for common uses. For clarity we present only the names of routines; omit declarations and error checking; omit the type-specific suffixes of routine names for variables and attributes; indent statements that are typically invoked multiple times; and use ... to represent arbitrary sequences of other statements. Full parameter lists are described in later chapters.

4.1 Creating a NetCDF Dataset
----------------------------------------------------------------------------

Here is a typical sequence of netCDF calls used to create a new netCDF dataset:

     nc_create           /* create netCDF dataset: enter define mode */
          ...
        nc_def_dim       /* define dimensions: from name and length */
          ...
        nc_def_var       /* define variables: from name, type, ... */
          ...
        nc_put_att       /* put attribute: assign attribute values */
          ...
     nc_enddef           /* end definitions: leave define mode */
          ...
        nc_put_var       /* provide values for variables */
          ...
     nc_close            /* close: save new netCDF dataset */

Only one call is needed to create a netCDF dataset, at which point you will be in the first of two netCDF modes. When accessing an open netCDF dataset, it is either in define mode or data mode. In define mode, you can create dimensions, variables, and new attributes, but you cannot read or write variable data. In data mode, you can access data and change existing attributes, but you are not permitted to create new dimensions, variables, or attributes.

One call to nc_def_dim is needed for each dimension created. Similarly, one call to nc_def_var is needed for each variable creation, and one call to a member of the nc_put_att family is needed for each attribute defined and assigned a value. To leave define mode and enter data mode, call nc_enddef.
Once in data mode, you can add new data to variables, change old values, and change values of existing attributes (so long as the attribute changes do not require more storage space). Single values may be written to a netCDF variable with one of the members of the nc_put_var1 family, depending on what type of data you have to write. All the values of a variable may be written at once with one of the members of the nc_put_var family. Arrays or array cross-sections of a variable may be written using members of the nc_put_vara family. Subsampled array sections may be written using members of the nc_put_vars family. Mapped array sections may be written using members of the nc_put_varm family. (Subsampled and mapped access are general forms of data access that are explained later.)

Finally, you should explicitly close all netCDF datasets that have been opened for writing by calling nc_close. By default, access to the file system is buffered by the netCDF library. If a program terminates abnormally with netCDF datasets open for writing, your most recent modifications may be lost. This default buffering of data is disabled by setting the NC_SHARE flag when opening the dataset. But even if this flag is set, changes to attribute values or changes made in define mode are not written out until nc_sync or nc_close is called.

4.2 Reading a NetCDF Dataset with Known Names
----------------------------------------------------------------------------

Here we consider the case where you know the names of not only the netCDF datasets, but also the names of their dimensions, variables, and attributes. (Otherwise you would have to do "inquire" calls.) The order of typical C calls to read data from those variables in a netCDF dataset is:

     nc_open             /* open existing netCDF dataset */
          ...
        nc_inq_dimid     /* get dimension IDs */
          ...
        nc_inq_varid     /* get variable IDs */
          ...
        nc_get_att       /* get attribute values */
          ...
        nc_get_var       /* get values of variables */
          ...
    nc_close            /* close netCDF dataset */

First, a single call opens the netCDF dataset, given the dataset name, and returns a netCDF ID that is used to refer to the open netCDF dataset in all subsequent calls.

Next, a call to nc_inq_dimid for each dimension of interest gets the dimension ID from the dimension name. Similarly, each required variable ID is determined from its name by a call to nc_inq_varid. Once variable IDs are known, variable attribute values can be retrieved using the netCDF ID, the variable ID, and the desired attribute name as input to a member of the nc_get_att family (typically nc_get_att_text or nc_get_att_double) for each desired attribute. Variable data values can be directly accessed from the netCDF dataset with calls to members of the nc_get_var1 family for single values, the nc_get_var family for entire variables, or various other members of the nc_get_vara, nc_get_vars, or nc_get_varm families for array, subsampled or mapped access.

Finally, the netCDF dataset is closed with nc_close. There is no need to close a dataset open only for reading.

4.3 Reading a netCDF Dataset with Unknown Names
----------------------------------------------------------------------------

It is possible to write programs (e.g., generic software) which do such things as processing every variable, without needing to know in advance the names of these variables. Similarly, the names of dimensions and attributes may be unknown.

Names and other information about netCDF objects may be obtained from netCDF datasets by calling inquire functions. These return information about a whole netCDF dataset, a dimension, a variable, or an attribute. The following template illustrates how they are used:

    nc_open                   /* open existing netCDF dataset */
      ...
    nc_inq                    /* find out what is in it */
         ...
       nc_inq_dim             /* get dimension names, lengths */
         ...
       nc_inq_var             /* get variable names, types, shapes */
           ...
          nc_inq_attname      /* get attribute names */
            ...
          nc_inq_att          /* get attribute types and lengths */
            ...
          nc_get_att          /* get attribute values */
            ...
       nc_get_var             /* get values of variables */
         ...
    nc_close                  /* close netCDF dataset */

As in the previous example, a single call opens the existing netCDF dataset, returning a netCDF ID. This netCDF ID is given to the nc_inq routine, which returns the number of dimensions, the number of variables, the number of global attributes, and the ID of the unlimited dimension, if there is one.

All the inquire functions are inexpensive to use and require no I/O, since the information they provide is stored in memory when a netCDF dataset is first opened.

Dimension IDs use consecutive integers, beginning at 0. Also dimensions, once created, cannot be deleted. Therefore, knowing the number of dimensions in a netCDF dataset means knowing all the dimension IDs: they are the integers 0, 1, 2, ... up to one less than the number of dimensions. For each dimension ID, a call to the inquire function nc_inq_dim returns the dimension name and length.

Variable IDs are also assigned from consecutive integers 0, 1, 2, ... up to one less than the number of variables. These can be used in nc_inq_var calls to find out the names, types, shapes, and the number of attributes assigned to each variable.

Once the number of attributes for a variable is known, successive calls to nc_inq_attname return the name for each attribute given the netCDF ID, variable ID, and attribute number. Armed with the attribute name, a call to nc_inq_att returns its type and length. Given the type and length, you can allocate enough space to hold the attribute values. Then a call to a member of the nc_get_att family returns the attribute values.

Once the IDs and shapes of netCDF variables are known, data values can be accessed by calling a member of the nc_get_var1 family for single values, or members of the nc_get_var, nc_get_vara, nc_get_vars, or nc_get_varm families for various kinds of array access.

4.4 Adding New Dimensions, Variables, Attributes
----------------------------------------------------------------------------

An existing netCDF dataset can be extensively altered. New dimensions, variables, and attributes can be added; existing dimensions, variables, and attributes can be renamed; and existing attributes can be deleted. The following code template lists a typical sequence of calls to add new netCDF components to an existing dataset:

    nc_open             /* open existing netCDF dataset */
         ...
    nc_redef            /* put it into define mode */
         ...
       nc_def_dim       /* define additional dimensions (if any) */
         ...
       nc_def_var       /* define additional variables (if any) */
         ...
       nc_put_att       /* define additional attributes (if any) */
         ...
    nc_enddef           /* check definitions, leave define mode */
         ...
       nc_put_var       /* provide values for new variables */
         ...
    nc_close            /* close netCDF dataset */

A netCDF dataset is first opened by the nc_open call. This call puts the open dataset in data mode, which means existing data values can be accessed and changed, existing attributes can be changed (so long as they do not grow), but nothing can be added. To add new netCDF dimensions, variables, or attributes you must enter define mode, by calling nc_redef. In define mode, call nc_def_dim to define new dimensions, nc_def_var to define new variables, and a member of the nc_put_att family to assign new attributes to variables or enlarge old attributes.

You can leave define mode and reenter data mode, checking all the new definitions for consistency and committing the changes to disk, by calling nc_enddef. If you do not wish to reenter data mode, just call nc_close, which will have the effect of first calling nc_enddef.

Until the nc_enddef call, you may back out of all the redefinitions made in define mode and restore the previous state of the netCDF dataset by calling nc_abort. You may also use the nc_abort call to restore the netCDF dataset to a consistent state if the call to nc_enddef fails.
If you have called nc_close from definition mode and the implied call to nc_enddef fails, nc_abort will automatically be called to close the netCDF dataset and leave it in its previous consistent state (before you entered define mode).

At most one process should have a netCDF dataset open for writing at one time. The library is designed to provide limited support for multiple concurrent readers with one writer, via disciplined use of the nc_sync function and the NC_SHARE flag. If a writer makes changes in define mode, such as the addition of new variables, dimensions, or attributes, some means external to the library is necessary to prevent readers from making concurrent accesses and to inform readers to call nc_sync before the next access.

4.5 Error Handling
----------------------------------------------------------------------------

The netCDF library provides the facilities needed to handle errors in a flexible way. Each netCDF function returns an integer status value. If the returned status value indicates an error, you may handle it in any way desired, from printing an associated error message and exiting to ignoring the error indication and proceeding (not recommended!). For simplicity, the examples in this guide check the error status and call a separate function to handle any errors.

The nc_strerror function is available to convert a returned integer error status into an error message string.

Occasionally, low-level I/O errors may occur in a layer below the netCDF library. For example, if a write operation causes you to exceed disk quotas or to attempt to write to a device that is no longer available, you may get an error from a layer below the netCDF library, but the resulting write error will still be reflected in the returned status value.

4.6 Compiling and Linking with the NetCDF Library
----------------------------------------------------------------------------

Details of how to compile and link a program that uses the netCDF C or FORTRAN interfaces differ, depending on the operating system, the available compilers, and where the netCDF library and include files are installed. Nevertheless, we provide here examples of how to compile and link a program that uses the netCDF library on a Unix platform, so that you can adjust these examples to fit your installation.

Every C file that references netCDF functions or constants must contain an appropriate #include statement before the first such reference:

    #include <netcdf.h>

Unless the netcdf.h file is installed in a standard directory where the C compiler always looks, you must use the -I option when invoking the compiler, to specify a directory where netcdf.h is installed, for example:

    cc -c -I/usr/local/netcdf/include myprogram.c

Alternatively, you could specify an absolute path name in the #include statement, but then your program would not compile on another platform where netCDF is installed in a different location.

Unless the netCDF library is installed in a standard directory where the linker always looks, you must use the -L and -l options to link an object file that uses the netCDF library. For example:

    cc -o myprogram myprogram.o -L/usr/local/netcdf/lib -lnetcdf

Alternatively, you could specify an absolute path name for the library, given as an ordinary file argument rather than with the -l option:

    cc -o myprogram myprogram.o /usr/local/netcdf/lib/libnetcdf.a

5 Datasets
----------------------------------------------------------------------------

This chapter presents the interfaces of the netCDF functions that deal with a netCDF dataset or the whole netCDF library.

A netCDF dataset that has not yet been opened can only be referred to by its dataset name. Once a netCDF dataset is opened, it is referred to by a netCDF ID, which is a small nonnegative integer returned when you create or open the dataset.
A netCDF ID is much like a file descriptor in C or a logical unit number in FORTRAN. In any single program, the netCDF IDs of distinct open netCDF datasets are distinct. A single netCDF dataset may be opened multiple times and will then have multiple distinct netCDF IDs; however, at most one of the open instances of a single netCDF dataset should permit writing. When an open netCDF dataset is closed, the ID is no longer associated with a netCDF dataset.

Functions that deal with the netCDF library include:

   * Get version of library.
   * Get error message corresponding to a returned error code.

The operations supported on a netCDF dataset as a single object are:

   * Create, given dataset name and whether to overwrite or not.
   * Open for access, given dataset name and read or write intent.
   * Put into define mode, to add dimensions, variables, or attributes.
   * Take out of define mode, checking consistency of additions.
   * Close, writing to disk if required.
   * Inquire about the number of dimensions, number of variables, number of global attributes, and ID of the unlimited dimension, if any.
   * Synchronize to disk to make sure it is current.
   * Set and unset nofill mode for optimized sequential writes.

After a summary of conventions used in describing the netCDF interfaces, the rest of this chapter presents a detailed description of the interfaces for these operations.

5.1 NetCDF Library Interface Descriptions
----------------------------------------------------------------------------

Each interface description for a particular netCDF function in this and later chapters contains:

   * a description of the purpose of the function;
   * a C function prototype that presents the type and order of the formal parameters to the function;
   * a description of each formal parameter in the C interface;
   * a list of possible error conditions; and
   * an example of a C program fragment calling the netCDF function (and perhaps other netCDF functions).

The examples follow a simple convention for error handling, always checking the error status returned from each netCDF function call and calling a handle_error function in case an error was detected. For an example of such a function, see Section 5.2 "Get error message corresponding to error status: nc_strerror," page 30.

5.2 Get error message corresponding to error status: nc_strerror
----------------------------------------------------------------------------

The function nc_strerror returns a static reference to an error message string corresponding to an integer netCDF error status or to a system error number, presumably returned by a previous call to some other netCDF function. The list of netCDF error status codes is available in the appropriate include file for each language binding.

Usage

    const char * nc_strerror(int ncerr);

ncerr    An error status that might have been returned from a previous call to some netCDF function.

Errors

If you provide an invalid integer error status that does not correspond to any netCDF error message or to any system error message (as understood by the system strerror function), nc_strerror returns a string indicating that there is no such error status.

Example

Here is an example of a simple error handling function that uses nc_strerror to print the error message corresponding to the netCDF error status returned from any netCDF function call and then exit:

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>
       ...
    void handle_error(int status) {
        if (status != NC_NOERR) {
            fprintf(stderr, "%s\n", nc_strerror(status));
            exit(-1);
        }
    }

5.3 Get netCDF library version: nc_inq_libvers
----------------------------------------------------------------------------

The function nc_inq_libvers returns a string identifying the version of the netCDF library, and when it was built.

Usage

    const char * nc_inq_libvers(void);

Errors

This function takes no arguments, and thus no errors are possible in its invocation.

Example

Here is an example using nc_inq_libvers to print the version of the netCDF library with which the program is linked:

    #include <stdio.h>
    #include <netcdf.h>
       ...
    printf("%s\n", nc_inq_libvers());

5.4 Create a NetCDF dataset: nc_create
----------------------------------------------------------------------------

This function creates a new netCDF dataset, returning a netCDF ID that can subsequently be used to refer to the netCDF dataset in other netCDF function calls. The new netCDF dataset is opened for write access and placed in define mode, ready for you to add dimensions, variables, and attributes.

A creation mode flag specifies whether to overwrite any existing dataset with the same name and whether access to the dataset is shared.

Usage

    int nc_create (const char* path, int cmode, int *ncidp);

path     The file name of the new netCDF dataset.

cmode    The creation mode. A zero value (or NC_CLOBBER) specifies the default behavior: overwrite any existing dataset with the same file name and buffer and cache accesses for efficiency. Otherwise, the creation mode is NC_NOCLOBBER, NC_SHARE, or NC_NOCLOBBER|NC_SHARE. Setting the NC_NOCLOBBER flag means you do not want to clobber (overwrite) an existing dataset; an error (NC_EEXIST) is returned if the specified dataset already exists. The NC_SHARE flag is appropriate when one process may be writing the dataset and one or more other processes reading the dataset concurrently; it means that dataset accesses are not buffered and caching is limited. Since the buffering scheme is optimised for sequential access, programs that do not access data sequentially may see some performance improvement by setting the NC_SHARE flag.

ncidp    Pointer to location where returned netCDF ID is to be stored.

Errors

nc_create returns the value NC_NOERR if no errors occurred. Possible causes of errors include:

   * Passing a dataset name that includes a directory that does not exist.
   * Specifying a dataset name of a file that exists and also specifying NC_NOCLOBBER.
   * Specifying a meaningless value for the creation mode.
   * Attempting to create a netCDF dataset in a directory where you don't have permission to create files.

Example

In this example we create a netCDF dataset named foo.nc; we want the dataset to be created in the current directory only if a dataset with that name does not already exist:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
    if (status != NC_NOERR) handle_error(status);

5.5 Open a NetCDF Dataset for Access: nc_open
----------------------------------------------------------------------------

The function nc_open opens an existing netCDF dataset for access.

Usage

    int nc_open (const char *path, int omode, int *ncidp);

path     File name for netCDF dataset to be opened.

omode    The open mode. A zero value (or NC_NOWRITE) specifies the default behavior: open the dataset with read-only access, buffering and caching accesses for efficiency. Otherwise, the open mode is NC_WRITE, NC_SHARE, or NC_WRITE|NC_SHARE. Setting the NC_WRITE flag opens the dataset with read-write access. ("Writing" means any kind of change to the dataset, including appending or changing data, adding or renaming dimensions, variables, and attributes, or deleting attributes.) The NC_SHARE flag is appropriate when one process may be writing the dataset and one or more other processes reading the dataset concurrently; it means that dataset accesses are not buffered and caching is limited. Since the buffering scheme is optimised for sequential access, programs that do not access data sequentially may see some performance improvement by setting the NC_SHARE flag.

ncidp    Pointer to location where returned netCDF ID is to be stored.

Errors

nc_open returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The specified netCDF dataset does not exist.
   * A meaningless mode was specified.

Example

Here is an example using nc_open to open an existing netCDF dataset named foo.nc for read-only, non-shared access:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_open("foo.nc", 0, &ncid);
    if (status != NC_NOERR) handle_error(status);

5.6 Put Open NetCDF Dataset into Define Mode: nc_redef
----------------------------------------------------------------------------

The function nc_redef puts an open netCDF dataset into define mode, so dimensions, variables, and attributes can be added or renamed and attributes can be deleted.

Usage

    int nc_redef(int ncid);

ncid     netCDF ID, from a previous call to nc_open or nc_create.

Errors

nc_redef returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The specified netCDF dataset is already in define mode.
   * The specified netCDF dataset was opened for read-only.
   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_redef to open an existing netCDF dataset named foo.nc and put it into define mode:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);  /* open dataset */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_redef(ncid);                      /* put in define mode */
    if (status != NC_NOERR) handle_error(status);

5.7 Leave Define Mode: nc_enddef
----------------------------------------------------------------------------

The function nc_enddef takes an open netCDF dataset out of define mode. The changes made to the netCDF dataset while it was in define mode are checked and committed to disk if no problems occurred. Non-record variables may be initialized to a "fill value" as well (see Section 5.12 "Set Fill Mode for Writes: nc_set_fill," page 39). The netCDF dataset is then placed in data mode, so variable data can be read or written.

This call may involve copying data under some circumstances.
See Chapter 9 "NetCDF File Structure and Performance," page 95, for a more extensive discussion.

Usage

    int nc_enddef(int ncid);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.

Errors

nc_enddef returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The specified netCDF dataset is not in define mode.
   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_enddef to finish the definitions of a new netCDF dataset named foo.nc and put it into data mode:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...       /* create dimensions, variables, attributes */
    status = nc_enddef(ncid);  /* leave define mode */
    if (status != NC_NOERR) handle_error(status);

5.8 Close an Open NetCDF Dataset: nc_close
----------------------------------------------------------------------------

The function nc_close closes an open netCDF dataset. If the dataset is in define mode, nc_enddef will be called before closing. (In this case, if nc_enddef returns an error, nc_abort will automatically be called to restore the dataset to the consistent state before define mode was last entered.) After an open netCDF dataset is closed, its netCDF ID may be reassigned to the next netCDF dataset that is opened or created.

Usage

    int nc_close(int ncid);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.

Errors

nc_close returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * Define mode was entered and the automatic call made to nc_enddef failed.
   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_close to finish the definitions of a new netCDF dataset named foo.nc and release its netCDF ID:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...       /* create dimensions, variables, attributes */
    status = nc_close(ncid);       /* close netCDF dataset */
    if (status != NC_NOERR) handle_error(status);

5.9 Inquire about an Open NetCDF Dataset: nc_inq Family
----------------------------------------------------------------------------

Members of the nc_inq family of functions return information about an open netCDF dataset, given its netCDF ID. Dataset inquire functions may be called from either define mode or data mode. The first function, nc_inq, returns values for the number of dimensions, the number of variables, the number of global attributes, and the dimension ID of the dimension defined with unlimited length, if any. The other functions in the family each return just one of these items of information.

For C, these functions include nc_inq, nc_inq_ndims, nc_inq_nvars, nc_inq_natts, and nc_inq_unlimdim.

No I/O is performed when these functions are called, since the required information is available in memory for each open netCDF dataset.

Usage

    int nc_inq          (int ncid, int *ndimsp, int *nvarsp, int *ngattsp,
                         int *unlimdimidp);
    int nc_inq_ndims    (int ncid, int *ndimsp);
    int nc_inq_nvars    (int ncid, int *nvarsp);
    int nc_inq_natts    (int ncid, int *ngattsp);
    int nc_inq_unlimdim (int ncid, int *unlimdimidp);

ncid           NetCDF ID, from a previous call to nc_open or nc_create.

ndimsp         Pointer to location for returned number of dimensions defined for this netCDF dataset.

nvarsp         Pointer to location for returned number of variables defined for this netCDF dataset.

ngattsp        Pointer to location for returned number of global attributes defined for this netCDF dataset.

unlimdimidp    Pointer to location for returned ID of the unlimited dimension, if there is one for this netCDF dataset. If no unlimited length dimension has been defined, -1 is returned.

Errors

All members of the nc_inq family return the value NC_NOERR if no errors occurred.
Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_inq to find out about a netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status, ncid, ndims, nvars, ngatts, unlimdimid;
       ...
    status = nc_open("foo.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid);
    if (status != NC_NOERR) handle_error(status);

5.10 Synchronize an Open NetCDF Dataset to Disk: nc_sync
----------------------------------------------------------------------------

The function nc_sync offers a way to synchronize the disk copy of a netCDF dataset with in-memory buffers. There are two reasons you might want to synchronize after writes:

   * To minimize data loss in case of abnormal termination, or
   * To make data available to other processes for reading immediately after it is written. But note that a process that already had the dataset open for reading would not see the number of records increase when the writing process calls nc_sync; to accomplish this, the reading process must call nc_sync.

This function is backward-compatible with previous versions of the netCDF library. The intent was to allow sharing of a netCDF dataset among multiple readers and one writer, by having the writer call nc_sync after writing and the readers call nc_sync before each read. For a writer, this flushes buffers to disk. For a reader, it makes sure that the next read will be from disk rather than from previously cached buffers, so that the reader will see changes made by the writing process (e.g., the number of records written) without having to close and reopen the dataset. If you are only accessing a small amount of data, it can be expensive in computer resources to always synchronize to disk after every write, since you are giving up the benefits of buffering.

An easier way to accomplish sharing (and what is now recommended) is to have the writer and readers open the dataset with the NC_SHARE flag, and then it will not be necessary to call nc_sync at all. However, the nc_sync function still provides finer granularity than the NC_SHARE flag, if only a few netCDF accesses need to be synchronized among processes.

It is important to note that changes to the ancillary data, such as attribute values, are not propagated automatically by use of the NC_SHARE flag. Use of the nc_sync function is still required for this purpose.

Sharing datasets when the writer enters define mode to change the data schema requires extra care. In previous releases, after the writer left define mode, the readers were left looking at an old copy of the dataset, since the changes were made to a new copy. The only way readers could see the changes was by closing and reopening the dataset. Now the changes are made in place, but readers have no knowledge that their internal tables are now inconsistent with the new dataset schema. If netCDF datasets are shared across redefinition, some mechanism external to the netCDF library must be provided that prevents access by readers during redefinition and causes the readers to call nc_sync before any subsequent access.

When calling nc_sync, the netCDF dataset must be in data mode. A netCDF dataset in define mode is synchronized to disk only when nc_enddef is called. A process that is reading a netCDF dataset that another process is writing may call nc_sync to get updated with the changes made to the data by the writing process (e.g., the number of records written), without having to close and reopen the dataset.

Data is automatically synchronized to disk when a netCDF dataset is closed, or whenever you leave define mode.

Usage

    int nc_sync(int ncid);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.

Errors

nc_sync returns the value NC_NOERR if no errors occurred.
Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The netCDF dataset is in define mode.
   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_sync to synchronize the disk writes of a netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status;
    int ncid;
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);  /* open for writing */
    if (status != NC_NOERR) handle_error(status);
       ...           /* write data or change attributes */
    status = nc_sync(ncid);                       /* synchronize to disk */
    if (status != NC_NOERR) handle_error(status);

5.11 Back Out of Recent Definitions: nc_abort
----------------------------------------------------------------------------

You no longer need to call this function, since it is called automatically by nc_close in case the dataset is in define mode and something goes wrong with committing the changes. The function nc_abort just closes the netCDF dataset, if not in define mode. If the dataset is being created and is still in define mode, the dataset is deleted. If define mode was entered by a call to nc_redef, the netCDF dataset is restored to its state before definition mode was entered and the dataset is closed.

Usage

    int nc_abort(int ncid);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.

Errors

nc_abort returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * When called from define mode while creating a netCDF dataset, deletion of the dataset failed.
   * The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_abort to back out of redefinitions of a dataset named foo.nc:

    #include <netcdf.h>
       ...
    int ncid, status, latid;
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);  /* open for writing */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_redef(ncid);                      /* enter define mode */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_def_dim(ncid, "lat", 18L, &latid);
    if (status != NC_NOERR) {
        /* define failed: back out first, because handle_error exits */
        int abort_status = nc_abort(ncid);
        if (abort_status != NC_NOERR) handle_error(abort_status);
        handle_error(status);      /* report the original failure */
    }

5.12 Set Fill Mode for Writes: nc_set_fill
----------------------------------------------------------------------------

This function is intended for advanced usage, to optimize writes under some circumstances described below. The function nc_set_fill sets the fill mode for a netCDF dataset open for writing and returns the current fill mode in a return parameter. The fill mode can be specified as either NC_FILL or NC_NOFILL. The default behavior corresponding to NC_FILL is that data is pre-filled with fill values, that is, fill values are written when you create non-record variables or when you write a value beyond data that has not yet been written. This makes it possible to detect attempts to read data before it was written. See Section 7.16 "Fill Values," page 78, for more information on the use of fill values. See Section 8.1 "Attribute Conventions," page 81, for information about how to define your own fill values.

The behavior corresponding to NC_NOFILL overrides the default behavior of prefilling data with fill values. This can be used to enhance performance, because it avoids the duplicate writes that occur when the netCDF library writes fill values that are later overwritten with data.

A value indicating which mode the netCDF dataset was already in is returned. You can use this value to temporarily change the fill mode of an open netCDF dataset and then restore it to the previous mode.

After you turn on NC_NOFILL mode for an open netCDF dataset, you must be certain to write valid data in all the positions that will later be read. Note that nofill mode is only a transient property of a netCDF dataset open for writing: if you close and reopen the dataset, it will revert to the default behavior.
You can also revert to the default behavior by calling nc_set_fill again to explicitly set the fill mode to NC_FILL.

There are three situations where it is advantageous to set nofill mode:

  1. Creating and initializing a netCDF dataset. In this case, you should set nofill mode before calling nc_enddef and then write completely all non-record variables and the initial records of all the record variables you want to initialize.
  2. Extending an existing record-oriented netCDF dataset. Set nofill mode after opening the dataset for writing, then append the additional records to the dataset completely, leaving no intervening unwritten records.
  3. Adding new variables that you are going to initialize to an existing netCDF dataset. Set nofill mode before calling nc_enddef then write all the new variables completely.

If the netCDF dataset has an unlimited dimension and the last record was written while in nofill mode, then the dataset may be shorter than if nofill mode was not set, but this will be completely transparent if you access the data only through the netCDF interfaces.

The use of this feature may not be available (or even needed) in future releases. Programmers are cautioned against heavy reliance upon this feature.

Usage

    int nc_set_fill (int ncid, int fillmode, int *old_modep);

ncid         NetCDF ID, from a previous call to nc_open or nc_create.

fillmode     Desired fill mode for the dataset, either NC_NOFILL or NC_FILL.

old_modep    Pointer to location for returned current fill mode of the dataset before this call, either NC_NOFILL or NC_FILL.

Errors

nc_set_fill returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

   * The specified netCDF ID does not refer to an open netCDF dataset.
   * The specified netCDF ID refers to a dataset open for read-only access.
   * The fill mode argument is neither NC_NOFILL nor NC_FILL.

Example
Here is an example using nc_set_fill to set nofill mode for subsequent writes of a netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int ncid, status, old_fill_mode;
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);  /* open for writing */
    if (status != NC_NOERR) handle_error(status);
       ...       /* write data with default prefilling behavior */
    status = nc_set_fill(ncid, NC_NOFILL, &old_fill_mode); /* set nofill */
    if (status != NC_NOERR) handle_error(status);
       ...       /* write data with no prefilling */

6 Dimensions
----------------------------------------------------------------------------

Dimensions for a netCDF dataset are defined when it is created, while the netCDF dataset is in define mode. Additional dimensions may be added later by reentering define mode. A netCDF dimension has a name and a length. At most one dimension in a netCDF dataset can have unlimited length, which means variables using this dimension can grow along this dimension.

There is a suggested limit (100) to the number of dimensions that can be defined in a single netCDF dataset. The limit is the value of the predefined macro NC_MAX_DIMS. The purpose of the limit is to make writing generic applications simpler: they need only provide an array of NC_MAX_DIMS dimensions to handle any netCDF dataset. The implementation of the netCDF library does not enforce this advisory maximum, so it is possible to use more dimensions, if necessary, but netCDF utilities that assume the advisory maximum may not be able to handle the resulting netCDF datasets.

Ordinarily, the name and length of a dimension are fixed when the dimension is first defined. The name may be changed later, but the length of a dimension (other than the unlimited dimension) cannot be changed without copying all the data to a new netCDF dataset with a redefined dimension length.

Dimension lengths in the C interface are type size_t rather than type int to make it possible to access all the data in a netCDF dataset on a platform that only supports a 16-bit int data type, for example, MS-DOS. If dimension lengths were type int instead, it would not be possible to access data from variables with a dimension length greater than a 16-bit int can accommodate.

A netCDF dimension in an open netCDF dataset is referred to by a small integer called a dimension ID. In the C interface, dimension IDs are 0, 1, 2, ..., in the order in which the dimensions were defined.

Operations supported on dimensions are:

* Create a dimension, given its name and length.
* Get a dimension ID from its name.
* Get a dimension's name and length from its ID.
* Rename a dimension.

6.1 Create a Dimension: nc_def_dim
----------------------------------------------------------------------------

The function nc_def_dim adds a new dimension to an open netCDF dataset in define mode. It returns (as an argument) a dimension ID, given the netCDF ID, the dimension name, and the dimension length. At most one unlimited length dimension, called the record dimension, may be defined for each netCDF dataset.

Usage

    int nc_def_dim (int ncid, const char *name, size_t len, int *dimidp);

ncid    NetCDF ID, from a previous call to nc_open or nc_create.
name    Dimension name. Must begin with an alphabetic character, followed by zero or more alphanumeric characters including the underscore ('_'). Case is significant.
len     Length of dimension; that is, number of values for this dimension as an index to variables that use it. This should be either a positive integer (of type size_t) or the predefined constant NC_UNLIMITED.
dimidp  Pointer to location for returned dimension ID.

Errors

nc_def_dim returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The netCDF dataset is not in define mode.
* The specified dimension name is the name of another existing dimension.
* The specified length is not greater than zero.
* The specified length is unlimited, but there is already an unlimited length dimension defined for this netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_def_dim to create a dimension named lat of length 18 and an unlimited dimension named rec in a new netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status, ncid, latid, recid;
       ...
    status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_def_dim(ncid, "lat", 18L, &latid);
    if (status != NC_NOERR) handle_error(status);
    status = nc_def_dim(ncid, "rec", NC_UNLIMITED, &recid);
    if (status != NC_NOERR) handle_error(status);

6.2 Get a Dimension ID from Its Name: nc_inq_dimid
----------------------------------------------------------------------------

The function nc_inq_dimid returns (as an argument) the ID of a netCDF dimension, given the name of the dimension. If ndims is the number of dimensions defined for a netCDF dataset, each dimension has an ID between 0 and ndims-1.

Usage

    int nc_inq_dimid (int ncid, const char *name, int *dimidp);

ncid    NetCDF ID, from a previous call to nc_open or nc_create.
name    Dimension name, a character string beginning with a letter and followed by any sequence of letters, digits, or underscore ('_') characters. Case is significant in dimension names.
dimidp  Pointer to location for the returned dimension ID.

Errors

nc_inq_dimid returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The name that was specified is not the name of a dimension in the netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

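The naming rule stated above for dimension names (a leading letter followed by letters, digits, or underscores, with case significant) can be checked before a name is handed to the library. The following is only a minimal sketch of that stated rule. The helper name is_valid_nc_name is hypothetical and is not part of the netCDF interface, and the library itself remains the authority on which names it accepts.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (NOT part of the netCDF library): returns 1 if name
 * begins with a letter and continues with letters, digits, or '_',
 * and 0 otherwise (including for a NULL or empty name). */
int is_valid_nc_name(const char *name)
{
    size_t i;

    if (name == NULL || !isalpha((unsigned char)name[0]))
        return 0;
    for (i = 1; name[i] != '\0'; i++) {
        if (!isalnum((unsigned char)name[i]) && name[i] != '_')
            return 0;
    }
    return 1;
}
```

Under this sketch, is_valid_nc_name("lat") and is_valid_nc_name("rec_2") return 1, while is_valid_nc_name("2lat") and is_valid_nc_name("") return 0.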
Example

Here is an example using nc_inq_dimid to determine the dimension ID of a dimension named lat, assumed to have been defined previously in an existing netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status, ncid, latid;
       ...
    status = nc_open("foo.nc", NC_NOWRITE, &ncid);  /* open for reading */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_inq_dimid(ncid, "lat", &latid);
    if (status != NC_NOERR) handle_error(status);

6.3 Inquire about a Dimension: nc_inq_dim Family
----------------------------------------------------------------------------

This family of functions returns information about a netCDF dimension. Information about a dimension includes its name and its length. The length for the unlimited dimension, if any, is the number of records written so far.

The functions in this family include nc_inq_dim, nc_inq_dimname, and nc_inq_dimlen. The function nc_inq_dim returns all the information about a dimension; the other functions each return just one item of information.

Usage

    int nc_inq_dim     (int ncid, int dimid, char* name, size_t* lengthp);
    int nc_inq_dimname (int ncid, int dimid, char *name);
    int nc_inq_dimlen  (int ncid, int dimid, size_t *lengthp);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.
dimid    Dimension ID, from a previous call to nc_inq_dimid or nc_def_dim.
name     Returned dimension name. The caller must allocate space for the returned name. The maximum possible length, in characters, of a dimension name is given by the predefined constant NC_MAX_NAME.
lengthp  Pointer to location for returned length of dimension. For the unlimited dimension, this is the number of records written so far.

Errors

These functions return the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The dimension ID is invalid for the specified netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_inq_dim to determine the length of a dimension named lat, and the name and current length of the unlimited dimension for an existing netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status, ncid, latid, recid;
    size_t latlength, recs;
    char recname[NC_MAX_NAME+1];    /* +1 leaves room for the terminating null */
       ...
    status = nc_open("foo.nc", NC_NOWRITE, &ncid);  /* open for reading */
    if (status != NC_NOERR) handle_error(status);
    status = nc_inq_unlimdim(ncid, &recid); /* get ID of unlimited dimension */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_inq_dimid(ncid, "lat", &latid);  /* get ID for lat dimension */
    if (status != NC_NOERR) handle_error(status);
    status = nc_inq_dimlen(ncid, latid, &latlength); /* get lat length */
    if (status != NC_NOERR) handle_error(status);
    /* get unlimited dimension name and current length */
    status = nc_inq_dim(ncid, recid, recname, &recs);
    if (status != NC_NOERR) handle_error(status);

6.4 Rename a Dimension: nc_rename_dim
----------------------------------------------------------------------------

The function nc_rename_dim renames an existing dimension in a netCDF dataset open for writing. If the new name is longer than the old name, the netCDF dataset must be in define mode. You cannot rename a dimension to have the same name as another dimension.

Usage

    int nc_rename_dim(int ncid, int dimid, const char* name);

ncid   NetCDF ID, from a previous call to nc_open or nc_create.
dimid  Dimension ID, from a previous call to nc_inq_dimid or nc_def_dim.
name   New dimension name.

Errors

nc_rename_dim returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The new name is the name of another dimension.
* The dimension ID is invalid for the specified netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.
* The new name is longer than the old name and the netCDF dataset is not in define mode.

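The define-mode requirement described above depends only on the lengths of the two names, so a caller can decide in advance whether nc_redef is needed before renaming. This is a minimal sketch, assuming the caller already knows the old name (for instance from nc_inq_dimname). The helper rename_needs_define_mode is hypothetical and is not part of the netCDF interface.

```c
#include <string.h>

/* Hypothetical helper (NOT part of the netCDF library): returns 1 when
 * new_name is longer than old_name, which is the case in which the text
 * above says the dataset must be in define mode; returns 0 otherwise. */
int rename_needs_define_mode(const char *old_name, const char *new_name)
{
    return strlen(new_name) > strlen(old_name);
}
```

With the names used in this section, renaming lat to latitude lengthens the name, so the helper returns 1 and the dataset would need to be put in define mode first; renaming latitude back to lat returns 0.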
Example

Here is an example using nc_rename_dim to rename the dimension lat to latitude in an existing netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int status, ncid, latid;
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);  /* open for writing */
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_redef(ncid);  /* put in define mode to rename dimension */
    if (status != NC_NOERR) handle_error(status);
    status = nc_inq_dimid(ncid, "lat", &latid);
    if (status != NC_NOERR) handle_error(status);
    status = nc_rename_dim(ncid, latid, "latitude");
    if (status != NC_NOERR) handle_error(status);
    status = nc_enddef(ncid); /* leave define mode */
    if (status != NC_NOERR) handle_error(status);

7 Variables
----------------------------------------------------------------------------

Variables for a netCDF dataset are defined when the dataset is created, while the netCDF dataset is in define mode. Other variables may be added later by reentering define mode. A netCDF variable has a name, a type, and a shape, which are specified when it is defined. A variable may also have values, which are established later in data mode.

Ordinarily, the name, type, and shape are fixed when the variable is first defined. The name may be changed, but the type and shape of a variable cannot be changed. However, a variable defined in terms of the unlimited dimension can grow without bound in that dimension.

A netCDF variable in an open netCDF dataset is referred to by a small integer called a variable ID. Variable IDs are 0, 1, 2, ..., in the order in which the variables were defined within a netCDF dataset. Functions are available for getting the variable ID from the variable name and vice versa.

Attributes (see Chapter 8 "Attributes," page 81) may be associated with a variable to specify such properties as units.

Operations supported on variables are:

* Create a variable, given its name, data type, and shape.
* Get a variable ID from its name.
* Get a variable's name, data type, shape, and number of attributes from its ID.
* Put a data value into a variable, given variable ID, indices, and value.
* Put an array of values into a variable, given variable ID, corner indices, edge lengths, and a block of values.
* Put a subsampled or mapped array-section of values into a variable, given variable ID, corner indices, edge lengths, stride vector, index mapping vector, and a block of values.
* Get a data value from a variable, given variable ID and indices.
* Get an array of values from a variable, given variable ID, corner indices, and edge lengths.
* Get a subsampled or mapped array-section of values from a variable, given variable ID, corner indices, edge lengths, stride vector, and index mapping vector.
* Rename a variable.

7.1 Language Types Corresponding to netCDF external data types
----------------------------------------------------------------------------

The following table gives the netCDF external data types and the corresponding type constants for defining variables in the C interface:

    netCDF/CDL Data Type    C API Mnemonic    Bits
    byte                    NC_BYTE              8
    char                    NC_CHAR              8
    short                   NC_SHORT            16
    int                     NC_INT              32
    float                   NC_FLOAT            32
    double                  NC_DOUBLE           64

The first column gives the netCDF external data type, which is the same as the CDL data type. The next column gives the corresponding C preprocessor macro for use in netCDF functions (the preprocessor macros are defined in the netCDF C header-file netcdf.h). The last column gives the number of bits used in the external representation of values of the corresponding type.

Note that there are no netCDF types corresponding to 64-bit integers or to characters wider than 8 bits in the current version of the netCDF library.

7.2 Create a Variable: nc_def_var
----------------------------------------------------------------------------

The function nc_def_var adds a new variable to an open netCDF dataset in define mode.
It returns (as an argument) a variable ID, given the netCDF ID, the variable name, the variable type, the number of dimensions, and a list of the dimension IDs.

Usage

    int nc_def_var (int ncid, const char *name, nc_type xtype,
                    int ndims, const int dimids[], int *varidp);

ncid    NetCDF ID, from a previous call to nc_open or nc_create.
name    Variable name. Must begin with an alphabetic character, followed by zero or more alphanumeric characters including the underscore ('_'). Case is significant.
xtype   One of the set of predefined netCDF external data types. The type of this parameter, nc_type, is defined in the netCDF header file. The valid netCDF external data types are NC_BYTE, NC_CHAR, NC_SHORT, NC_INT, NC_FLOAT, and NC_DOUBLE.
ndims   Number of dimensions for the variable. For example, 2 specifies a matrix, 1 specifies a vector, and 0 means the variable is a scalar with no dimensions. Must not be negative or greater than the predefined constant NC_MAX_VAR_DIMS.
dimids  Vector of ndims dimension IDs corresponding to the variable dimensions. If the ID of the unlimited dimension is included, it must be first. This argument is ignored if ndims is 0.
varidp  Pointer to location for the returned variable ID.

Errors

nc_def_var returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The netCDF dataset is not in define mode.
* The specified variable name is the name of another existing variable.
* The specified type is not a valid netCDF type.
* The specified number of dimensions is negative or more than the constant NC_MAX_VAR_DIMS, the maximum number of dimensions permitted for a netCDF variable.
* One or more of the dimension IDs in the list of dimensions is not a valid dimension ID for the netCDF dataset.
* The number of variables would exceed the constant NC_MAX_VARS, the maximum number of variables permitted in a netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_def_var to create a variable named rh of type double with three dimensions, time, lat, and lon, in a new netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int  status;                       /* error status */
    int  ncid;                         /* netCDF ID */
    int  lat_dim, lon_dim, time_dim;   /* dimension IDs */
    int  rh_id;                        /* variable ID */
    int  rh_dimids[3];                 /* variable shape */
       ...
    status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...
                                       /* define dimensions */
    status = nc_def_dim(ncid, "lat", 5L, &lat_dim);
    if (status != NC_NOERR) handle_error(status);
    status = nc_def_dim(ncid, "lon", 10L, &lon_dim);
    if (status != NC_NOERR) handle_error(status);
    status = nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
    if (status != NC_NOERR) handle_error(status);
       ...
                                       /* define variable */
    rh_dimids[0] = time_dim;
    rh_dimids[1] = lat_dim;
    rh_dimids[2] = lon_dim;
    status = nc_def_var (ncid, "rh", NC_DOUBLE, 3, rh_dimids, &rh_id);
    if (status != NC_NOERR) handle_error(status);

7.3 Get a Variable ID from Its Name: nc_inq_varid
----------------------------------------------------------------------------

The function nc_inq_varid returns the ID of a netCDF variable, given its name.

Usage

    int nc_inq_varid (int ncid, const char *name, int *varidp);

ncid    NetCDF ID, from a previous call to nc_open or nc_create.
name    Variable name for which ID is desired.
varidp  Pointer to location for returned variable ID.

Errors

nc_inq_varid returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The specified variable name is not a valid name for a variable in the specified netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_inq_varid to find out the ID of a variable named rh in an existing netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int  status, ncid, rh_id;
       ...
    status = nc_open("foo.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_inq_varid (ncid, "rh", &rh_id);
    if (status != NC_NOERR) handle_error(status);

7.4 Get Information about a Variable from Its ID: nc_inq_var family
----------------------------------------------------------------------------

This family of functions returns information about a netCDF variable, given its ID. Information about a variable includes its name, type, number of dimensions, a list of dimension IDs describing the shape of the variable, and the number of variable attributes that have been assigned to the variable.

The function nc_inq_var returns all the information about a netCDF variable, given its ID. The other functions each return just one item of information about a variable.

These other functions include nc_inq_varname, nc_inq_vartype, nc_inq_varndims, nc_inq_vardimid, and nc_inq_varnatts.

Usage

    int nc_inq_var      (int ncid, int varid, char *name, nc_type *xtypep,
                         int *ndimsp, int dimids[], int *nattsp);
    int nc_inq_varname  (int ncid, int varid, char *name);
    int nc_inq_vartype  (int ncid, int varid, nc_type *xtypep);
    int nc_inq_varndims (int ncid, int varid, int *ndimsp);
    int nc_inq_vardimid (int ncid, int varid, int dimids[]);
    int nc_inq_varnatts (int ncid, int varid, int *nattsp);

ncid    NetCDF ID, from a previous call to nc_open or nc_create.
varid   Variable ID.
name    Returned variable name. The caller must allocate space for the returned name. The maximum possible length, in characters, of a variable name is given by the predefined constant NC_MAX_NAME.
xtypep  Pointer to location for returned variable type, one of the set of predefined netCDF external data types. The type of this parameter, nc_type, is defined in the netCDF header file. The valid netCDF external data types are NC_BYTE, NC_CHAR, NC_SHORT, NC_INT, NC_FLOAT, and NC_DOUBLE.
ndimsp  Pointer to location for returned number of dimensions the variable was defined as using.
        For example, 2 indicates a matrix, 1 indicates a vector, and 0 means the variable is a scalar with no dimensions.
dimids  Returned vector of *ndimsp dimension IDs corresponding to the variable dimensions. The caller must allocate enough space for a vector of at least *ndimsp integers to be returned. The maximum possible number of dimensions for a variable is given by the predefined constant NC_MAX_VAR_DIMS.
nattsp  Pointer to location for returned number of variable attributes assigned to this variable.

Errors

These functions return the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The variable ID is invalid for the specified netCDF dataset.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_inq_var to find out about a variable named rh in an existing netCDF dataset named foo.nc:

    #include <netcdf.h>
       ...
    int  status;                       /* error status */
    int  ncid;                         /* netCDF ID */
    int  rh_id;                        /* variable ID */
    nc_type rh_type;                   /* variable type */
    int  rh_ndims;                     /* number of dims */
    int  rh_dims[NC_MAX_VAR_DIMS];     /* variable shape */
    int  rh_natts;                     /* number of attributes */
       ...
    status = nc_open ("foo.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) handle_error(status);
       ...
    status = nc_inq_varid (ncid, "rh", &rh_id);
    if (status != NC_NOERR) handle_error(status);
    /* we don't need name, since we already know it */
    status = nc_inq_var (ncid, rh_id, 0, &rh_type, &rh_ndims, rh_dims,
                         &rh_natts);
    if (status != NC_NOERR) handle_error(status);

7.5 Write a Single Data Value: nc_put_var1_type
----------------------------------------------------------------------------

The functions nc_put_var1_type put a single data value of the specified type into a variable of an open netCDF dataset that is in data mode. Inputs are the netCDF ID, the variable ID, an index that specifies which value to add or alter, and the data value.
The value is converted to the external data type of the variable, if necessary.

Usage

    int nc_put_var1_text  (int ncid, int varid, const size_t index[],
                           const char *tp);
    int nc_put_var1_uchar (int ncid, int varid, const size_t index[],
                           const unsigned char *up);
    int nc_put_var1_schar (int ncid, int varid, const size_t index[],
                           const signed char *cp);
    int nc_put_var1_short (int ncid, int varid, const size_t index[],
                           const short *sp);
    int nc_put_var1_int   (int ncid, int varid, const size_t index[],
                           const int *ip);
    int nc_put_var1_long  (int ncid, int varid, const size_t index[],
                           const long *lp);
    int nc_put_var1_float (int ncid, int varid, const size_t index[],
                           const float *fp);
    int nc_put_var1_double(int ncid, int varid, const size_t index[],
                           const double *dp);

ncid     NetCDF ID, from a previous call to nc_open or nc_create.
varid    Variable ID.
index[]  The index of the data value to be written. The indices are relative to 0, so for example, the first data value of a two-dimensional variable would have index (0,0). The elements of index must correspond to the variable's dimensions. Hence, if the variable uses the unlimited dimension, the first index would correspond to the unlimited dimension.
tp, up, cp, sp, ip, lp, fp, or dp
         Pointer to the data value to be written. If the type of data values differs from the netCDF variable type, type conversion will occur. See Section 3.3 "Type Conversion," page 20, for details.

Errors

nc_put_var1_type returns the value NC_NOERR if no errors occurred. Otherwise, the returned status indicates an error. Possible causes of errors include:

* The variable ID is invalid for the specified netCDF dataset.
* The specified indices were out of range for the rank of the specified variable. For example, a negative index or an index that is larger than the corresponding dimension length will cause an error.
* The specified value is out of the range of values representable by the external data type of the variable.
* The specified netCDF dataset is in define mode rather than data mode.
* The specified netCDF ID does not refer to an open netCDF dataset.

Example

Here is an example using nc_put_var1_double to set the (1,2,3) element of the variable named rh to 0.5 in an existing netCDF dataset named foo.nc. For simplicity in this example, we assume that we know that rh is dimensioned with time, lat, and lon, so we want to set the value of rh that corresponds to the second time value, the third lat value, and the fourth lon value:

    #include <netcdf.h>
       ...
    int  status;                           /* error status */
    int  ncid;                             /* netCDF ID */
    int  rh_id;                            /* variable ID */
    static size_t rh_index[] = {1, 2, 3};  /* where to put value */
    static double rh_val = 0.5;            /* value to put */
       ...
    status = nc_open("foo.nc", NC_WRITE, &ncid);
    if (status !=