Unidata's Common Data Model Version 4

Overview

Unidata’s Common Data Model (CDM) is an abstract data model for scientific datasets. It merges the netCDF, OPeNDAP, and HDF5 data models to create a common API for many types of scientific data. As currently implemented by the NetCDF Java library, it can read, in addition to OPeNDAP, netCDF, and HDF5, files in the GRIB 1 and 2, BUFR, HDF4, HDF-EOS, NEXRAD, GINI, and GEMPAK formats, among others. A pluggable framework allows other developers to add readers for their own specialized formats.

The Common Data Model has three layers, which build on top of each other to add successively richer semantics:

  1. The data access layer, also known as the syntactic layer, handles data reading and writing.
  2. The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; we also identify specialized georeferencing coordinate systems, which are important to the Earth Science community.
  3. The scientific feature type layer identifies specific types of data, such as grids, images, and point data, and adds specialized methods for each kind of data.

Data Access Layer Objects

 

A Dataset may be a netCDF, HDF5, GRIB, etc. file, an OPeNDAP dataset, a collection of files, or anything else which can be accessed through the netCDF API. We sometimes use the term CDM dataset to mean any of these possibilities, and to emphasize that a dataset does not have to be a file in netCDF format.

A Group is a logical collection of Variables. The Groups in a Dataset form a hierarchical tree, like directories on a disk. A Group has a name and optionally a set of Attributes, Dimensions, EnumTypedefs, and nested Groups. There is always at least one Group in a Dataset, the root Group, whose name is the empty string.
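
As a rough illustration of the Group tree, the following Python sketch models nested Groups under a root whose name is the empty string. The `Group` class, its `add_group` and `full_name` methods, and the use of `/` as a path separator are assumptions made for this example; they are not the NetCDF Java API.

```python
class Group:
    """Hypothetical stand-in for a CDM Group (illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name          # the root Group uses the empty string
        self.parent = parent
        self.groups = {}          # nested Groups, keyed by name
        self.attributes = {}      # optional metadata

    def add_group(self, name):
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def full_name(self):
        # Collect names while walking up to the root; the "/" separator
        # is an assumption of this sketch.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


root = Group("")                      # every Dataset has a root Group
model = root.add_group("model")
surface = model.add_group("surface")
print(repr(root.name), surface.full_name())
```

The tree is navigated the same way a directory tree would be: each Group knows its parent and its children, and only the root has no parent.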

A Variable is a container for data. It has a DataType, a set of Dimensions that define its array shape, and optionally a set of Attributes. Any shared Dimension it uses must be in the same Group or a parent Group.

A Dimension has a length, and is used to define the array shape of a Variable. It may be shared among Variables, which provides a simple yet powerful way of associating Variables. When a Dimension is shared, it has a unique name within the Group. If unlimited, a Dimension's length may increase. If variableLength, then the actual length is data dependent, and can only be found by reading the data.
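
To make the role of shared Dimensions concrete, here is a minimal Python sketch in which two Variables take their shape from the same Dimension objects. The `Dimension` and `Variable` classes and their fields are hypothetical names chosen for this example, not the library's classes.

```python
class Dimension:
    """Hypothetical Dimension: a named length that may be unlimited."""

    def __init__(self, name, length, unlimited=False):
        self.name = name
        self.length = length
        self.unlimited = unlimited

    def grow(self, extra):
        # Only an unlimited Dimension may increase in length.
        if not self.unlimited:
            raise ValueError(f"dimension {self.name!r} is not unlimited")
        self.length += extra


class Variable:
    """Hypothetical Variable whose shape is derived from its Dimensions."""

    def __init__(self, name, data_type, dimensions):
        self.name = name
        self.data_type = data_type
        self.dimensions = list(dimensions)

    @property
    def shape(self):
        return tuple(d.length for d in self.dimensions)


time = Dimension("time", 4, unlimited=True)
station = Dimension("station", 10)

# Both Variables share the same Dimension objects, which associates them.
temperature = Variable("temperature", "float", [time, station])
pressure = Variable("pressure", "float", [time, station])

time.grow(1)  # appending one time step changes the shape of both Variables
print(temperature.shape, pressure.shape)
```

Because the shape is computed from the shared Dimension rather than stored per Variable, growing the unlimited Dimension is reflected in every Variable that uses it.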

An Attribute has a name and a value, and associates arbitrary metadata with a Variable or a Group. The value can be a one dimensional array of Strings or numeric values.

A Structure is a type of Variable that contains other Variables, analogous to a struct in C, or a row in a relational database. In general, the data in a Structure are physically stored close together on disk, so that it is efficient to retrieve all of the data in a Structure at the same time. A Variable contained in a Structure is a member Variable, and can only be read in the context of its containing Structure.

A Sequence is a one dimensional Structure whose length is not known until you actually read the data. To access the data in a Sequence, you can only iterate through the Sequence, getting the data in one Structure instance at a time.
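
The iterate-only access pattern of a Sequence can be sketched with a Python generator. The record layout (`station_id`, `temperature`), the comma-separated input, and the `read_sequence` function are all invented for this illustration; the point is only that the number of Structure instances is unknown until the data has been read.

```python
def read_sequence(lines):
    """Yield one Structure instance (here a dict) at a time.

    Hypothetical reader: each input line holds one record as
    "station_id,temperature".
    """
    for line in lines:
        station_id, temperature = line.split(",")
        yield {"station_id": station_id, "temperature": float(temperature)}


raw = ["KDEN,12.5", "KBOU,11.0", "KSLC,9.25"]

count = 0
total = 0.0
for record in read_sequence(raw):   # no len() and no random access
    count += 1
    total += record["temperature"]

print(count, total)
```

Each dict plays the role of one Structure instance: its members are read together, and a member is only reachable through the record that contains it.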

An EnumTypedef is an enumeration of Strings, used by Variables of type enum.

An object name refers to the name of a Group, Dimension, Variable, Attribute, or EnumTypedef. An object name is a String, that is, a variable-length array of Unicode characters. The set of allowed characters is still being considered.

An Array contains the actual data for a Variable after it is read from the disk or network. You get an Array from a Variable by calling read() or its variants. An Array is always rectangular in shape (like Fortran arrays). There is a specialized Array type for each of the DataTypes.
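
As a sketch of what "rectangular" means, the following Python class stores values in a flat list and refuses any data whose size does not match the product of the shape. The `RectArray` name and the row-major index calculation are assumptions of this example and say nothing about how a particular library lays out its Arrays.

```python
from math import prod


class RectArray:
    """Hypothetical rectangular array backed by a flat list."""

    def __init__(self, shape, values):
        if len(values) != prod(shape):
            raise ValueError("number of values does not match the shape")
        self.shape = tuple(shape)
        self.values = list(values)

    def get(self, *index):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        # Row-major offset (an assumption of this sketch):
        # the last index varies fastest.
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            offset = offset * n + i
        return self.values[offset]


a = RectArray((2, 3), [0, 1, 2, 3, 4, 5])
print(a.get(0, 0), a.get(1, 2))
```

Every row has the same length by construction, so a ragged collection of values cannot be represented.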

DataType describes the possible types of data:

The primitive numeric types are byte, short, int, long, float and double. The integer types (8-bit byte, 16-bit short, 32-bit int, 64-bit long) may be signed or unsigned. Float and double are IEEE encoded.
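
The difference between signed and unsigned integer types lies in how the stored bits are interpreted, not in the bits themselves. The short Python example below uses the standard `struct` module to read the same bytes both ways and to show the IEEE 754 encoding of a float; the byte values are arbitrary sample data.

```python
import struct

raw = bytes([0xFF, 0x80, 0x7F])

# The same three bytes read as signed ("b") and unsigned ("B") 8-bit integers.
signed_bytes = struct.unpack("3b", raw)
unsigned_bytes = struct.unpack("3B", raw)

# A 16-bit value read as signed ("h") and unsigned ("H"), big-endian.
signed_short = struct.unpack(">h", b"\xff\xfe")[0]
unsigned_short = struct.unpack(">H", b"\xff\xfe")[0]

# IEEE 754 single precision: 1.0 is encoded as 0x3F800000.
float_bits = struct.pack(">f", 1.0)

print(signed_bytes, unsigned_bytes)
print(signed_short, unsigned_short, float_bits.hex())
```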

A String is a variable length array of Unicode characters. When reading/writing a String to a file or other external representation, the characters are by default UTF-8 encoded (note that ASCII is a subset of UTF-8). Libraries may use different internal representations, for example the Java library uses UTF-16 encoding.
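
The distinction between a String's characters and its external encoding can be seen in a few lines of Python. The sample text is arbitrary; the example only shows that the number of characters, the UTF-8 byte length, and the UTF-16 byte length can all differ, and that pure ASCII text is unchanged under UTF-8.

```python
text = "temp\u00e9rature"            # 11 characters, one of them non-ASCII

utf8 = text.encode("utf-8")          # external (file) representation
utf16 = text.encode("utf-16-le")     # one possible in-memory representation

# Decoding the external form recovers the original characters.
round_trip = utf8.decode("utf-8")

# ASCII text has identical bytes in ASCII and UTF-8.
ascii_same = "units".encode("ascii") == "units".encode("utf-8")

print(len(text), len(utf8), len(utf16), round_trip == text, ascii_same)
```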

The char type contains uninterpreted characters, one character per byte. Typically these contain 7-bit ASCII characters.

An enum type is an enumeration of (distinct integer value, String) pairs. A Variable with enum type stores integer values, which are then mapped to Strings.
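
A minimal sketch of the enum mapping, assuming a hypothetical `EnumTypedef` class and made-up cloud categories: the Variable's data holds only integers, and the typedef supplies the String for each one.

```python
class EnumTypedef:
    """Hypothetical EnumTypedef: distinct integer values mapped to Strings."""

    def __init__(self, name, mapping):
        self.name = name
        # dict keys are unique, so the integer values are distinct.
        self.mapping = dict(mapping)

    def lookup(self, value):
        return self.mapping[value]


cloud_type = EnumTypedef("cloud_type", {0: "clear", 1: "cumulus", 2: "stratus"})

stored = [1, 0, 2, 1]                      # what the Variable actually stores
decoded = [cloud_type.lookup(v) for v in stored]
print(decoded)
```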

A StructureData holds the data for Structures and Sequences, and allows access to the data of each member Variable.


Coordinate Systems


Scientific Feature Types


This document is maintained by John Caron and was last updated on Apr 27, 2008

 

 
 
Unidata is a member of the UCAR Office of Programs, is managed by the University Corporation for Atmospheric Research (UCAR), and is sponsored by the National Science Foundation (NSF).