Unidata's Common Data Model Mapping to the ISO 19123 Data Model
John Caron, Stefano Nativi, and Ben Domenico
Draft last modified:
November 30, 2006
<< This is Ben's attempt to combine presentations by John and Stefano into an HTML document >>
The primary goal of the GALEON (Geo-Interface to Air, Land, Environment, Oceans NetCDF) interoperability experiment is to provide a standards-based interface to the wealth of Earth science datasets that are currently available in netCDF and HDF form, often served via the OPeNDAP client-server protocol. This document describes the underlying data models used in those technologies. In particular, it focuses on the Unidata Common Data Model (CDM), which combines the most valuable features of netCDF (augmented with CF conventions), HDF, and OPeNDAP, and on the mapping of the CDM onto the corresponding elements of the international standard ISO 19123, specifically the Discrete Grid Point Coverage data model.
The early sections describe the basic concepts behind the CDM, which fuses the best characteristics of the existing netCDF, HDF, and OPeNDAP data models. The main goal is to arrive at a package that is more powerful than each of the others individually, but maintains the fundamental simplicity and ease of use of the original netCDF. The CDM is discussed in greater detail in Unidata's Common Data Model and THREDDS Data Server at http://www.unidata.ucar.edu/projects/THREDDS/CDM/CDM-TDS.htm. Much of the early material in this paper is taken from that earlier document. The subsequent sections discuss the mapping from CF-netCDF to the ISO 19123 data model.
In a philosophical sense, a data model is a way of thinking about scientific data. It’s an abstraction. Some of these data model abstractions have been incorporated into systems for storing and accessing scientific data. Where the data models differ significantly, it can be challenging to make the data systems interoperate with one another, which, in turn, can stifle interdisciplinary research by hindering integrated analysis and viewing of multiple datasets from different domains.
In computer terms, a data model can be thought of as equivalent to an abstract object model in object-oriented programming: an abstract data model describes data objects and the methods that can be used on them.
An abstract data model can be instantiated in several forms, for example:
The Abstract Data Model, on the other hand, removes the details of any particular API and the persistence format in which the datasets are actually stored.
The netCDF-3 data model shown in the Unified Modeling Language (UML) diagram below is fairly simple. A dataset has dimensions, variables, and attributes. Attributes can be global or apply to individual variables. There is a very limited set of low-level data types.

netCDF-3 Data Model UML Diagram
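The structure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the netCDF API: the class names (`Dimension`, `Variable`, `Dataset`), the helper `build_example`, and the sample temperature variable are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# The six external data types of the classic (netCDF-3) format.
NETCDF3_TYPES = ("byte", "char", "short", "int", "float", "double")


@dataclass
class Dimension:
    """A named axis length shared by the variables that use it."""
    name: str
    length: int


@dataclass
class Variable:
    """A typed array whose shape is defined by a list of dimensions."""
    name: str
    dtype: str
    dimensions: List[Dimension]
    attributes: Dict[str, Any] = field(default_factory=dict)  # per-variable

    def __post_init__(self) -> None:
        if self.dtype not in NETCDF3_TYPES:
            raise ValueError(f"unsupported netCDF-3 type: {self.dtype}")

    @property
    def shape(self) -> Tuple[int, ...]:
        return tuple(d.length for d in self.dimensions)


@dataclass
class Dataset:
    """A dataset has dimensions, variables, and global attributes."""
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)  # global


def build_example() -> Dataset:
    """Build a small hypothetical dataset with one gridded variable."""
    time, lat, lon = Dimension("time", 2), Dimension("lat", 3), Dimension("lon", 4)
    temperature = Variable("temperature", "float", [time, lat, lon], {"units": "K"})
    return Dataset(
        dimensions={d.name: d for d in (time, lat, lon)},
        variables={temperature.name: temperature},
        attributes={"title": "hypothetical example dataset"},
    )
```

Note that the shape of a variable is not stored on the variable itself; it follows from the dimensions it references, which is what lets several variables share one grid.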
The OPeNDAP data model has many things in common with netCDF, but it has a richer set of low-level data types and includes structures, sequences, and grids.

OPeNDAP (DAP-2) Data Model UML Diagram
HDF-5 has a much richer set of low level data types and includes the key feature of a group of variables. As with OPeNDAP, HDF-5 includes structures.

HDF-5 Data Model UML Diagram
At the data access level, the CDM maintains as much as possible of the elegance of the netCDF-3 interface, but adds important features from OPeNDAP and HDF, most notably:

Common Data Model (data access layer) UML Diagram
As noted at the outset, the CDM is an effort to fuse the best characteristics of the existing data models into one that is more powerful than each of the others, but maintains the fundamental simplicity and ease of use of the original netCDF. The resulting CDM consists of several layers. The top layer provides interfaces to a set of scientific data types. The middle layer provides access to coordinate system information. At the bottom lies the actual data access layer.

Common Data Model Layers
The netCDF, OPeNDAP, and HDF data models do not have integrated coordinate systems, so georeferencing is not a part of the API. As a consequence, the coordinate system information has to be inferred. In the best case, the files conform to a set of established conventions (e.g., CF-1, COARDS). << Need help from John here.>> In contrast, GRIB, HDF-EOS, other specialized formats. However, in the CDM, the coordinate system information must be handled in a general way. The approach is shown in the following diagram.

CDM Coordinate System UML Diagram
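To make the idea of inferring coordinate information from conventions concrete, here is a minimal Python sketch. It relies on a single rule, the netCDF/COARDS convention that a coordinate variable is a one-dimensional variable with the same name as its dimension. The function names and the sample dataset are hypothetical, and the real CDM coordinate system layer handles many more cases than this.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified view of a dataset: variable name -> dimension names.
VariableDims = Dict[str, Tuple[str, ...]]


def find_coordinate_variables(variables: VariableDims) -> List[str]:
    """Return the variables that are 1-D and named after their own dimension."""
    return sorted(name for name, dims in variables.items() if dims == (name,))


def coordinate_axes_for(var_name: str, variables: VariableDims) -> List[str]:
    """Infer the coordinate axes of a data variable from its dimensions.

    Dimensions without a matching coordinate variable are skipped, because
    nothing in the file says how to georeference them.
    """
    coords = set(find_coordinate_variables(variables))
    return [dim for dim in variables[var_name] if dim in coords]


example: VariableDims = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "temperature": ("time", "lat", "lon"),
    "station_id": ("station",),  # no variable is named "station"
}

print(find_coordinate_variables(example))           # ['lat', 'lon', 'time']
print(coordinate_axes_for("temperature", example))  # ['time', 'lat', 'lon']
print(coordinate_axes_for("station_id", example))   # []
```

The last call shows why inference is fragile: when a file does not follow the convention, no coordinate system can be recovered from the structure alone.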
In order to introduce the more specific semantic elements (i.e., metadata) that different communities require to fully describe their datasets, the netCDF data model was extended by adding a set of conventions. One of the most popular conventions is the Climate and Forecast (CF) metadata convention. The following figure depicts the CF-netCDF data model.
CF conventions are quite loose, to maximize backward compatibility with the earlier COARDS conventions. In addition, support for precise geolocation is limited. For example, CF conventions assume that “Latitude, longitude, and time are defined by internationally recognized standards, and hence, identifying the coordinates of these types is sufficient to locate data values uniquely with respect to time and a point on the earth's surface.”
On the other hand, the CF model is very flexible and, consequently, complex. The following diagram depicts CF conventions and their relationship with netCDF concepts, in UML.


Two renditions of the CF-netCDF data model:
Left is from Stefano's slides, right is from CF-netCDF application profile document
For a discussion of the scientific data types layer of the CDM, refer to Unidata's Common Data Model and THREDDS Data Server at http://www.unidata.ucar.edu/projects/THREDDS/CDM/CDM-TDS.htm
The technological components of the CDM have evolved as de facto standards over the last couple of decades in the communities they serve. In particular, the atmospheric science and oceanography communities (sometimes referred to as the Fluid Earth Sciences or FES) have taken advantage of netCDF, HDF, and OPeNDAP. During the same period, other disciplines (notably solid Earth, hydrology, and human impacts) have employed Geographic Information Systems (GIS) technologies where the data models are quite different from those of the CDM. One approach to achieving interoperability between the data systems in these communities is to employ evolving international standards, especially those promulgated by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO).
ISO has developed a very elaborate and complete set of abstract data models. In particular, the ISO technical committee on Geographic information/Geomatics (TC 211) has defined the ISO 19123 data model. The mapping between the CDM data model and ISO 19123 is a key foundation for establishing interoperability between the data systems in the realms of CDM and GIS technologies.
Many netCDF files in the atmospheric and oceanic sciences contain gridded data. In the realm of ISO data models, the "coverage" is used to represent gridded data. The ISO definition of a coverage is:
A coverage is a feature that associates positions within a bounded space (its domain) to feature attribute values (its range). In other words, it is both a feature and a function. Examples include a raster image, a polygon overlay or a digital elevation matrix. [ISO 19123]
The following Figure shows the coverage types introduced by ISO 19123.

ISO 19123 Coverage Subclasses
As far as the general geo-information framework is concerned, a coverage is a special type of "feature."
ContinuousCoverage type is the subclass of Coverage that returns a distinct record of feature attribute values for any direct position within its domain. The domain of a DiscreteCoverage consists of a collection of geometric objects or points in space. DiscreteCoverages are subclassed on the basis of the type of geometric object in the spatial domain.
(ISO abstract data models employ the language of mathematical functions: the domain can be thought of as the set of values of the independent variables defining positions in 3-dimensional space and time, while the range is the set of values that the function takes on at those positions.)
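The function view of a coverage can be sketched in Python as follows. This is a hedged illustration of the domain/range idea only, not an implementation of ISO 19123: the class name `DiscreteCoverageSketch`, the two-dimensional (latitude, longitude) positions, and the sample values are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Position = Tuple[float, float]  # hypothetical (lat, lon) direct position
Record = Dict[str, float]       # feature attribute values at one position


class DiscreteCoverageSketch:
    """A discrete coverage: a function defined on a finite set of positions."""

    def __init__(self, pairs: Dict[Position, Record]) -> None:
        self._pairs = dict(pairs)

    @property
    def domain(self) -> Set[Position]:
        """The positions at which the coverage is defined."""
        return set(self._pairs)

    def evaluate(self, position: Position) -> Record:
        """Return the range record associated with a position in the domain."""
        try:
            return self._pairs[position]
        except KeyError:
            raise ValueError(f"{position} is outside the coverage domain") from None


coverage = DiscreteCoverageSketch({
    (40.0, -105.25): {"temperature": 281.5, "pressure": 835.0},
    (43.75, 11.25): {"temperature": 288.0, "pressure": 1009.0},
})

print(coverage.evaluate((40.0, -105.25))["temperature"])  # 281.5
```

A continuous coverage would differ in `evaluate`: instead of rejecting positions outside the stored set, it would return a record for any direct position in its domain, typically by interpolation.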
The DiscretePointCoverage type is characterized by a finite domain consisting of points. Generally, the domain is a set of irregularly distributed points; the principal use of discrete point coverages is to provide a basis for continuous coverage functions. Indeed, DiscretePointCoverage occurrences could be used to implement multi-point coverage domains.
The domain of a DiscreteGridPointCoverage occurrence is a set of GridPoints that are associated with records of feature attribute values through a GridValuesMatrix element.
DiscreteGridPointCoverage occurrences must be used to implement grid-based coverage domains, whether regularly or quasi-regularly spaced.
The following Figure depicts the DiscreteGridPointCoverage model.
DiscreteGridPointCoverage Model
The diagram references the ISO elements described in the following table.
| ISO element | Description |
|---|---|
| domainExtent | The attribute domainExtent shall contain the extent of the domain of the coverage. The data type EX_Extent is defined in ISO 19108:2003. Extents may be specified in space, time or space-time. |
| rangeType | The attribute rangeType shall describe the range of the coverage. The data type RecordType is defined in ISO/TS 19103. It consists of a list of attribute name/data type pairs. A simple list is the most common form of rangeType, but RecordType can be used recursively to describe more complex structures. |
| commonPointRule | The attribute commonPointRule shall identify the procedure to be used for evaluating the Coverage at a position that falls either on a boundary between geometric objects or within the boundaries of two or more overlapping geometric objects. |
| Coordinate Reference System | The association Coordinate Reference System shall link the Coverage to the coordinate reference system to which the objects in its domain are referenced. The class SC_CRS is specified in ISO 19111:2003. The multiplicity of the CRS role in the Coordinate Reference System association is one, so a coverage with the same range but with its domain defined in a different coordinate reference system is a different coverage. |
| CoverageFunction | The association CoverageFunction shall link the discrete Coverage to the set of GeometryValuePairs included in the coverage. The association CoverageFunction is shown as derived because the elements may be generated from the GridValuesMatrix through the association PointFunction. |
| PointFunction | The association PointFunction shall link the DiscreteGridPointCoverage to the GridValuesMatrix for which it is an evaluator. The range of a Coverage shall be a homogeneous collection of records. That is, the range shall have a constant dimension over the entire domain, and each field of the record shall provide a value of the same attribute type over the entire domain. |
| GridPointValuePair | The class GridPointValuePair describes an element of a set that defines the relationships of a discrete grid point coverage. In fact, the domain of a DiscreteGridPointCoverage is a set of GridPoints that are associated with records of feature attribute values through a GridValuesMatrix. GridPointValuePair is composed of a GridPoint geometry and a feature attribute value Record. |
| point | The attribute point shall be the geometry member of the GridPointValuePair. |
| value | The attribute value shall be the member of the GridPointValuePair taken from the sequence values in the GridValuesMatrix. |
| GridValuesMatrix | GridValuesMatrix is a subclass of Grid that ties feature attribute values to grid geometry. It has three attributes: values, sequencingRule and startSequence. It holds a sequence of records associated with a sequencing rule that specifies an algorithm for assigning records of feature attribute values to grid points. An instance of the GridValuesMatrix may be, at the same time, an instance of either a generic Grid or one of its subclasses: RectifiedGrid and ReferenceableGrid. |
| values | The attribute values shall be a sequence of N feature attribute value records where N is the number of grid points within the section of the grid specified by extent. |
| sequencingRule | The attribute sequencingRule shall describe how the grid points are ordered for association to the elements of the sequence values. |
| startSequence | The attribute startSequence shall identify the grid point to be associated with the first record in the values sequence. |
| SequenceRule | SequenceRule is a data type that contains information for mapping grid coordinates to a position within the sequence of records of feature attribute values. |
| type | The attribute type shall identify the type of sequencing method that shall be used. The default value shall be “linear”. |
| scanDirection | The attribute scanDirection shall be a list of signed axisNames that indicates the order in which grid points shall be mapped to position within the sequence of records of feature attribute values. An additional element may be included in the list to allow for interleaving of feature attribute values. See Annex D of ISO 19123. |
| SequenceType | SequenceType is a code list that identifies methods for sequential enumeration of the grid points. See Annex D of ISO 19123. |
| Record and RecordType | A Record is a structure of logically related elements, and may be used as an implementation representation for features, by keeping a list of (name, value) pairs in a dictionary. This represents a generic storage structure for features. |
| Dictionary | A dictionary is similar to an array, except that the lookup index of an array is an integer, whereas the entries of a dictionary are looked up by key (here, the attribute name). |
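The role of the sequencing rule can be illustrated with a short Python sketch of linear scanning. It is a simplification under stated assumptions: the first axis listed in the scan direction is taken to vary fastest, a leading "-" on an axis name means that axis is traversed from its high end to its low end, and the start of the sequence is the corner implied by those signs. The names `linear_index` and `GridValuesMatrixSketch` are hypothetical and do not come from ISO 19123.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


def linear_index(coord: Dict[str, int],
                 sizes: Dict[str, int],
                 scan_direction: Sequence[str]) -> int:
    """Position of a grid point in the values sequence under linear scanning.

    Assumption: the first axis in scan_direction varies fastest.
    """
    index, stride = 0, 1
    for signed_axis in scan_direction:
        axis = signed_axis.lstrip("+-")
        offset = coord[axis]
        if signed_axis.startswith("-"):
            # Reversed axis: count from the high end of the axis.
            offset = sizes[axis] - 1 - offset
        index += offset * stride
        stride *= sizes[axis]
    return index


@dataclass
class GridValuesMatrixSketch:
    """Ties a flat sequence of records to grid points via a scan direction."""
    sizes: Dict[str, int]            # number of grid points along each axis
    scan_direction: List[str]        # signed axis names, fastest axis first
    values: List[Dict[str, float]]   # one record per grid point

    def value_at(self, **coord: int) -> Dict[str, float]:
        return self.values[linear_index(coord, self.sizes, self.scan_direction)]


# A 3 x 2 grid whose records simply store their own position in the sequence.
records = [{"temperature": float(i)} for i in range(6)]
by_x = GridValuesMatrixSketch({"x": 3, "y": 2}, ["x", "y"], records)
by_y = GridValuesMatrixSketch({"x": 3, "y": 2}, ["y", "x"], records)

print(by_x.value_at(x=1, y=1))  # {'temperature': 4.0}
print(by_y.value_at(x=1, y=1))  # {'temperature': 3.0}
```

The same flat sequence of records yields different values at the same grid point depending on the scan direction, which is why the sequencing rule has to travel with the values.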
The two diagrams below illustrate the mapping from the CF-netCDF data model to the ISO 19123 DiscreteGridPointCoverage data model.
CF-netCDF to ISO DiscreteGridPointCoverage Mappings
| CF-netCDF entity | ISO DiscreteGridPointCoverage |
|---|---|
| CF-Dataset | CV_DiscreteGridPointCoverage |
| CF-Variable | Record |
| Coordinate | Record |
| Vertical Coordinate | CV_RectifiedGrid |
| LatLon Coordinate | CV_RectifiedGrid |
| Time Coordinate | CV_RectifiedGrid |
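The correspondences in this table can be written down as a simple lookup, sketched below in Python. The dictionary restates the table, with ISO class names written without internal spaces; the function `map_entities` and the sample dataset description are hypothetical and only illustrate how such a mapping might be applied.

```python
from typing import Dict

# CF-netCDF entity kind -> ISO 19123 element, as listed in the table above.
CF_TO_ISO: Dict[str, str] = {
    "CF-Dataset": "CV_DiscreteGridPointCoverage",
    "CF-Variable": "Record",
    "Coordinate": "Record",
    "Vertical Coordinate": "CV_RectifiedGrid",
    "LatLon Coordinate": "CV_RectifiedGrid",
    "Time Coordinate": "CV_RectifiedGrid",
}


def map_entities(entities: Dict[str, str]) -> Dict[str, str]:
    """Map named CF-netCDF entities to ISO elements.

    `entities` maps an entity name to its CF-netCDF kind. Unknown kinds
    raise ValueError rather than being silently guessed.
    """
    mapped = {}
    for name, kind in entities.items():
        if kind not in CF_TO_ISO:
            raise ValueError(f"no ISO mapping known for {kind!r}")
        mapped[name] = CF_TO_ISO[kind]
    return mapped


# Hypothetical description of a small gridded dataset.
example = {
    "forecast.nc": "CF-Dataset",
    "temperature": "CF-Variable",
    "level": "Vertical Coordinate",
    "time": "Time Coordinate",
}

for name, iso_element in map_entities(example).items():
    print(f"{name} -> {iso_element}")
```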
Unidata Glossary
http://www.unidata.ucar.edu/publications/acronyms/glossary.html
netCDF:
http://www.unidata.ucar.edu/software/netcdf/
NetCDF Java:
http://www.unidata.ucar.edu/software/netcdf-java/
Common Data Model:
http://www.unidata.ucar.edu/software/netcdf/CDM/index.html
Climate and Forecast (CF) Metadata:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
CF standard name table:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.html
BADC Datasets: CF conventions:
http://badc.nerc.ac.uk/help/formats/netcdf/index_cf.html
NetCDF Markup Language (ncML):
http://www.unidata.ucar.edu/software/netcdf/ncml/
Design and implementation of netCDF markup language (NcML) and its GML-based extension (NcML-GML), Computers & Geosciences, Volume 31, Issue 9, November 2005, Pages 1104-1118.
http://www.sciencedirect.com
http://www.unidata.ucar.edu/projects/THREDDS/
http://hdf.ncsa.uiuc.edu/
http://www.unidata.ucar.edu/
http://galeon-wcs.jot.com/WikiHome
Climate Science Modeling Language (CSML):
http://ndg.nerc.ac.uk/csml/
http://www.nerc.ac.uk/
https://cdp.ucar.edu/
http://gcmd.gsfc.nasa.gov/