Unidata's Common Data Model Mapping to the ISO 19123 Data Model

John Caron, Stefano Nativi, and Ben Domenico
Draft last modified: November 30, 2006

<< This is Ben's attempt to combine presentations by John and Stefano into an HTML document >>

Overview

The primary goal of the GALEON (Geo-Interface to Air, Land, Environment, Oceans NetCDF) interoperability experiment is to provide a standards-based interface to the wealth of Earth science datasets that are currently available in netCDF and HDF form, often served via the OPeNDAP client-server protocol. This document describes the underlying data models used in those technologies. In particular, it focuses on the Unidata Common Data Model (CDM), which combines the most valuable features of netCDF (augmented with CF conventions), HDF, and OPeNDAP, and maps it onto the corresponding elements of the international standard ISO 19123, specifically the Discrete Grid Point Coverage data model.

This document describes the basic concepts behind the Unidata Common Data Model (CDM), which fuses the best characteristics of the existing netCDF, HDF, and OPeNDAP data models. The main goal is to arrive at a package that is more powerful than each of the others individually but maintains the fundamental simplicity and ease of use of the original netCDF. The CDM is discussed in greater detail in Unidata's Common Data Model and THREDDS Data Server at: http://www.unidata.ucar.edu/projects/THREDDS/CDM/CDM-TDS.htm. Much of the early material in this paper is taken from that earlier document. The subsequent sections discuss the mapping from CF-netCDF to the ISO 19123 data model.

What's a Data Model?

In a philosophical sense, a data model is a way of thinking about scientific data. It’s an abstraction. Some of these data model abstractions have been incorporated into systems for storing and accessing scientific data. Where the data models differ significantly, it can be challenging to make the data systems interoperate with one another, which, in turn, can stifle interdisciplinary research by hindering integrated analysis and viewing of multiple datasets from different domains.

In computer terms, a data model can be thought of as equivalent to an abstract object model in object-oriented programming: an Abstract Data Model describes data objects and the methods you can use on them.
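To make the analogy concrete, here is a minimal Python sketch that assumes nothing about any real Unidata API. The hypothetical abstract class AbstractVariable plays the role of the abstract data model, since it only lists the operations. The hypothetical InMemoryVariable is one possible instantiation of it.

```python
from abc import ABC, abstractmethod


class AbstractVariable(ABC):
    """A data object described only by what you can do with it."""

    @abstractmethod
    def name(self) -> str:
        """The variable's name."""

    @abstractmethod
    def shape(self) -> tuple:
        """Length of each dimension."""

    @abstractmethod
    def read(self) -> list:
        """All data values, flattened."""


class InMemoryVariable(AbstractVariable):
    """One concrete instantiation, backed by a Python list rather than a file."""

    def __init__(self, name: str, values: list):
        self._name = name
        self._values = list(values)

    def name(self) -> str:
        return self._name

    def shape(self) -> tuple:
        return (len(self._values),)

    def read(self) -> list:
        return list(self._values)
```

A file-backed or network-backed class could implement the same three methods, and code written against AbstractVariable would not need to change.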

What Forms Do Data Models Take?

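An abstract data model can be instantiated in several forms, for example as a particular application programming interface (API) or as a particular persistence (file) format.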

The Abstract Data Model, on the other hand, removes the details of any particular API and of the persistence format in which the datasets are actually stored.

Existing Data Models

NetCDF-3

The netCDF-3 data model, shown in the Unified Modeling Language (UML) diagram below, is fairly simple. A dataset has dimensions, variables, and attributes. Attributes can be global or apply to individual variables. There is a very limited set of low-level data types.

netCDF-3 Data Model UML Diagram
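The netCDF-3 data model is small enough to write down directly. The sketch below uses Python dataclasses. The class and field names are illustrative assumptions rather than the netCDF library's API, and the unlimited dimension is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataType(Enum):
    # The six netCDF-3 (classic) data types.
    BYTE = "byte"
    CHAR = "char"
    SHORT = "short"
    INT = "int"
    FLOAT = "float"
    DOUBLE = "double"


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Attribute:
    name: str
    value: object  # a character string or a 1-D list of numbers


@dataclass
class Variable:
    name: str
    dtype: DataType
    dimensions: list  # shared Dimension objects; their lengths give the shape
    attributes: list = field(default_factory=list)

    @property
    def shape(self) -> tuple:
        return tuple(d.length for d in self.dimensions)


@dataclass
class Dataset:
    dimensions: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    global_attributes: list = field(default_factory=list)


# Example: one variable on (time, lat) plus one global attribute.
time = Dimension("time", 4)
lat = Dimension("lat", 3)
temp = Variable("temperature", DataType.FLOAT, [time, lat], [Attribute("units", "K")])
ds = Dataset([time, lat], [temp], [Attribute("title", "example dataset")])
```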

OPeNDAP

The OPeNDAP data model has many things in common with netCDF, but it has a richer set of low-level data types and includes structures, sequences, and grids.

OPeNDAP (DAP-2) Data Model UML Diagram
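The following hypothetical sketch contrasts two of these constructors. A structure groups named members like a C struct. A sequence holds an unknown number of such records, which can be filtered like table rows. The class names and the select method are illustrative assumptions, not part of any OPeNDAP library.

```python
from dataclasses import dataclass, field


@dataclass
class DapStructure:
    """Like a C struct: a fixed set of named members."""
    name: str
    members: dict


@dataclass
class DapSequence:
    """An ordered set of records sharing the same members; the number of
    records is not known until the data are read (like rows of a table)."""
    name: str
    records: list = field(default_factory=list)  # DapStructure instances

    def select(self, predicate):
        """Keep only the records whose members satisfy `predicate`."""
        return [r for r in self.records if predicate(r.members)]


cast = DapSequence("cast", [
    DapStructure("obs", {"depth": 5.0, "temp": 15.2}),
    DapStructure("obs", {"depth": 50.0, "temp": 9.8}),
])
deep = cast.select(lambda m: m["depth"] > 10.0)
```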

HDF-5

HDF-5 has a much richer set of low-level data types and includes the key feature of groups of variables. As with OPeNDAP, HDF-5 includes structures.

HDF-5 Data Model UML Diagram
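Groups nest like directories in a file system, so a variable is located by a path. The following minimal sketch uses a hypothetical Group class and find method, not the HDF-5 API:

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A named container of variables and nested groups, like a directory."""
    name: str
    variables: dict = field(default_factory=dict)  # variable name -> data
    groups: dict = field(default_factory=dict)     # group name -> Group

    def add_group(self, name: str) -> "Group":
        child = Group(name)
        self.groups[name] = child
        return child

    def find(self, path: str):
        """Resolve a slash-separated path such as '/forecast/temperature'."""
        *group_names, var_name = [p for p in path.split("/") if p]
        node = self
        for group_name in group_names:
            node = node.groups[group_name]
        return node.variables[var_name]


root = Group("/")
forecast = root.add_group("forecast")
forecast.variables["temperature"] = [271.2, 272.8]
```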

Common Data (Access) Model

At the data access level, the CDM maintains as much as possible of the elegance of the netCDF-3 interface but adds important features from OPeNDAP and HDF, as shown in the following diagram.

Common Data Model (data access layer) UML Diagram

Creating a Common Data Model for
netCDF, OPeNDAP, HDF

As noted at the outset, the CDM is an effort to fuse the best characteristics of the existing data models into one that is more powerful than each of the others but maintains the fundamental simplicity and ease of use of the original netCDF. The resulting CDM consists of three layers. The top layer provides interfaces to a set of scientific data types, the middle layer provides access to coordinate system information, and the bottom layer is the actual data access layer.

Common Data Model Layers
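The layering can be illustrated with three small hypothetical classes, each using only the layer below it. This is a sketch under simplifying assumptions (in-memory arrays, exact coordinate matches), not the CDM's actual interfaces.

```python
class DataAccessLayer:
    """Bottom layer: reads named arrays, with no notion of georeferencing."""

    def __init__(self, arrays: dict):
        self._arrays = arrays

    def read(self, name: str):
        return self._arrays[name]


class CoordinateSystemLayer:
    """Middle layer: knows which coordinate variables locate each data variable."""

    def __init__(self, access: DataAccessLayer, axes: dict):
        self.access = access
        self._axes = axes  # data variable name -> ordered coordinate variable names

    def axes_of(self, name: str) -> list:
        return [self.access.read(axis) for axis in self._axes[name]]


class GridDataTypeLayer:
    """Top layer: a grid data type answering 'what is the value at this position?'"""

    def __init__(self, coords: CoordinateSystemLayer):
        self.coords = coords

    def value_at(self, name: str, *position):
        item = self.coords.access.read(name)
        for axis_values, value in zip(self.coords.axes_of(name), position):
            item = item[axis_values.index(value)]  # exact matches only
        return item


access = DataAccessLayer({
    "lat": [10.0, 20.0],
    "lon": [100.0, 110.0, 120.0],
    "sst": [[280.0, 281.0, 282.0], [283.0, 284.0, 285.0]],
})
grid = GridDataTypeLayer(CoordinateSystemLayer(access, {"sst": ["lat", "lon"]}))
```

Only the top layer speaks in terms of positions. The bottom layer would work unchanged for data that has no coordinate system at all.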

Coordinate Systems Layer

The netCDF, OPeNDAP, and HDF data models do not have integrated coordinate systems, so georeferencing is not part of the API. As a consequence, the coordinate system information must be inferred. In the best case, the files conform to a set of established conventions (e.g., CF-1, COARDS). Other sources of coordinate information include GRIB, HDF-EOS, and other specialized formats. In the CDM, however, the coordinate system information must be handled in a general way. The approach is shown in the following diagram.

CDM Coordinate System UML Diagram
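One widely used inference rule comes from the netCDF conventions themselves: a one-dimensional variable with the same name as its dimension is a coordinate variable. The following minimal sketch uses hypothetical function names. It applies this rule and reports whether a data variable can be georeferenced, with variables described simply as name-to-dimension-tuple entries.

```python
def coordinate_variables(variables: dict) -> set:
    """A 1-D variable whose name equals its only dimension is a coordinate variable."""
    return {name for name, dims in variables.items() if dims == (name,)}


def coordinate_system(variables: dict, name: str):
    """Return the coordinate axes of `name`, or None if some dimension has
    no coordinate variable (so the data cannot be georeferenced this way)."""
    coords = coordinate_variables(variables)
    dims = variables[name]
    if all(d in coords for d in dims):
        return list(dims)
    return None


variables = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "sst": ("time", "lat", "lon"),
    "station_id": ("station",),  # no coordinate variable for 'station'
}
```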

Semantic Metadata via CF-Conventions

In order to introduce the more specific semantic elements (i.e., metadata) that different communities require to fully describe their datasets, the netCDF data model was extended by adding a set of conventions. One of the most popular is the Climate and Forecast (CF) metadata convention. The following figure depicts the CF-netCDF data model.

CF conventions are quite loose, to maximize backward compatibility with the earlier COARDS conventions. Moreover, support for precise geolocation is limited. For example, CF conventions assume that “Latitude, longitude, and time are defined by internationally recognized standards, and hence, identifying the coordinates of these types is sufficient to locate data values uniquely with respect to time and a point on the earth's surface.”
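As an illustration of how coordinates are identified under CF, the sketch below classifies a coordinate variable from its attributes. Latitude and longitude are recognized by their units, time by reference-time units containing "since", and a vertical coordinate by the positive attribute. It is a simplified subset of the CF rules, and the function name is hypothetical.

```python
# Unit spellings CF accepts for latitude and longitude (a subset of the full rules).
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}


def coordinate_type(attributes: dict):
    """Guess what a coordinate variable represents from its CF attributes."""
    units = attributes.get("units", "")
    if units in LAT_UNITS:
        return "latitude"
    if units in LON_UNITS:
        return "longitude"
    if " since " in units:  # reference-time units, e.g. "hours since 2006-11-30"
        return "time"
    if attributes.get("positive", "").lower() in ("up", "down"):
        return "vertical"
    return None  # not recognizable from these attributes alone
```

The sketch also shows the limitation quoted above. Nothing in these attributes says which datum or calendar the values refer to.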

On the other hand, the CF model is very flexible and, consequently, complex. The following diagram depicts CF conventions and their relationship with netCDF concepts, in UML.

Two renditions of the CF-netCDF data model: on the left, from Stefano's slides; on the right, from the CF-netCDF application profile document.


Scientific Data Types

For a discussion of the scientific data types layer of the CDM, refer to Unidata's Common Data Model and THREDDS Data Server at: http://www.unidata.ucar.edu/projects/THREDDS/CDM/CDM-TDS.htm

Interoperability via International Standards

The technological components of the CDM have evolved as de facto standards over the last couple of decades in the communities they serve. In particular, the atmospheric science and oceanography communities (sometimes referred to as the Fluid Earth Sciences, or FES) have taken advantage of netCDF, HDF, and OPeNDAP. During the same period, other disciplines (notably solid Earth, hydrology, and human impacts) have employed Geographic Information Systems (GIS) technologies, whose data models are quite different from those of the CDM. One approach to achieving interoperability between the data systems in these communities is to employ evolving international standards, especially those promulgated by the OGC (Open Geospatial Consortium) and the International Organization for Standardization (ISO).

ISO has developed a very elaborate and complete set of abstract data models. In particular, the ISO technical committee on Geographic information/Geomatics (TC 211) has defined the ISO 19123 data model. Mappings between the CDM data model and ISO 19123 are a key foundation for establishing interoperability between the data systems of the CDM and GIS communities.

ISO Data Models

Many netCDF files in the atmospheric and oceanic sciences contain gridded data. In the realm of ISO data models, the "coverage" is the construct used to represent gridded data. The ISO definition of a coverage is:

A coverage is a feature that associates positions within a bounded space (its domain) to feature attribute values (its range). In other words, it is both a feature and a function. Examples include a raster image, a polygon overlay or a digital elevation matrix. [ISO 19123]

The following Figure shows the coverage types introduced by ISO 19123.

ISO 19123 Coverage

ISO 19123 Coverage Subclasses

As far as the general geo-information framework is concerned, a coverage is a special type of "feature."

The ContinuousCoverage type is the subclass of Coverage that returns a distinct record of feature attribute values for any direct position within its domain. By contrast, the domain of a DiscreteCoverage consists of a finite collection of geometric objects or points in space. DiscreteCoverages are subclassed on the basis of the type of geometric object in the spatial domain.

(ISO abstract data models employ the language of mathematical functions, in the sense that the domain can be thought of as the set of values of the independent variables defining positions in three-dimensional space and time, while the range is the set of values that the function takes on at those points.)

The DiscretePointCoverage type is characterized by a finite domain consisting of points. Generally, the domain is a set of irregularly distributed points; the principal use of discrete point coverages is to provide a basis for continuous coverage functions. DiscretePointCoverage instances could also be used to implement multi-point coverage domains.
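The following minimal sketch illustrates the idea with hypothetical class names. The discrete point coverage returns values only at its own points. A continuous coverage is built on top of it by giving each position the record of the nearest point. Nearest-neighbour (Thiessen-style) evaluation is just one possible choice of interpolation, used here for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscretePointCoverage:
    """A finite domain of (possibly irregular) points, each paired with a record."""
    pairs: list  # list of ((x, y), record) tuples

    def evaluate(self, position):
        # A discrete coverage only has values at the points in its domain.
        return [record for point, record in self.pairs if point == position]


@dataclass
class NearestPointCoverage:
    """A continuous coverage built on a point coverage: every position
    takes the record of the nearest domain point."""
    base: DiscretePointCoverage

    def evaluate(self, position):
        _point, record = min(self.base.pairs, key=lambda pr: math.dist(pr[0], position))
        return record
```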

The domain of a DiscreteGridPointCoverage instance is a set of GridPoints that are associated with records of feature attribute values through a GridValuesMatrix element.

DiscreteGridPointCoverage instances must be used to implement grid-based coverage domains, whether regularly or quasi-regularly spaced.

The following Figure depicts the DiscreteGridPointCoverage model.

DiscreteGridPointCoverage Model

The diagram references the ISO elements described in the following table.

domainExtent

The attribute domainExtent shall contain the extent of the domain of the coverage. The data type EX_Extent is defined in ISO 19115. Extents may be specified in space, time or space-time.

rangeType

The attribute rangeType shall describe the range of the coverage. The data type RecordType is defined in ISO/TS 19103. It consists of a list of attribute name/data type pairs. A simple list is the most common form of rangeType, but RecordType can be used recursively to describe more complex structures.
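If a RecordType is treated as a mapping from attribute names to data types, a record can be checked against it as shown below. This sketch rests on that assumption and uses Python types in place of ISO/TS 19103 data types. Nested dictionaries stand in for the recursive use of RecordType, and the function name is hypothetical.

```python
def conforms(record: dict, record_type: dict) -> bool:
    """True if `record` has exactly the attributes of `record_type`, each
    holding a value of the declared type; nested dicts are checked recursively."""
    if set(record) != set(record_type):
        return False
    for name, expected in record_type.items():
        value = record[name]
        if isinstance(expected, dict):
            # A nested RecordType: the value must itself be a conforming record.
            if not isinstance(value, dict) or not conforms(value, expected):
                return False
        elif not isinstance(value, expected):
            return False
    return True


# The most common form: a simple list of attribute name/data type pairs.
range_type = {"sea_surface_temperature": float, "quality_flag": int}
```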

commonPointRule

The attribute commonPointRule shall identify the procedure to be used for evaluating the Coverage at a position that falls either on a boundary between geometric objects or within the boundaries of two or more overlapping geometric objects.
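For example, when several geometric objects contribute values at the same position, the rule might average them or pick the lowest or highest. The sketch below implements four such rules (average, low, high, all). We believe these names appear in the ISO 19123 CV_CommonPointRule code list, but that list is longer, and the function itself is hypothetical.

```python
from statistics import mean


def apply_common_point_rule(rule: str, values: list):
    """Resolve the values that several geometric objects give at one position."""
    if not values:
        raise ValueError("no values at this position")
    if rule == "average":
        return mean(values)
    if rule == "low":
        return min(values)
    if rule == "high":
        return max(values)
    if rule == "all":
        return list(values)  # return every candidate value
    raise ValueError(f"rule not handled in this sketch: {rule}")
```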

Coordinate Reference System

The association Coordinate Reference System shall link the Coverage to the coordinate reference system to which the objects in its domain are referenced. The class SC_CRS is specified in ISO 19111:2003. The multiplicity of the CRS role in the Coordinate Reference System association is one, so a coverage with the same range but with its domain defined in a different coordinate reference system is a different coverage.

CoverageFunction

The association CoverageFunction shall link the discrete Coverage to the set of GeometryValuePairs included in the coverage. The association CoverageFunction is shown as derived because the elements may be generated from the GridValuesMatrix through the association PointFunction.

PointFunction

The association PointFunction shall link the DiscreteGridPointCoverage to the GridValuesMatrix for which it is an evaluator. The range of a Coverage shall be a homogeneous collection of records. That is, the range shall have a constant dimension over the entire domain, and each field of the record shall provide a value of the same attribute type over the entire domain.

GridPointValuePair

The class GridPointValuePair describes an element of a set that defines the relationships of a discrete grid point coverage. In fact, the domain of a DiscreteGridPointCoverage is a set of GridPoints that are associated with records of feature attribute values through a GridValuesMatrix. GridPointValuePair is composed of a GridPoint geometry and a feature attribute value Record.
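Because CoverageFunction is derived through PointFunction, the GridPointValuePairs can be generated by walking the grid points in sequence order and pairing each one with the next record. The sketch below assumes a grid extent given by inclusive low and high grid coordinates and a simple linear ordering in which the first axis varies fastest. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class GridValuesMatrix:
    low: tuple    # grid coordinates of the first grid point, e.g. (0, 0)
    high: tuple   # grid coordinates of the last grid point (inclusive)
    values: list  # one feature attribute value record per grid point

    def grid_points(self):
        """Yield grid points with the first axis varying fastest (assumed order)."""
        ranges = [range(lo, hi + 1) for lo, hi in zip(self.low, self.high)]
        # product() varies its last argument fastest, so feed the axes reversed.
        for reversed_point in product(*reversed(ranges)):
            yield tuple(reversed(reversed_point))


def coverage_function(matrix: GridValuesMatrix) -> list:
    """Derive (grid point, record) pairs, i.e. GridPointValuePairs, from the matrix."""
    points = list(matrix.grid_points())
    if len(points) != len(matrix.values):
        raise ValueError("values must hold exactly one record per grid point")
    return list(zip(points, matrix.values))
```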

point

The attribute point shall be the geometry member of the GridPointValuePair.

value

The attribute value shall be the member of the GridPointValuePair taken from the sequence values in the GridValuesMatrix.

GridValuesMatrix

GridValuesMatrix is a subclass of Grid that ties feature attribute values to grid geometry. It has three attributes: values, sequencingRule and startSequence. It holds a sequence of records associated with a sequencing rule that specifies an algorithm for assigning records of feature attribute values to grid points. An instance of the GridValuesMatrix may be, at the same time, an instance of either a generic Grid or one of its subclasses: RectifiedGrid and ReferenceableGrid.

values

The attribute values shall be a sequence of N feature attribute value records where N is the number of grid points within the section of the grid specified by extent.

sequencingRule

The attribute sequencingRule shall describe how the grid points are ordered for association to the elements of the sequence values.

startSequence

The attribute startSequence shall identify the grid point to be associated with the first record in the values sequence.

SequenceRule

SequenceRule is a data type that contains information for mapping grid coordinates to a position within the sequence of records of feature attribute values.

type

The attribute type shall identify the type of sequencing method that shall be used. The default value shall be “linear”.

scanDirection

The attribute scanDirection shall be a list of signed axisNames that indicates the order in which grid points shall be mapped to positions within the sequence of records of feature attribute values. An additional element may be included in the list to allow for interleaving of feature attribute values. See Annex D of ISO 19123.
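Linear sequencing can be sketched as a function from grid coordinates to a position in the values sequence. The interpretation used here is an assumption that should be checked against Annex D of ISO 19123. The first axis in scanDirection varies fastest, a leading "+" scans from low to high grid coordinate, and "-" scans from high to low. The same sketch computes N, the number of grid points in the extent, as the product of the per-axis counts. Interleaving is not handled, and the function names are hypothetical.

```python
def sequence_length(low: dict, high: dict) -> int:
    """N: the number of grid points in the extent (inclusive bounds per axis)."""
    n = 1
    for axis in low:
        n *= high[axis] - low[axis] + 1
    return n


def linear_index(point: dict, low: dict, high: dict, scan_direction: list) -> int:
    """Position of `point` in the values sequence under linear sequencing.

    scan_direction is a list of signed axis names such as ["+x", "-y"];
    the first listed axis varies fastest (an assumption of this sketch)."""
    index, stride = 0, 1
    for signed in scan_direction:
        sign, axis = signed[0], signed[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"axis name must start with '+' or '-': {signed}")
        size = high[axis] - low[axis] + 1
        # '+' counts up from the low bound, '-' counts down from the high bound.
        offset = point[axis] - low[axis] if sign == "+" else high[axis] - point[axis]
        index += offset * stride
        stride *= size
    return index
```

For a 4 x 3 grid with x in 0..3 and y in 0..2, N is 12. Under ["+x", "+y"], the point x=1, y=2 lands at position 1 + 2*4 = 9.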

SequenceType

SequenceType is a code list that identifies methods for sequential enumeration of the grid points. See Annex D of ISO 19123.

Record and RecordType

A Record is a structure of logically related elements, and may be used as an implementation representation for features, by keeping a list of (name, value) pairs in a dictionary. This represents a generic storage structure for features.

Dictionary

A dictionary is similar to an array, except that the lookup index for an array is expressed as an integer, whereas a dictionary is indexed by a key such as a name.

ISO 19123 Element Specifications

Mapping from CF-netCDF to ISO

The two diagrams below illustrate the mapping from the CF-netCDF data model to the ISO 19123 DiscreteGridPointCoverage data model.


CF-netCDF to ISO DiscreteGridPointCoverage Mappings

CF-netCDF entity       ISO DiscreteGridPointCoverage
CF-Dataset             CV_DiscreteGridPointCoverage
CF-Variable            Record
Coordinate             Record
Vertical Coordinate    CV_RectifiedGrid
LatLon Coordinate      CV_RectifiedGrid
Time Coordinate        CV_RectifiedGrid
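The table can be read as a recipe. The sketch below applies it to a simplified, hypothetical in-memory description of a CF-netCDF dataset. Data variables become fields of the range Record, and coordinate variables become the axes of a CV_RectifiedGrid. Offset vectors are simplified to one step per axis, which assumes regularly spaced coordinates aligned with the grid axes. The Coordinate-to-Record row is not modelled, and the dictionary keys in the output are illustrative only.

```python
def regular_step(values: list, tol: float = 1e-9) -> float:
    """Constant spacing of a coordinate; a rectified grid needs one."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if not steps or any(abs(s - steps[0]) > tol for s in steps):
        raise ValueError("coordinate is not regularly spaced")
    return steps[0]


def to_discrete_grid_point_coverage(variables: dict) -> dict:
    """Map a CF-netCDF dataset description to a DiscreteGridPointCoverage-like dict.

    variables: name -> {"dims": tuple, "dtype": str, "values": list or None}
    """
    # netCDF convention: a 1-D variable named after its dimension is a coordinate.
    coords = {n for n, v in variables.items() if v["dims"] == (n,)}
    data = [n for n in variables if n not in coords]
    # A homogeneous range means all data variables share one domain.
    shapes = {variables[n]["dims"] for n in data}
    if len(shapes) != 1:
        raise ValueError("data variables must share the same dimensions")
    (axes,) = shapes
    missing = [a for a in axes if a not in coords]
    if missing:
        raise ValueError(f"no coordinate variable for {missing}")
    return {
        "type": "CV_DiscreteGridPointCoverage",
        # CF-Variable -> one field of the range Record
        "rangeType": {n: variables[n]["dtype"] for n in data},
        # LatLon, vertical and time coordinates -> axes of a CV_RectifiedGrid
        "domain": {
            "type": "CV_RectifiedGrid",
            "axisNames": list(axes),
            "limits": {a: (0, len(variables[a]["values"]) - 1) for a in axes},
            "origin": [variables[a]["values"][0] for a in axes],
            "offsets": [regular_step(variables[a]["values"]) for a in axes],
        },
    }
```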


References

Unidata Glossary
http://www.unidata.ucar.edu/publications/acronyms/glossary.html

Document listing netCDF, CF Conventions, ncML, ncML-GML, OGC, ISO web pages
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/netcdfAndCFwebpages.html

netCDF:
http://www.unidata.ucar.edu/software/netcdf/

The NetCDF Users' Guide:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html

NetCDF Java:
http://www.unidata.ucar.edu/software/netcdf-java/

Common Data Model:
http://www.unidata.ucar.edu/software/netcdf/CDM/index.html

Climate and Forecast (CF) Metadata:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/

CF standard name table:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.html

Standard Units:
http://www.unidata.ucar.edu/software/udunits/

BADC Datasets: CF conventions:
http://badc.nerc.ac.uk/help/formats/netcdf/index_cf.html

NetCDF Markup Language (ncML):
http://www.unidata.ucar.edu/software/netcdf/ncml/

NcML Coordinate System Extension (NcML-CS):
http://www.unidata.ucar.edu/software/netcdf-java/CoordinateAttributes3.html

Design and implementation of netCDF markup language (NcML) and its GML-based extension (NcML-GML), Computers & Geosciences, Volume 31, Issue 9, November 2005, Pages 1104-1118:
http://www.sciencedirect.com/science/article/B6V7D-4GHSGN4-2/2/6bc151125c99352396f3aa7c630919e4

NcML Geography Markup Language (NcML - GML):
http://www.gmldays.com/gml2005/presentations/ncML-GML%20v.0.3.2,%20Ben%20Domenico.pdf

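THREDDS:
http://www.unidata.ucar.edu/projects/THREDDS/

HDF:
http://hdf.ncsa.uiuc.edu/

Unidata:
http://www.unidata.ucar.edu/

GALEON Wiki:
http://galeon-wcs.jot.com/WikiHome

Climate Science Modeling Language (CSML):
http://ndg.nerc.ac.uk/csml/

Natural Environment Research Council (NERC):
http://www.nerc.ac.uk/

UCAR Community Data Portal:
https://cdp.ucar.edu/

NASA Global Change Master Directory (GCMD):
http://gcmd.gsfc.nasa.gov/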