A standards-based, web services gateway to netCDF datasets

Ben Domenico*, Stefano Nativi**, John Caron*, Lorenzo Bigagli**, Ethan Davis*

*Unidata Program Center
University Corporation for Atmospheric Research
Boulder, CO 80307 - USA
{caron, edavis, ben}@ucar.edu

**University of Florence at Prato
Piazza Ciardi 25
59100 Prato
and
Italian National Research Council (CNR)
{stefano.nativi, lorenzo.bigagli}@pin.unifi.it

Abstract

Teams at the Unidata Program Center and the University of Florence are working with a number of international partners to implement a web services interface to traditional atmospheric and oceanographic datasets currently stored in netCDF form or served via the OPeNDAP protocol. The project will result in a gateway service using the Web Coverage Service (WCS) specification of the Open Geospatial Consortium (OGC). Underneath the WCS interface will be a combination of technologies including THREDDS (THematic Real-time Environmental Distributed Data Services) and HDF5 (Hierarchical Data Format) in addition to netCDF and OPeNDAP. A key component of the project is to develop mechanisms for explicit encoding of coordinate system information in three forms: as Coordinate System extensions to NcML (the netCDF Markup Language), directly in the data files themselves, and as GML (Geography Markup Language) extensions to NcML. These extensions, called NcML-GML, include a subset profile of the standard GML, which is in the late stages of adoption by the International Standards Organization (ISO). The WCS interface specification will be developed in the context of an OGC Interoperability Experiment called GALEON (Geo-interface for Air, Land, Earth, Oceans NetCDF). The paper presents the current status and updated objectives of the project.

Note that an HTML version of this publication is available at:
http://www.unidata.ucar.edu/projects/THREDDS/OGC/GatewayToNetCDF.htm

THREDDS-Enabled Data Services Supported at Present Time

In the Unidata community framework of client/server data and metadata access systems, data are served via a number of protocols at different data provider sites. On the client side, some applications can access data via certain protocols while others can only use other protocols. THREDDS catalogs provide information about which datasets are available via which services and protocols. The primary client/server protocols for remote data access in the community (as opposed to full-file transfer with FTP or GridFTP) are OPeNDAP, ADDE (Abstract Data Distribution Environment), and netCDF access via HTTP. In many cases the data access systems are augmented and integrated with THREDDS catalog services, which provide inventory lists and metadata access. Thus client applications can determine which datasets are available on the site via the THREDDS interface, then access the datasets themselves via the OPeNDAP, ADDE, or netCDF/HTTP protocols.
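As a rough illustration of how a client might use a catalog to find out which datasets are reachable through which service, the sketch below parses a small inventory catalog and joins each service base with each dataset path. The catalog content, host names, and the `list_access` helper are made up for this example, and the catalog is heavily simplified compared to what a real THREDDS server publishes.

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS inventory catalogs (version 1.0).
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NS = {"t": THREDDS_NS}

# Made-up, simplified catalog; hosts, names, and paths are hypothetical.
CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Example catalog">
  <service name="dods" serviceType="OPENDAP" base="http://example.org/dodsC/"/>
  <service name="files" serviceType="HTTPServer" base="http://example.org/files/"/>
  <dataset name="Sample forecast grid" urlPath="model/sample.nc">
    <serviceName>dods</serviceName>
  </dataset>
  <dataset name="Sample archive file" urlPath="archive/sample.nc">
    <serviceName>files</serviceName>
  </dataset>
</catalog>"""


def list_access(catalog_xml):
    """Return (dataset name, service type, access URL) for each dataset."""
    root = ET.fromstring(catalog_xml)
    services = {s.get("name"): s for s in root.findall("t:service", NS)}
    results = []
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        service_name = dataset.findtext("t:serviceName", default="", namespaces=NS)
        service = services.get(service_name)
        if service is None or dataset.get("urlPath") is None:
            continue  # container dataset or unknown service: nothing to access
        url = service.get("base") + dataset.get("urlPath")
        results.append((dataset.get("name"), service.get("serviceType"), url))
    return results


for name, service_type, url in list_access(CATALOG):
    print(f"{name}: {service_type} -> {url}")
```

A client that only speaks OPeNDAP would keep the entries whose service type is `OPENDAP` and ignore the rest.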

Presently Supported Client/Server Facilities in the Unidata/THREDDS Community

The underlying data file storage formats are:

OGC Data Services

Open Geospatial Consortium Web Mapping Servers (WMS) provide access to datasets that have been converted into visual map form. However, Web Feature Servers (WFS) and Web Coverage Servers (WCS) enable access to the datasets themselves. An oversimplified characterization of these services is that WFS is used for traditional Geographic Information System (GIS) vector data (point, line, polygon) whereas WCS is for coverage (image, grid) data.

OGC Data Service Protocols

The formats currently supported by WCS are:

OGC Augmented with NetCDF Dataset Interface

The goal is to add OWS (OGC Web Services) data access to THREDDS as a way to bridge between the GIS community and Unidata's geoscience community. We propose to add the NetCDF file format to the suite of WCS supported data formats.

OGC Data Services with NetCDF Gateway

WCS Gateway to THREDDS-Integrated Services

The diagram below shows the key components of the WCS protocol interface layer we propose to implement on top of components that already are in operation at scores of THREDDS-integrated data provider sites.

WCS NetCDF Components

As the figure shows, the gateway converses with the client via WCS protocols (getCapabilities, describeCoverage, and getCoverage). These requests are translated into the equivalent requests using the interfaces and protocols of the THREDDS-integrated server:

For a detailed description of NcML-GML, please see NetCDF Markup Language (NcML) and its GML-based extension (NcML-GML) (http://www.unidata.ucar.edu/publications/ComputersAndGeosciences2004/nativicompsandgeo04.pdf). A less extensive but more recently updated description of ncML-GML (http://www.unidata.ucar.edu/projects/THREDDS/GALEON/ncML-GML%200.5_submitted.pdf) describes the ncML-GML 0.5 release. As of this writing, a description of the current release (0.7.2) is being created.
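To make the three WCS operations handled by the gateway more concrete, the following sketch builds key-value-pair request URLs of the kind defined by WCS 1.0.0, with operation names capitalized as in that specification. The endpoint URL, the coverage name `temp`, the bounding box, the grid size, and the `FORMAT` value are assumptions chosen for illustration; a real server advertises its own coverages and format identifiers in its capabilities document.

```python
from urllib.parse import urlencode

# Hypothetical gateway endpoint, used only for illustration.
ENDPOINT = "http://example.org/thredds/wcs/sample.nc"


def wcs_url(request, **params):
    """Build a WCS 1.0.0 key-value-pair request URL."""
    query = {"SERVICE": "WCS", "VERSION": "1.0.0", "REQUEST": request}
    query.update(params)
    return ENDPOINT + "?" + urlencode(query)


# 1. Ask the server what it offers.
capabilities_url = wcs_url("GetCapabilities")

# 2. Ask for the domain and range description of one coverage.
describe_url = wcs_url("DescribeCoverage", COVERAGE="temp")

# 3. Ask for a subset of the coverage in a chosen encoding.
coverage_url = wcs_url(
    "GetCoverage",
    COVERAGE="temp",
    CRS="EPSG:4326",
    BBOX="-130,20,-60,55",        # minx,miny,maxx,maxy in CRS units
    TIME="1996-01-01T00:00:00Z",
    WIDTH="70",
    HEIGHT="35",
    FORMAT="NetCDF3",             # assumed identifier, not taken from the paper
)

for url in (capabilities_url, describe_url, coverage_url):
    print(url)
```

In the gateway described here, the first two requests would be answered from catalog and metadata information, and only the third would trigger an actual data read.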

Why Do It?

One might ask what is to be gained by having netCDF as one form of coverage supported by WCS. There is much to be gained both by the Earth science community that uses netCDF and by the GIS community. Among the main reasons for bringing the two together are:

Thus the inclusion of netCDF as a WCS format adds only one new data access interface, which in turn brings in collections of forecast model data via a variety of protocols that are already in use in the data provider community. A draft conceptual overview of this approach is provided in THREDDS-Integrated Dataset Discovery and Access Overview.

The Approach

To incorporate netCDF as an alternative for WCS data access, extensions are needed for both the netCDF interface and the WCS specification. The key tasks would be:

  1. To extend the netCDF software by implementing a mediation component that maps the netCDF (plus conventions) data content model onto the WCS data content model.
  2. To extend the WCS specification to support returning datasets whose content is modeled according to the netCDF data content model; such datasets must be encoded in one of three possible formats: netCDF file, ncML document, or ncML-GML document.

The following figure depicts the main concepts as well as the needed extensions.

WCS extension to accommodate netCDF datasets

Data Model Mediation


The next figure shows the model mediation scenario. The WCS data content model is based mainly on the GML specification; GML 3.x encodes the ISO 19100 specifications.


Model Mediation

Starting from the netCDF-CF data model, the THREDDS Dataset Inventory Catalog and the ncML-CS content models capture and formalize some important semantics in order to mediate between the netCDF and ISO 19100 content models. Finally, the ncML-GML content model introduces the remaining semantics required to mediate fully between the netCDF-CF and ISO 19100 content models. NcML-GML resolves issues related to the structural mismatch and the lack of explicit metadata content, using the GML 3.1 encoding grammar.

If such mediations, duly formalized, can be included in the WCS extension document for netCDF, they will be the basis for initiating the integration of netCDF into WCS.

It is noteworthy that the same approach can be followed with WFS (Web Feature Service), and more generally with any OWS (OGC Web Service) which is GML-based.

GALEON Interoperability Experiment

The implementations of this gateway are being tested in the context of an OGC Interoperability Experiment (IE) known as GALEON (Geo-interface for Air, Land, Earth, Oceans NetCDF). This WCS IE is implementing a geo-interface to netCDF datasets via the WCS 1.0 protocol specification. As noted above, the WCS is being implemented as a layer above the set of client/server and catalog protocols already widely in use in the atmospheric and oceanographic sciences communities. In particular, it leverages the widespread base of OPeNDAP servers that provide access to netCDF datasets and the accompanying THREDDS servers that provide ancillary information about the datasets. The IE is investigating the feasibility of adapting data and metadata originating from OPeNDAP/THREDDS servers to the WCS specifications, thereby helping to bridge the gap between the atmospheric, oceanographic, and GIS communities by alleviating data interoperability issues.

The initial experiment stages are delivering collections of numerical forecast model output that consist of what are sometimes referred to as five dimensional or 5D grids: multiple parameters (e.g., temperature, pressure, relative humidity) varying in three spatial dimensions with two time coordinates (model run time and forecast time). It is important to note that, while it is convenient to refer to these as 5D datasets, the spatial and temporal dimensions are fundamentally different from the parameters: the former are part of the domain, whereas the multiple parameters are part of the range in the WCS data models and interface specifications.
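The domain/range distinction can be sketched with a toy data structure in which the domain axes locate a sample and the range holds one value per parameter at that location. The class, axis names, and sample values below are invented for illustration and do not come from the WCS specification or from any GALEON server.

```python
from dataclasses import dataclass, field


@dataclass
class Coverage:
    """Toy coverage: domain axes locate a sample, range fields hold values."""
    domain_axes: tuple                                 # ordered names of the independent axes
    range_fields: dict = field(default_factory=dict)   # parameter -> {point: value}

    def set_value(self, parameter, point, value):
        if len(point) != len(self.domain_axes):
            raise ValueError("point must give one coordinate per domain axis")
        self.range_fields.setdefault(parameter, {})[tuple(point)] = value

    def values_at(self, point):
        """Return every range parameter defined at one domain point."""
        key = tuple(point)
        return {
            name: samples[key]
            for name, samples in self.range_fields.items()
            if key in samples
        }


# Two time coordinates and three spatial coordinates form the domain.
forecast = Coverage(domain_axes=("run_time", "forecast_time", "level", "lat", "lon"))
point = ("1996-01-01T00Z", "1996-01-01T06Z", 850, 40, -105)

# The parameters are not extra axes: they are range values at a domain point.
forecast.set_value("temperature", point, 12.5)
forecast.set_value("relative_humidity", point, 0.4)

print(forecast.values_at(point))
```

Adding another parameter extends the range without changing the domain, which is why treating the parameter list as just another dimension can be misleading.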

Objectives

The primary objectives of this IE will be to determine whether:

  1. a viable WCS getCapabilities geo-interface (gateway in earlier versions) can be built on existing THREDDS inventory catalog services
  2. the ncML-G data model is adequate for providing describeCoverage responses for netCDF datasets
  3. there are any solutions to the previously identified limitations of the geoTIFF encoding format for representing data from 5D netCDF files in such a way that the relationships among layers are preserved
  4. the proposed ncML-GML encoding format is a practical solution to serving 5D data from netCDF files, either embedded (ASCII or attached binary) or linked (OPeNDAP link or other URL)
  5. netCDF itself is a viable WCS binary encoding format
  6. existing WCS clients are able to access, analyze, and display 5D data from netCDF files
  7. 5D geospatial data sets can be served efficiently through standard database technology

Each objective will have a use case associated with it.

If the experiment determines the WCS specification is not adequate to support this geo-interface functionality, recommendations will be made to improve and extend the specification.

The use cases to be undertaken in the experiment are:

The following schematic diagram shows the primary components of each use case:

Use Case Components

A detailed description of the ncML-G data model and the ncML-GML encoding are given in http://www.unidata.ucar.edu/publications/ComputersAndGeosciences2004/nativicompsandgeo04.pdf. A less extensive but more recently updated description of ncML-GML (http://www.unidata.ucar.edu/projects/THREDDS/GALEON/ncML-GML%200.5_submitted.pdf) describes the ncML-GML 0.5 release. As of this writing, a description of the current release (0.7.2) is being created.

Participant Organizations

The following organizations will be participating in the GALEON IE. Note that not all participants need to implement full clients or servers. Some will exercise the newer aspects of the protocol; others will implement clients to determine whether the new types of dataset can be accessed, analyzed, and displayed appropriately; others will provide input on improvement of the proposed NcML-GML and netCDF encoding formats.

Since the initial discussions of GALEON at the New York meeting of the OGC Technical Committee, what has evolved is a combination of the formal OGC GALEON Interoperability Experiment along with several specific implementation projects. The formal OGC IE has to be conducted by OGC members, but the implementations can involve additional organizations. Thus, for example, the WCS gateway to underlying OPeNDAP/THREDDS/netCDF technology may involve organizations that are not part of the OGC IE but will play an active role in testing the implementation in real world settings. Likewise the component of the experiment relating to WCS implementations via database technology may also involve groups that are not part of the formal OGC experiment.

OGC Member Participants

The following organizations have expressed an interest in participating in the IE at some level. Some may submit letters of support and become full participants and others observers. A few of them have already been involved in the practical matters of creating client and server implementations.

Other Participants

Current GALEON Implementation Goals and Status

The up-to-date status of the GALEON IE is being published at:

The GALEON Wiki (http://galeon-wcs.jot.com/WikiHome).

As of this writing in late October 2005, the status page provides updates on server implementations by Unidata, the University of Florence/IMAA-CNR, the International University of Bremen, RSI UK, the Jet Propulsion Laboratory, and George Mason University. It also includes descriptions of experiences with client implementations from the University of Florence/IMAA-CNR, RSI UK, Cadcorp, George Mason University, and the UK NERC (Natural Environment Research Council).

Preliminary Conclusions

The GALEON IE is in the midst of the intense experimentation phase. As expected, most of the participating WCS clients are able to access data in geoTIFF form. As noted in the wiki status summary, there are several successes in cases where the data transfers are accomplished via netCDF, but some difficulties are arising as well. On balance, however, there is sufficient success to indicate that the IE is likely to result in a recommendation that netCDF be added to the list of WCS binary encoding formats. The GML representations are just now being implemented, so the experiments with those forms remain to be performed. One area where modest extensions to the specifications may be needed is support for irregular grids.

Appendix

This appendix takes a very simple example of a netCDF file, created solely for illustration purposes and shows how the netCDF metadata (and sometimes the data as well) appears in a variety of different forms ranging through:

The last three representations (ncML-CS, ncML-GML, and CSML) can be generated for netCDF files that conform to CF (Climate and Forecast) conventions.

CDL Representation of Sample netCDF file Conforming to CF Conventions

The following two diagrams show the CDL representation of the example file augmented with notations pointing out the portions of the file that implement the CF conventions in this simplified case.

Simple netCDF with CF Additions
(independent variables, names, units)

Simple netCDF with CF Additions
(range variables and global attributes)

To give a sense of the differences among the representations of netCDF metadata, the sections below present excerpts showing in one place how each form represents the time coordinate, a CF-conforming latitude coordinate, and one of the range (dependent) variables. Because the GML representation is so explicit regarding domain (independent) coordinate variable specifications, it tends to be quite verbose in some instances compared to the forms in which some of the information remains implicit.

Time Coordinate Specification

netCDF CDL form:

variables: // variable types, names, shapes, attributes

short time(time);

time:standard_name = "time";
time:units = "hours since 1996-1-1";

ncML form:

<variable name="time" shape="time" type="short">

<attribute name="standard_name" type="String" value="time" />
<attribute name="units" type="String" value="hours since 1996-1-1" />

</variable>

ncML with explicit coordinate system extensions:

<coordinateAxis name="time" shape="time" type="short" units="hours since 1996-1-1" axisType="Time">

<attribute name="standard_name" type="String" value="time" />
<attribute name="units" type="String" value="hours since 1996-1-1" />
<attribute name="_CoordinateAxisType" type="String" value="Time" />

</coordinateAxis>

ncML-GML form (temporal Coordinate Reference System (CRS)):

<gml:TemporalCRS gml:id="customTemporalCRS">

<gml:srsName>hours since 1996-1-1</gml:srsName>

<gml:usesTemporalCS>

<gml:TemporalCS gml:id="NetCDF-CF_standard">
<gml:csName codeSpace="NetCDF-CF_v1.0">standard/gregorian calendar</gml:csName>
<gml:remarks>Mixed Gregorian/Julian calendar as defined by Udunits: Gregorian since 15 oct. 1582, Julian before</gml:remarks>

<gml:usesAxis>

<gml:CoordinateSystemAxis gml:id="time" gml:uom="http://www.unidata.ucar.edu/software/udunits/udunits.txt#hours">

<gml:name>Time</gml:name>
<gml:axisID>

<gml:name codeSpace="NcML-CS">time</gml:name>

</gml:axisID>
<gml:axisAbbrev>Time</gml:axisAbbrev>
<gml:axisDirection>up</gml:axisDirection>

</gml:CoordinateSystemAxis>

</gml:usesAxis>

</gml:TemporalCS>

</gml:usesTemporalCS>

<gml:usesTemporalDatum>

<gml:TemporalDatum gml:id="customTemporalOrigin">

<gml:datumName>January 1, 1996</gml:datumName>
<gml:origin>1996-01-01T00:00:00Z</gml:origin>

</gml:TemporalDatum>

</gml:usesTemporalDatum>

</gml:TemporalCRS>
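All four forms above encode time as a count of hours from an origin, which the GML form states explicitly as `1996-01-01T00:00:00Z`. A minimal sketch of decoding such values follows; it assumes a date-only origin interpreted as UTC and uses Python's proleptic Gregorian calendar, which differs from the mixed Gregorian/Julian calendar noted in the GML remarks only for dates before October 1582. The function name and the table of supported units are choices made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Length of each supported unit in seconds (a deliberately small set).
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_time(value, units):
    """Convert an offset such as 6 with units 'hours since 1996-1-1' to a datetime."""
    match = re.fullmatch(r"(\w+) since (\d+)-(\d+)-(\d+)", units.strip())
    if match is None or match.group(1) not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported units string: {units!r}")
    unit, year, month, day = match.groups()
    # Assumption: a date-only origin means midnight UTC on that date.
    origin = datetime(int(year), int(month), int(day), tzinfo=timezone.utc)
    return origin + timedelta(seconds=value * SECONDS_PER_UNIT[unit])


for hours in (0, 12, 36):
    print(hours, decode_time(hours, "hours since 1996-1-1").isoformat())
```

The netCDF and ncML forms leave this interpretation to the reader of the units string, whereas the GML form spells out the origin and the axis unit separately.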

Latitude Coordinate

netCDF CDL form:

int lat(lat), lon(lon), level(level);

lat:units = "degrees_north";
lat:standard_name = "latitude";

ncML form:

<variable name="lat" shape="lat" type="int">

<attribute name="units" type="String" value="degrees_north" />
<attribute name="standard_name" type="String" value="latitude" />

</variable>

ncML with explicit coordinate system extensions:

<coordinateAxis name="lat" shape="lat" type="int" units="degrees_north" axisType="Lat">

<attribute name="units" type="String" value="degrees_north" />
<attribute name="standard_name" type="String" value="latitude" />
<attribute name="_CoordinateAxisType" type="String" value="Lat" />

</coordinateAxis>
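The explicit `axisType` carried by the coordinate system extension is information that the plain netCDF and ncML forms leave implicit in the CF attributes. The sketch below shows one way such a type could be inferred from `standard_name` and `units`; the function is hypothetical and is not the logic of the netCDF software, and the `"Lon"` value is assumed by analogy with the `"Lat"` and `"Time"` values shown in the excerpts.

```python
# Unit spellings that the CF conventions accept for latitude and longitude.
LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN",
}
LONGITUDE_UNITS = {
    "degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE",
}


def infer_axis_type(attributes):
    """Guess an axis type from CF attributes; return None when undetermined."""
    units = attributes.get("units", "")
    standard_name = attributes.get("standard_name", "")
    if standard_name == "latitude" or units in LATITUDE_UNITS:
        return "Lat"
    if standard_name == "longitude" or units in LONGITUDE_UNITS:
        return "Lon"
    if standard_name == "time" or " since " in units:
        return "Time"
    return None


print(infer_axis_type({"units": "degrees_north", "standard_name": "latitude"}))
print(infer_axis_type({"units": "hours since 1996-1-1", "standard_name": "time"}))
print(infer_axis_type({"units": "celsius", "standard_name": "air_temperature"}))
```

Once the type is written out as an attribute, downstream components no longer need to repeat this kind of guesswork.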

ncML-GML form:

<ncco:spatialCRS>

<gml:GeographicCRS gml:id="EPSG4326">

<gml:srsName codeSpace="EPSG">WGS 84</gml:srsName>

<gml:srsID>

<gml:name codeSpace="EPSG">4326</gml:name>

<gml:version>6.7</gml:version>

</gml:srsID>

<gml:remarks>CRS kind: geographic 2D</gml:remarks>

<gml:validArea>

<gml:description>World</gml:description>

</gml:validArea>

<gml:scope>GPS satellite navigation and NATO military surveying</gml:scope>

<gml:usesEllipsoidalCS>

<gml:EllipsoidalCS gml:id="EPSG6422">

<gml:csName>Ellipsoidal 2D CS</gml:csName>

<gml:remarks>Axis order is by element order</gml:remarks>

<gml:usesAxis>

<gml:CoordinateSystemAxis gml:id="EPSG9901" gml:uom="urn:x-epsg:v0.1:uom:degree">

<gml:name>Geodetic latitude</gml:name>
<gml:axisID>

<gml:name codeSpace="NcML-CS">lat</gml:name>

</gml:axisID>
<gml:axisAbbrev>Lat</gml:axisAbbrev>
<gml:axisDirection>north</gml:axisDirection>

</gml:CoordinateSystemAxis>

</gml:usesAxis>

Dependent (Range) Variables

netCDF CDL form:

float temp(time, level, lat, lon);

temp:long_name = "temperature";
temp:standard_name = "air_temperature";
temp:units = "celsius";

ncML form:

<variable name="temp" shape="time level lat lon" type="float">

<attribute name="long_name" type="String" value="temperature" />
<attribute name="standard_name" type="String" value="air_temperature" />
<attribute name="units" type="String" value="celsius" />

</variable>
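The CDL and ncML forms carry the same information, so one can be derived mechanically from the other. The following sketch converts the ncML excerpt for `temp` into CDL-style lines; it handles only string-valued attributes and ignores the XML namespace that a complete ncML document would declare. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# The ncML excerpt for the temperature variable.
NCML = """\
<variable name="temp" shape="time level lat lon" type="float">
  <attribute name="long_name" type="String" value="temperature" />
  <attribute name="standard_name" type="String" value="air_temperature" />
  <attribute name="units" type="String" value="celsius" />
</variable>"""


def ncml_variable_to_cdl(xml_text):
    """Render one ncML <variable> element as CDL declaration lines."""
    var = ET.fromstring(xml_text)
    name = var.get("name")
    dims = ", ".join(var.get("shape", "").split())
    lines = [f"{var.get('type')} {name}({dims});"]
    for attr in var.findall("attribute"):
        # In CDL, a variable attribute is prefixed with the variable name.
        lines.append(f'{name}:{attr.get("name")} = "{attr.get("value")}";')
    return "\n".join(lines)


print(ncml_variable_to_cdl(NCML))
```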

ncML with explicit coordinate system extensions:

<variable name="temp" shape="time level lat lon" type="float" coordinateSystems="time-level-lat-lon">

<attribute name="long_name" type="String" value="temperature" />
<attribute name="standard_name" type="String" value="air_temperature" />
<attribute name="units" type="String" value="celsius" />

</variable>

ncML-GML form (given for relative humidity rather than temperature, includes data):

<variable coordinateSystem="time-lat-lon" name="rh" shape="time lat lon" type="float">

<attribute name="long_name" type="string" value="relative humidity"/>
<attribute name="valid_range" type="double" value="0.0 1.0"/>

</variable>

<ncco:scalarRangeSet>

<ncco:netcdfVariableRef referenceName="rh"/>
<ncco:asciiData>

<gml:QuantityList uom="http://www.unidata.ucar.edu/software/udunits/udunits.txt#deg_C">0.5 0.2 0.4 0.2 0.3 0.2 0.4 0.5 0.6 0.7 0.1 0.3 0.1 0.1 0.1 0.1 0.5 0.7 0.8 0.8 0.1 0.2 0.2 0.2 0.2 0.5 0.7 0.8 0.9 0.9 0.1 0.2 0.3 0.3 0.3 0.3 0.7 0.8 0.9 0.9 0.0 0.1 0.2 0.4 0.4 0.4 0.4 0.7 0.9 0.9 </gml:QuantityList>

</ncco:asciiData>

</ncco:scalarRangeSet>
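Because the ncML-GML form can embed the data as ASCII, a client can read the values with an ordinary XML parser. The sketch below parses the `gml:QuantityList` from the excerpt and checks the values against the `valid_range` attribute of `rh`. The helper names are invented for this example, and the namespace declaration is added here because the excerpt omits it.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# The QuantityList from the excerpt, with the gml namespace declared inline.
FRAGMENT = (
    '<gml:QuantityList xmlns:gml="http://www.opengis.net/gml" '
    'uom="http://www.unidata.ucar.edu/software/udunits/udunits.txt#deg_C">'
    "0.5 0.2 0.4 0.2 0.3 0.2 0.4 0.5 0.6 0.7 "
    "0.1 0.3 0.1 0.1 0.1 0.1 0.5 0.7 0.8 0.8 "
    "0.1 0.2 0.2 0.2 0.2 0.5 0.7 0.8 0.9 0.9 "
    "0.1 0.2 0.3 0.3 0.3 0.3 0.7 0.8 0.9 0.9 "
    "0.0 0.1 0.2 0.4 0.4 0.4 0.4 0.7 0.9 0.9 "
    "</gml:QuantityList>"
)


def read_quantity_list(xml_text):
    """Return (values, uom) from a gml:QuantityList element."""
    element = ET.fromstring(xml_text)
    if element.tag != "{%s}QuantityList" % GML_NS:
        raise ValueError("expected a gml:QuantityList element")
    values = [float(token) for token in (element.text or "").split()]
    return values, element.get("uom")


def within_valid_range(values, valid_range):
    """Check values against a 'low high' string such as the valid_range attribute."""
    low, high = (float(bound) for bound in valid_range.split())
    return all(low <= value <= high for value in values)


values, uom = read_quantity_list(FRAGMENT)
print(len(values), "values, all in range:", within_valid_range(values, "0.0 1.0"))
```

The flat list carries no shape of its own; the client has to combine it with the `shape` of the referenced variable to rebuild the grid.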

References

An HTML version of this publication:
http://www.unidata.ucar.edu/projects/THREDDS/OGC/GatewayToNetCDF.htm

netCDF:
http://www.unidata.ucar.edu/software/netcdf/

The NetCDF Users' Guide:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html

Climate and Forecast (CF) Metadata:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/

CF standard name table:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.html

Standard Units:
http://www.unidata.ucar.edu/software/udunits/

BADC Datasets: CF conventions:
http://badc.nerc.ac.uk/help/formats/netcdf/index_cf.html

NetCDF Markup Language (ncML):
http://www.unidata.ucar.edu/software/netcdf/ncml/

NcML Coordinate System Extension (NcML-CS):
http://www.unidata.ucar.edu/software/netcdf-java/CoordinateAttributes3.html

NcML Geography Markup Language (NcML-GML):
http://www.gmldays.com/gml2005/presentations/ncML-GML%20v.0.3.2,%20Ben%20Domenico.pdf

Climate Science Modeling Language (CSML):
http://ndg.nerc.ac.uk/csml/

Example of netCDF CDL, ncML, ncML-CS, ncML-GML, CSML:
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/NetCDFandStandards.htm

CF-Convention compliance checker for NetCDF format:
http://titania.badc.rl.ac.uk/cgi-bin/cf-checker.pl

N2G converter v 0.4:
http://athena.pin.unifi.it:8080/ncml-gml/n2g-form.htm

NetCDF Tools UI WebStart:
http://www.unidata.ucar.edu/software/netcdf-java/v2.2/webstart/index.html

NetCDF Binaries:
http://www.unidata.ucar.edu/software/netcdf/binaries.html

NcML: Table:
http://www.unidata.ucar.edu/projects/THREDDS/OGC/NcMLtable.htm

Example netCDF files:
unidata.ucar.edu/software/netcdf/examples/files.html

Guidelines for construction of CF standard names

Open Geospatial Consortium:
http://www.opengeospatial.org/

Geography Markup Language (GML):
http://xml.coverpages.org/ni2004-03-26-a.html

Extensible Markup Language (XML)

International Standards Organization (ISO):
http://www.iso.org/iso/en/ISOOnline.frontpage

ISO TC 211 Geographic Information, Geomatics:
http://www.iso.ch/iso/en/CatalogueListPage.CatalogueList?COMMID=4637&scopelist=PROGRAMME

ISO CD 19136: Geography Markup Language (GML):
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=32554&scopelist=PROGRAMME

European Petroleum Survey Group (EPSG):
http://www.epsg.org/main.html

International Association of Oil and Gas Producers (OGP):
http://www.ogp.org.uk/

World Geodetic System (WGS):
http://www.wgs84.com/