Last Modified: September 9, 2005
<< Note that the information in here has been gleaned from email conversations between Stefano Nativi, Lorenzo Bigagli, and Ben Domenico. Our intention is to use it as the beginning of a discussion of approaches and actual implementations of XML encodings of netCDF semantics and, in some cases, the data as well. >>
This note focuses on three related approaches to encoding netCDF semantics into XML. We recognize that there are other approaches (e.g., CSML), and we intend to add sections on them as the information becomes available. We start here with:
A key aspect of web services in general, and of OGC interface specifications in particular, is the encoding of semantic content in XML. NetCDF datasets carry some semantic content in binary form, in the attribute specifications within the netCDF files themselves. Other semantic content is associated with the data via conventions such as the COARDS (Cooperative Ocean/Atmosphere Research Data Service) and CF (Climate and Forecast) conventions. One approach to capturing this semantic information in XML is based on ncML (netCDF Markup Language), which has been augmented with coordinate system information in ncML-CS and ncML-GML. See:
To summarize briefly, ncML-GML contains all the core netCDF semantics and adds semantics useful for GIS-based applications, with particular attention to coordinate systems and projection information. Core netCDF semantics are encoded using the core ncML grammar; GIS-related semantics are encoded using the GML 3.1 grammar. The GML grammar is very precise and complete as far as coordinate system and projection metadata are concerned, and it can easily be extended or profiled. NcML-CS extends the core ncML grammar by adding coordinate system and axis semantics, encoded using structures that are not GML-based. One approach to consider is to include these semantics in the next ncML-GML version and to adopt the GML grammar for encoding them. Presently, ncML-GML ver. 0.6 and higher are based on an "opportunely modified" ncML-CS schema, developed to fix some problems with the parser software.
As a matter of fact, both ncML-GML and ncML-CS documents contain these semantics. The main point is that both languages encode in XML the semantic content which must be, implicitly or explicitly, associated with a netCDF dataset. The netCDF to ncML-CS/GML API components are in charge of reading netCDF-CF1 datasets and extracting the useful semantics for encoding into XML form, using the CS and/or GML grammar. The ncML-GML ver. 0.6 and higher API accomplishes this task using a set of encoding rules which attempt either to find the useful information in netCDF attributes or to "infer" it from accepted conventions. However, there is a strong argument for keeping netCDF and netCDF-CF as simple as possible, thereby remaining focused on the datasets and needs of the Earth Sciences community. This simplicity can be attained by employing separate, specialized data models which are interoperable with each other, rather than one all-encompassing, powerful, general-purpose data model which could easily become too complex and difficult to maintain.
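To make the "find or infer" idea concrete, here is a minimal sketch of how one such rule might look. It is not the actual ncML-GML rule set: the function name `infer_axis` is hypothetical, the attributes of a variable are represented as a plain dictionary rather than read through a netCDF library, and treating longitude as X and latitude as Y is a simplifying assumption. The attribute names (`axis`, `standard_name`, `units`) and the unit spellings are those defined by the CF conventions.

```python
from typing import Mapping, Optional

# Unit spellings that the CF conventions accept for latitude and longitude.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}

# CF standard_name values mapped to an axis letter (illustrative subset).
STANDARD_NAME_AXIS = {"longitude": "X", "latitude": "Y", "time": "T"}


def infer_axis(attrs: Mapping[str, str]) -> Optional[str]:
    """Guess the axis ('X', 'Y', 'Z' or 'T') of a coordinate variable.

    Hypothetical helper: it is not part of any ncML-GML API.
    """
    # Rule 1: use the explicit CF `axis` attribute when it is present.
    axis = attrs.get("axis", "").strip().upper()
    if axis in {"X", "Y", "Z", "T"}:
        return axis

    # Rule 2: fall back to the CF `standard_name` attribute.
    name = attrs.get("standard_name", "").strip()
    if name in STANDARD_NAME_AXIS:
        return STANDARD_NAME_AXIS[name]

    # Rule 3: infer the axis from the units string, as COARDS/CF allow.
    units = attrs.get("units", "").strip()
    if units in LAT_UNITS:
        return "Y"
    if units in LON_UNITS:
        return "X"
    if " since " in units:  # e.g. "days since 1970-01-01"
        return "T"

    return None  # nothing found; a real encoder would need more rules
```

The ordering matters: explicit information in the file takes precedence, and convention-based inference is used only as a fallback. When no rule applies, the sketch returns `None` rather than guessing.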
To provide a hands-on sense of what these XML dialects look like, a collection of examples of netCDF semantics in different forms is available at:
The datasets themselves are:
For the sst.nc dataset, there are several different encodings of the semantic content:
For the other two test datasets (striped.nc and RUC.nc), the ncdump, ncML, and ncML-CS files are also there, but the ncML-GML encoding is not available yet.
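For readers without the example files at hand, the general shape of a core ncML description can be sketched roughly as follows. This is an illustration under assumptions, not a reproduction of any of the files above: the element and attribute names (`netcdf`, `dimension`, `variable`, `attribute`) are loosely modeled on core ncML and should be checked against the schema, the namespace declaration is omitted, the helper `build_ncml` is hypothetical, and the dataset contents are made up.

```python
import xml.etree.ElementTree as ET


def build_ncml(dimensions, variables):
    """Build an ncML-like element tree (hypothetical helper).

    dimensions: dict mapping dimension name -> length
    variables:  dict mapping variable name -> (type, shape, attributes)
    """
    root = ET.Element("netcdf")
    for name, length in dimensions.items():
        ET.SubElement(root, "dimension",
                      {"name": name, "length": str(length)})
    for name, (vtype, shape, attrs) in variables.items():
        var = ET.SubElement(root, "variable",
                            {"name": name, "type": vtype,
                             "shape": " ".join(shape)})
        # Each netCDF attribute becomes a child element of its variable.
        for key, value in attrs.items():
            ET.SubElement(var, "attribute",
                          {"name": key, "value": str(value)})
    return root


# Made-up dataset description, for illustration only.
dims = {"lat": 180, "lon": 360}
variables = {
    "temperature": ("float", ["lat", "lon"], {"units": "K"}),
}
root = build_ncml(dims, variables)
print(ET.tostring(root, encoding="unicode"))
```

The sketch covers only the structural part (dimensions, variables, attributes). The coordinate system and projection semantics that ncML-CS and ncML-GML add are not represented here.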
We have almost completed a prototype implementation of a WCS capable of dealing with Galeon netCDF-CF datasets and, in particular, capable of returning ncML-GML encoded datasets. We are proceeding "depth-first", according to a top-down approach: first we implement and test the basic Galeon functionality, then we add support for more complicated test cases. We have focused on the simplest dataset (sst.nc) and we expect to be able to serve it via our server prototype on September 20th. The next steps will be to broaden our scope, i.e., to address the other datasets (striped.nc, featuring a slightly more complex data structure, and RUC.nc, featuring a projected coordinate system). So, I'm afraid we can't be of much help regarding projected coordinate system issues at the moment. However, we have already sketched the basic rules for adding such information, which will be expressed using GML's powerful grammar.
Instead, we had to deal with the data encoding issues from the very beginning, as you point out, and we have reached a more mature stage in specifying the viable data access strategies. In summary, ncML-GML fully describes a dataset, providing the actual data via either "immediate" or "deferred" access. The latter approach is somewhat less interesting (basically, it is meant for data to be accessed by mailing an institution, or through general-purpose protocols, e.g. FTP). The former approach is, in turn, twofold: access may be provided directly (embedded data) or via a purpose-specific automatic protocol (linked data). Currently, the ncML-GML APIs and our prototype server support embedded data encoding (ASCII or Base64 encoding of Java/XDR structures) as well as OPeNDAP data linking, which should be enough for most use cases.
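As a rough illustration of the embedded (Base64) option, the sketch below packs an array of values in XDR-style byte order and Base64-encodes the result so that it can be carried as text inside an XML document. The exact payload layout used by ncML-GML is not described in this note, so the layout here is an assumption: a flat sequence of 32-bit IEEE floats in big-endian order, which is how XDR represents single-precision floats. The function names are hypothetical and the sample values are made up.

```python
import base64
import struct


def encode_embedded(values):
    """Pack floats as 32-bit big-endian (XDR-style) values, then Base64-encode."""
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_embedded(text):
    """Reverse encode_embedded: Base64-decode, then unpack the floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # each 32-bit float occupies 4 bytes
    return list(struct.unpack(">%df" % count, raw))


# Made-up sample values, chosen to be exactly representable as 32-bit floats.
payload = encode_embedded([271.5, 272.25, 280.0])
print(payload)
print(decode_embedded(payload))
```

Base64 expands the binary payload by about one third, so embedding suits small to moderate arrays. For larger arrays, linking to the data (for example via OPeNDAP, as mentioned above) avoids inflating the XML document.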