
Standards-based Web Services Access to
Existing Atmospheric/Oceanographic Data Systems

Ben Domenico
Draft last updated: August 19, 2007

A primary objective of GALEON Phase 1 was to determine the feasibility of accessing datasets on THREDDS Data Servers (TDS) via the formal standard interface protocol WCS (Web Coverage Service). In theory, that would enable any client that implements the WCS protocol to access data from THREDDS Data Servers. The main components of the TDS (THREDDS catalogs, ADDE, and OPeNDAP) are in widespread use in the FES (Fluid Earth Sciences, mainly atmospheric science and oceanography) community. The GALEON approach would make those datasets available to a much wider set of client software systems and their associated user communities.


WCS Interface enables access to datasets on THREDDS Data Servers

GALEON Phase 1 showed that a WCS interface for TDS datasets was indeed a practical goal, although several modifications to the WCS 1.0 protocol were incorporated into the WCS 1.1 specification in response to suggestions arising from the GALEON experiments. The result is that WCS clients can access datasets from a TDS as shown in the diagram below. In particular, this diagram shows a real-time TDS, like the Unidata motherlode server, which is populated with data in real time via the Unidata Internet Data Distribution (IDD) system. This is a significant step forward, and we are aware of several groups taking advantage of the capability. Members of the hydrology community are using this interface to access historical data and real-time forecasts related to precipitation. Similar uses were found in the air quality community for forecasts of wind speed and direction as well as precipitation.


Observations and Forecast Model Output Served via WCS from a TDS.
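
To make the client side of this concrete, here is a minimal sketch of how a WCS client might assemble a GetCoverage request using the key-value-pair encoding of WCS 1.0.0. The host name, dataset path, coverage name, and the "NetCDF3" format string are assumptions made for illustration; a real client would take them from the server's GetCapabilities and DescribeCoverage responses. A strictly conforming WCS 1.0 request also states the output grid size or resolution (WIDTH/HEIGHT or RESX/RESY), which is left out here for brevity.

```python
from urllib.parse import urlencode


def build_getcoverage_url(endpoint, coverage, bbox, time,
                          fmt="NetCDF3", crs="EPSG:4326"):
    """Assemble a WCS 1.0.0 GetCoverage URL in key-value-pair form.

    endpoint, coverage and fmt are illustrative values supplied by the
    caller; they are not taken from any particular server.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        # Bounding box as minx,miny,maxx,maxy in the request CRS.
        "BBOX": ",".join(str(v) for v in bbox),
        "TIME": time,
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, for illustration only.
url = build_getcoverage_url(
    "http://tds.example.org/thredds/wcs/model/forecast.nc",
    coverage="Precipitation",
    bbox=(-110.0, 35.0, -100.0, 45.0),
    time="2007-08-19T12:00:00Z",
)
print(url)
```

Because the request is nothing more than a URL, any client that can issue an HTTP GET and read the returned netCDF file can take part, which is the property that makes the interface attractive to communities outside FES.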

However, GALEON Phase 1 brought to light some additional issues. One of the main concerns is that, while WCS proved practical for gridded datasets such as the output of weather forecast models and mosaics created by putting radar data onto a regular grid, it is not clear whether WCS is the right protocol for the many other types of data that the FES community works with on a regular basis. In particular, several GALEON participants suggested using the WCS interface to serve collections of weather station observations encoded in netCDF. For clients that were already able to use netCDF datasets effectively, this appears to be a viable alternative.

Several presentations at the 2006 Fall AGU meeting discussed categories of scientific dataset types. "Station" observations are common to most geosciences research disciplines: weather stations gather regular atmospheric observations, buoys do so for the oceans, and gaging stations for hydrology. The table that follows shows the other dataset categories that three different groups, operating more or less independently, have come up with.

General Dataset Categories
from AGU Presentations and RAL “Features Workshop”

Unidata CDM Scientific Data Types

  • Gridded datasets
  • Collections of station observations
  • Vertical profile and trajectory datasets
  • Swath data from polar orbiting satellites
  • Radial data from ground-based radar stations

BADC CSML Scientific Features

  • GRID
  • Profile
  • Ragged section
  • Scanning radar
  • Profile series

OGC SWE O&M Sampling Feature Classes

  • A Station samples the world at a point,
  • a Profile along a curve,
  • a SurfaceOfInterest on a surface,
  • and a SolidOfInterest in a solid region.

Dataset categorizations noted at 2006 Fall AGU and discussed later at the RAL "Features Workshop"

However, when it comes to the appropriate protocol for accessing these classes of data, things are not so clear. In particular, station observations have traditionally been thought of as being closer in character to GIS "features" and hence might be served via a WFS (Web Feature Service). But the people whose clients already work well with the WCS protocol serving binary netCDF-encoded datasets are eager to use the same protocol and encoding for the other types of data as well.

One idea that came up at the RAL Features Workshop was to take advantage of web services chaining to provide alternative interfaces to the underlying datasets. A very simple example illustrates the idea. A site could implement a system that acts as a client to the WCS service, adds a GML wrapper, and then serves the result via WFS. In fact, the service could provide both the WFS and WCS options. It could serve the data collection itself in GML or leave it as a binary encoded netCDF coverage, since, as is often pointed out, a coverage is simply a special case of the general GIS feature. The result might look like the following diagram for the case where the ncML-GML dialect of GML is employed. The University of Florence CNR-IMAA group is working with Unidata on the ncML-GML specification.

WFS/WCS server acts as client to WCS over TDS server

In this approach, the THREDDS WCS server still confines itself to making real-time forecast and observational data available via WCS as CF-netCDF encoded objects. A second node acts as a client to the THREDDS WCS, performs the necessary transformations to wrap the CF-netCDF encoded dataset in ncML-GML (netCDF Markup Language - Geography Markup Language), and serves the datasets via both WCS and WFS. In fact, the WFS interface could also serve the data as a CF-netCDF coverage, since a coverage is a special form of a feature. On the other hand, this node could also serve the data via WFS as traditional point features. Given this combination of services, a much larger set of clients would be able to access the datasets. This would include WCS clients as before, but it would also include WCS clients that expect a GML wrapper for the binary encoded coverages, as well as WFS clients that could not have accessed the data from the original THREDDS WCS server.
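
The chaining pattern described here can be sketched in a few lines. This is only an illustration of the architecture, not of ncML-GML itself: the `upstream_wcs` function is a stand-in for a real GetCoverage call, the wrapper namespace and its `Coverage` and `binaryData` element names are invented placeholders, and embedding the payload as base64 is an assumption made so the example is self-contained. Only the GML namespace and the `featureMember` element name come from GML.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable

GML_NS = "http://www.opengis.net/gml"
# Placeholder namespace for the wrapper elements; this is NOT ncML-GML.
WRAP_NS = "http://example.org/hypothetical-wrapper"


def upstream_wcs(coverage_id: str) -> bytes:
    """Stand-in for a GetCoverage request to the THREDDS WCS server."""
    return b"placeholder-netcdf-bytes:" + coverage_id.encode("ascii")


class ChainedNode:
    """Second node: a client of the upstream WCS that re-serves its data."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch

    def get_coverage(self, coverage_id: str) -> bytes:
        # WCS-style path: pass the binary coverage through unchanged.
        return self.fetch(coverage_id)

    def get_feature(self, coverage_id: str) -> str:
        # WFS-style path: fetch the same coverage, then wrap it in XML.
        payload = self.fetch(coverage_id)
        member = ET.Element(f"{{{GML_NS}}}featureMember")
        cov = ET.SubElement(member, f"{{{WRAP_NS}}}Coverage",
                            {"id": coverage_id})
        data = ET.SubElement(cov, f"{{{WRAP_NS}}}binaryData",
                             {"encoding": "base64"})
        data.text = base64.b64encode(payload).decode("ascii")
        return ET.tostring(member, encoding="unicode")


node = ChainedNode(upstream_wcs)
print(len(node.get_coverage("Temperature")), "bytes via the WCS-style path")
print(node.get_feature("Temperature")[:60], "...")
```

The upstream server is untouched in this arrangement. All knowledge of the wrapper lives in the second node, so a different dialect can be offered by adding another node rather than by changing the TDS.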


WFS server with alternate GML wrapper

The diagram above shows the addition of another WFS server that acts as a client to either of the other two servers and provides an alternative GML wrapper. In this case, the GML dialect is the Climate Science Modeling Language (CSML), which would make the datasets available to the community whose clients understand CSML. The British Atmospheric Data Center is working on defining and implementing CSML.


Sensor Observation Service Scenario

Another set of standards-based clients is associated with the SOS (Sensor Observation Service) protocol. For those clients, which serve the Sensor Web Enablement (SWE) community, this scenario allows them to access the same datasets using protocols and encodings they are familiar with. As was noted in the table above, the SWE Observation and Measurement "Sampling Feature Classes" have much in common with the "Scientific Features" of CSML and the "Scientific Data Types" of the THREDDS Common Data Model (CDM). Given these similarities, it would seem logical for these communities to work together so that their systems evolve toward more commonality in their interfaces for dealing with these classes of data. No one group would have to implement everything. Each group could work on its own components within the realm of its primary expertise while keeping the work of the other groups in mind. In the end, the goal would be to minimize the tendency for the protocols and encodings to drift apart.


Many alternative combinations: e.g., direct real-time feeds to servers

The diagram above is there mainly to indicate that there is nothing sacred about the particular way in which the nodes are connected to one another. There are many options for achieving a wide range of goals with a combination of services such as those depicted. In this instance, the WCS, WFS, and SOS nodes all obtain their data directly from the sources rather than from other servers. The point is that the chained web services approach allows for a wide variety of service combinations. A number of groups are working on different aspects of SWE. In particular, the OGC Ocean Sciences Interoperability Experiment is working to determine how WCS, WFS, and SOS interoperate.


Catalogs for finding the data

One disadvantage of having so many alternative distributed services is that it is difficult for new users to figure out what is available where. Even after finding a dataset of interest, it is crucial that enough metadata is available for the client to determine how to manipulate and display the data and how to combine datasets from different sources in integrated display and analysis tools. Hence it is essential to build cataloging systems that can extract the metadata so that clients can discover the data and determine whether and how they can work with it. This is where CS-W (Catalog Services for the Web) comes into play, along with ebRIM (electronic business Registry Information Model).


Client extracting THREDDS metadata for discovery via CS-W

The last diagram simply shows a CS-W node accessing metadata from THREDDS catalogs for use in a standards-based discovery system. George Mason University is working on such a server as part of the ACCESS-Geosciences project funded by NASA. The University of Florence CNR-IMAA group has implemented a client called GiGO.
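
As a rough sketch of the harvesting step, the following reads a small THREDDS catalog document and turns each accessible dataset into a flat discovery record. The sample catalog, the host name, and the record fields (`title`, `id`, `protocol`, `url`) are assumptions made for illustration; the sketch also assumes a single service per catalog, and it stops short of mapping the records into ebRIM objects, which a real CS-W registry would need to do.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NS = {"t": THREDDS_NS}

# Hypothetical catalog content, for illustration only.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="wcs" serviceType="WCS" base="/thredds/wcs/"/>
  <dataset name="Model forecasts">
    <dataset name="Forecast run A" ID="model/a" urlPath="model/a.nc"/>
    <dataset name="Radar mosaic" ID="radar/mosaic" urlPath="radar/mosaic.nc"/>
  </dataset>
</catalog>
"""


def harvest(catalog_xml, host):
    """Return one discovery record per dataset that has an access path."""
    root = ET.fromstring(catalog_xml)
    service = root.find("t:service", NS)  # simplification: one service only
    records = []
    for ds in root.iter(f"{{{THREDDS_NS}}}dataset"):
        url_path = ds.get("urlPath")
        if url_path is None:
            continue  # container dataset with nothing to access directly
        records.append({
            "title": ds.get("name"),
            "id": ds.get("ID"),
            "protocol": service.get("serviceType"),
            "url": host + service.get("base") + url_path,
        })
    return records


for record in harvest(SAMPLE_CATALOG, "http://tds.example.org"):
    print(record["protocol"], record["title"], record["url"])
```

The records carry just enough for a client to decide whether it can use a dataset (the protocol) and where to reach it (the URL); richer metadata such as spatial and temporal extent would be harvested in the same way where the catalog provides it.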

In the end, it's important to remember that the point here is not that this is THE ARCHITECTURE for how these protocols should interoperate with one another. Rather, it illustrates how the web services approach can support many alternative modes of interoperation. Implementors can work in their own area of expertise and, at the same time, coordinate their efforts with others to ensure that the end product is useful to the widest possible community of users.

 
 