Ben Domenico attempting to create a web page from the
AGU Fall 2006 Presentation* of
Andrew Woolf and his colleagues
with other input from the presentations of Simon Cox and Mike Botts
and from the previous work of John Caron and Stefano Nativi
Last updated: January 2, 2007
* ‘Feature types’ as an integration bridge in the climate sciences by Andrew Woolf (1,*), Bryan Lawrence (2), Jeremy Tandy (3), Keiran Millard (4), Dominic Lowe (2), Sam Pepler (2) of (1) CCLRC e-Science Centre, (2) British Atmospheric Data Centre,(3) Met Office, (4) HR Wallingford (*) Corresponding author email: A.Woolf@rl.ac.uk
Last modified: December 22, 2006
At the Fall 2006 AGU meeting, Andrew Woolf presented ‘Feature types’ as an integration bridge in the climate sciences*. Andrew's presentation can be viewed as a top-down view of a problem we have been approaching from the bottom up in THREDDS and to some extent in GALEON. One might also say that end users and client developers look at data in terms of "feature types" whereas data providers are confronted with "collections of files."
In point of fact, the GALEON project is stuck right in the middle of these two "data world views." If we are to design and implement interfaces that effectively and intuitively connect users' client applications with the collections of files that exist at data provider sites, it is important to have a clear understanding of both complementary perspectives.
So what I plan to do here is take the information and diagrams from Andrew's presentation and take an initial stab at relating it to our effort in GALEON to provide standard interfaces to the vast collections of data available on THREDDS Data Servers (TDS). (Note that I am using the TDS acronym to include THREDDS catalog services as well as HTTP, GRIDFTP, OPeNDAP, ADDE, etc. data services.) Andrew and others will be left with the task of helping me "make it right" by correcting the errors, omissions, and so forth.
Andrew casts the problem in terms of data access systems that have been devised by data providers. For them it is natural to view the problem in terms of storage. But storage-centred data management focuses on the container, not the content. End users of course are far more interested in the content. As a result of the focus on the container, however, we end up with:
Andrew gives the British Atmospheric Data Centre (BADC, http://badc.nerc.ac.uk) as an example. BADC is:
Sites using the Unidata TDS serve a similar mix of datasets. A list of several TDS sites can be found in the "top level" catalog
http://motherlode.ucar.edu:8080/thredds/topcatalog.html
on the motherlode server which is the primary developmental system for THREDDS. Two of these sites are maintained at Unidata:
At the BADC site as well as at the TDS sites, the user can browse the hierarchy of catalogs of collections and drill down (perhaps through several levels of catalogs) to the point where one is eventually presented with what amounts to an inventory list of the files available in a particular catalog. An example from the BADC site is

and from the Unidata motherlode server:

The process of getting there brings to mind the process of browsing through layers of directories or folders on your local computer disk because that is effectively what you are doing. You're just drilling down through the remote server's file system until you finally get to containers that actually have the data in them. In both screenshots above, one can see the tree structure of the server file system in the BADC "current directory line" and embedded in the TDS Catalog URL.
In fact, this is not entirely a bad approach. For experts on the Airborne Antarctic Ozone Experiment, this is an intuitive and natural means of getting at data of interest, and choosing among er870812, er870814, and er870815 may be the natural way to get the needed data. Similarly, the expert on the NCEP GFS Global One-degree forecast model knows exactly which item in the TDS list to select.
For the researchers who are among the cognoscenti in the field and work with the collections of data on a daily basis, the container-centered stovepipe is just fine. In fact, there will undoubtedly be objections if that interface is hidden.
On the other hand, what about the hydrologist who is not yet familiar with the wide variety of weather forecasts that are available? Furthermore, the hydrologist uses traditional GIS tools on a daily basis, and those GIS tools are typically built on relational database systems. That researcher might want to formulate a request along the lines of: "Which weather forecasts have predicted more than 7 inches of rain over any given 10 hour period in the Houston area?" In this case, the container/file system approach is not very helpful.
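To make the hydrologist's question concrete, here is a minimal sketch of what answering it involves once the forecast content (rather than the container) is available. It assumes each forecast has already been reduced to a list of hourly rainfall amounts, in inches, for the area of interest. The function names and the sample forecasts are made up for illustration and do not come from any THREDDS or GALEON software.

```python
from typing import Dict, List


def max_window_total(hourly_rain_in: List[float], window_hours: int) -> float:
    """Largest rainfall total over any run of `window_hours` consecutive hours.

    Returns 0.0 when the series is shorter than one full window.
    """
    if window_hours <= 0 or len(hourly_rain_in) < window_hours:
        return 0.0
    # Sliding-window sum: add the hour entering the window, drop the one leaving.
    total = sum(hourly_rain_in[:window_hours])
    best = total
    for i in range(window_hours, len(hourly_rain_in)):
        total += hourly_rain_in[i] - hourly_rain_in[i - window_hours]
        best = max(best, total)
    return best


def forecasts_exceeding(forecasts: Dict[str, List[float]],
                        threshold_in: float,
                        window_hours: int) -> List[str]:
    """Names of the forecasts whose wettest window exceeds the threshold."""
    return sorted(name for name, series in forecasts.items()
                  if max_window_total(series, window_hours) > threshold_in)


# Hypothetical hourly rainfall (inches) for two made-up forecast runs.
example_forecasts = {
    "model_run_a": [0.2] * 24,
    "model_run_b": [0.0] * 6 + [0.8] * 10 + [0.0] * 8,
}
wet_runs = forecasts_exceeding(example_forecasts, threshold_in=7.0, window_hours=10)
```

The point of the sketch is that the query is phrased entirely in terms of content (rainfall, time window, threshold). Nothing in it depends on which files or directories hold the forecasts.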
Of course the hydrology community has their own set of stovepipe data systems that might be just as inscrutable and impenetrable to the experts on airborne ozone experiments and NCEP forecast model runs.
The challenge is not to break down or completely revamp either or both stovepipes; many people are being served reasonably well within those expert communities. Nor is it to convert all the data in each system into some sort of equivalent in the other system. Rather, we need to punch holes in the stovepipes and provide a conduit between them that allows each group to access data of interest from the other realms. But, as Andrew phrases it: "The current way of doing things makes it hard to integrate data from other data repositories … or other datasets…or even data from within the same dataset sometimes!"
Given that the domain specific "stovepipe" data systems exist and serve a useful purpose, the interoperability issue is to design the most effective means of bridging the gap between these community data systems. In other words, how do we construct the conduit between the stovepipes?
Andrew and his team have proposed that the connection be made by working with a set of "feature types" that capture a higher level of semantics, closer to the concepts employed by end users in their applications. Some examples are illustrated below:





These feature types bear a striking resemblance to the "scientific data types" proposed by John Caron in his description of the Common Data (Access) Model at http://www.unidata.ucar.edu/projects/THREDDS/CDM/CDM-TDS.htm. John proposes the following scientific data types:
Note: John Caron, Ethan Davis, and Robb Kambic have drafted a document that describes a Convention for NetCDF (version 3) files for writing Point, Trajectory, and Station observation data: Unidata Observation Dataset Conventions at http://www.unidata.ucar.edu/software/netcdf-java/formats/UnidataObsConvention.html
The important thing to focus on here is the similarity between the "feature types" and the "scientific data types." They are at the same level of semantic abstraction. The feature types were arrived at by working top down from highly abstract conceptual models of features developed in the international standards community, whereas the scientific data types were developed via a bottom up process that evolved from an effort to establish a "common data access model" from the myriad of low level binary encoding formats and access protocols of the netCDF, OPeNDAP, HDF, ADDE and THREDDS communities. Besides listing the scientific data types and associated methods, the Caron article Common Data (Access) Model (CDM) also explains how the CDM was derived from the best characteristics of the netCDF, HDF, and OPeNDAP encoding format models.
The next section, taken from the AGU presentation of Andrew's team, shows how their feature types are derived within the abstract conceptual framework of the ISO standards specifications.
A number of groups in the international standards community have thought about these questions at a very abstract (domain independent) level. Andrew and his team suggest starting with this work. In particular there is a suite of emerging ISO standards referred to as TC211. TC211 actually includes around 40 standards for geographic information. These cover the entire data systems activity spectrum from discovery to access to use.
Among those standards is ISO 19101 which considers geographic ‘features’ an “abstraction of real world phenomena.” Such features can be abstract types or specific instances. The key is that they "encapsulate important semantics in the universe of discourse."

From ISO 19109 “Geographic information – Rules for Application Schema”
As the diagram indicates, at the higher level, there is a set of abstract feature types which enable us to discuss categories of objects that form the content of our data systems. Below that is an application schema that describes the content of the data in terms of a conceptual schema language. The actual data are then referenced in terms of the logical structure defined in the application schema.
In other words, the application schema defines the semantic content and logical structure of datasets. ISO standards in turn provide the toolkit that deals with:
Then the Geography Markup Language (GML) provides a standard XML grammar for canonical encoding of the data in a form that is independent of the storage form.
The central concept is that data are to be delivered via a set of services between systems which can remain loosely coupled so they can continue as productive stovepipes within their own communities. The UML diagram below illustrates the main components of the overall architecture.

As the diagram shows, a geospatial dataset consists of features and a set of related objects. These are described by metadata and delivered via a geographic information service in a logical structure defined by the application schema.
The focus on abstract feature types loosens coupling between storage artefacts (content) and the data management infrastructure used to store that content. In practice, this approach:
This has substantial advantages not only for accessing data between existing data systems at any given point in time, but it insulates the end user from changes in the underlying data storage system as new storage system technologies supplant obsolescent ones. As has been observed in (CEN/TR 15449), “the lifetime of a technical implementation is shorter than the lifetime of the information it handles.” CEN/TR 15449 is the European Committee for Standardization Technical Report 15449 "Geographic Information - Standards, specifications, technical reports and guidelines, required to implement Spatial Data Infrastructure".
Whether one refers to abstract "feature types" or "scientific data types" (perhaps we could use the term "geoscientific data feature types" to encompass them), it seems clear that similar concepts are evolving for the data objects that pass through the conduit between disciplinary stovepipe data systems. But there is considerable work to be done in making the connection between these feature types and the underlying storage mechanisms within each stovepipe data system. Much of that effort will be in developing a set of practical models at the encoding level. These complement the high level semantic feature models.

Standard Web Service Protocols with Geoscientific Feature Types
as a Conduit Between Stovepipe Data Systems
An example of the encoding level model is the mapping between the CDM model and the ISO 19123 model. This is described in a presentation by Stefano Nativi, part of which is found in: Unidata's Common Data Model Mapping to the ISO 19123 Data Model
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/Reports/CDM-ISO/CDM-ISO-DataModels.htm
This article describes the mapping between the low level CDM augmented with the CF conventions and the Discrete Grid Point Coverage model of the ISO 19123 standard. In the standards world, coverages are generally used to represent gridded data, so this specific mapping would fall under the high level Grid geoscientific feature type. In fact a coverage can be thought of as a special type of feature that is particularly suitable for gridded data. It is natural to think that data that fall into this category would be accessed via the Web Coverage Service (WCS) interface. This representation works well for satellite imagery and the output of weather and climate forecast models where there is a systematic, well-specified (if somewhat complicated) relationship among the data points in the coverage.
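As a rough illustration of what accessing a Grid through the WCS interface looks like from the client side, the sketch below assembles a WCS 1.0.0 GetCoverage request using the key-value-pair parameters defined by that specification. The endpoint URL, the coverage name, and the format string are placeholders invented for this example; real values would come from a particular server's GetCapabilities and DescribeCoverage responses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getcoverage_url(endpoint, coverage, bbox, time, width, height, fmt):
    """Assemble a WCS 1.0.0 GetCoverage request as a key-value-pair URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "1.0.0"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGE", coverage),
        ("CRS", "EPSG:4326"),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("TIME", time),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", fmt),
    ]
    # urlencode percent-escapes reserved characters such as ':' and ','.
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, coverage name, and format string (assumptions).
example_url = build_getcoverage_url(
    endpoint="http://example.org/thredds/wcs/some/dataset",
    coverage="Temperature",
    bbox=(-100.0, 25.0, -90.0, 35.0),
    time="2006-12-22T12:00:00Z",
    width=100,
    height=100,
    fmt="NetCDF3",
)
query = parse_qs(urlsplit(example_url).query)
```

Note that the request names a coverage, a bounding box, and a time. It says nothing about which file on the server holds the data, which is the separation of content from container discussed earlier.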
At the other end of the spectrum of geoscientific feature types is the Point type. Examples of the data in this category are collections of data that are the result of observations at weather stations, ocean buoys, and river gauging stations. In this case, the relationships among the observing points are not so clear. For weather stations, they are pretty much random in terms of geography. Many buoys actually move over time. River stations are related in the sense that they lie along a river, but that is not the same sort of mathematical relationship that exists in the case of the points in a satellite image or forecast model output. Should these be represented as coverages with very irregular spacing or are they one of the more traditional feature types that are accessed via the Web Feature Service (WFS) interface? At the applications level, it doesn't matter so much because they are clearly a different geoscientific feature type. But, at the encoding and service interface level, it is important because the data have to be encoded in a specified form (simple features in GML or irregularly-spaced CF-netCDF coverages) and they have to be accessed by a specified service (WFS or WCS).
Station data are a specific form of point data. In particular, they are collections of observations from a set of similar observing stations. Examples are the data from weather observing stations or river gauging stations or fixed ocean buoys. Data from such observing stations are often collected at a central location, then made available in "bins." For example, weather observations are often collected and stored in files that contain all the observations for a given hour. River station data may only be collected on a daily basis. One assumption is that the observing stations do not move spatially or that they move infrequently enough that the location information can be stored in a different file rather than with each individual observation for each station. This can cause problems when dealing with retrospective data because one needs not only the observations but also the table indicating where the observing stations were located at the time.
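The retrospective-data problem can be sketched as follows: if the station table records each location together with the date from which it was valid, an observation can be matched to the location that applied when it was taken. The table layout, the function name, and the station "STN1" are all hypothetical and are not taken from the CDM or from any Unidata convention.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# station id -> list of (valid_from, latitude, longitude), sorted by valid_from
StationHistory = Dict[str, List[Tuple[datetime, float, float]]]


def location_at(history: StationHistory,
                station_id: str,
                when: datetime) -> Optional[Tuple[float, float]]:
    """Return the (lat, lon) that was valid for the station at time `when`.

    Returns None if the station is unknown or `when` predates its first entry.
    """
    entries = history.get(station_id, [])
    starts = [entry[0] for entry in entries]
    # Index of the last entry whose valid_from is not later than `when`.
    idx = bisect_right(starts, when) - 1
    if idx < 0:
        return None
    _, lat, lon = entries[idx]
    return (lat, lon)


# Hypothetical station that was relocated at the start of 2000.
example_history: StationHistory = {
    "STN1": [
        (datetime(1990, 1, 1), 40.0, -105.0),
        (datetime(2000, 1, 1), 40.1, -105.2),
    ],
}
old_site = location_at(example_history, "STN1", datetime(1995, 6, 1))
new_site = location_at(example_history, "STN1", datetime(2005, 6, 1))
```

A station table without the validity dates would silently assign the 2005 location to the 1995 observation, which is exactly the problem described above.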
Trajectory data include observations from platforms like weather balloons and drifting buoys, as well as ships and aircraft equipped with observing platforms. In each case, the observing platforms move spatially from one observing time to the next. Thus the coordinates of the changing spatial location have to be provided along with the observational data and the time of the observation. As with station data, trajectory data are a time sequence of observations, but they have a specific order in both space and time associated with them.
In one sense, swath data of the sort gathered by polar orbiting satellites can be viewed as a special form of trajectory data. However, for a swath, the location of each observation point can be represented by a set of "navigation" algorithms that describe the satellite orbiting path and the scanning pattern of the onboard remote sensing instrument. The practical difference is that the location of each measurement is determined by an algorithmic calculation rather than being stored with the data.
As with the swath, radial data are a special case of a trajectory, but in this case, the location of the measurements is determined by the scanning pattern of the radar. Here again, the locations are typically calculated according to an algorithm rather than stored with each individual observation.
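As a simplified illustration of computing locations rather than storing them, the sketch below derives the latitude and longitude of a single radar measurement from the radar site position, the beam azimuth and elevation, and the range along the beam. It assumes a spherical Earth and a straight beam, so it ignores the atmospheric refraction that operational radar processing accounts for. The function name and the Earth radius constant are choices made for this example only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def gate_location(site_lat_deg, site_lon_deg, azimuth_deg, range_km, elevation_deg):
    """Approximate (lat, lon) in degrees of a radar measurement.

    Projects the slant range onto the ground, then applies the standard
    great-circle "destination point" formula from the radar site.
    """
    # Ground distance, ignoring Earth curvature along the beam and refraction.
    ground_km = range_km * math.cos(math.radians(elevation_deg))
    d = ground_km / EARTH_RADIUS_KM  # angular distance in radians

    lat1 = math.radians(site_lat_deg)
    lon1 = math.radians(site_lon_deg)
    az = math.radians(azimuth_deg)  # clockwise from north

    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(az))
    lon2 = lon1 + math.atan2(math.sin(az) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical radar site and beam geometry.
example_lat, example_lon = gate_location(35.0, -97.0, azimuth_deg=45.0,
                                         range_km=100.0, elevation_deg=0.5)
```

Because the location is a function of the scan geometry, a radial dataset only needs to store the site position plus the azimuth, elevation, and range of each sample. The same idea, with a more elaborate navigation model, applies to swath data.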
The Grid category includes observations from geostationary satellites and the output of numerical forecast models. It is also used for representing fields that have been calculated from other data types (e.g., point data or the radials from individual radars) by an objective analysis or observational data assimilation scheme. One can think of the latter as a sophisticated method of interpolating point data onto a grid. In this view, a numerical forecast model is simply an extrapolation of the time dimension into the future.
<<< Is this a special form of a vertical cross section? If so, is it really a distinct feature type? Need to check with Andrew.>>>
Nearly all the feature types described above can also appear as time series. The trajectory/profile series feature shown in an earlier section illustrates one form of such a series. But sequences of point observations, GOES images, forecast times, and radar scans can also be described in terms of series. And as with vertical cross sections, it can be useful to take sections with time as one of the axes. Time vs. longitude views provide insight into phenomena such as El Nino. The Hovmoller diagram below illustrates this special cross section.

Hovmoller Diagram of Sea Surface Temperature (Time vs. Longitude)
from http://serc.carleton.edu/images/introgeo/teachingwdata/EnsoImage.gif
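A time vs. longitude section of this kind can be derived from a gridded time series by averaging over a band of latitudes at each time step. The following is a minimal sketch, assuming the field is held as nested lists indexed as field[time][lat][lon]. The function name and the sample values are invented for illustration.

```python
from typing import List, Sequence


def hovmoller(field: List[List[List[float]]],
              lat_indices: Sequence[int]) -> List[List[float]]:
    """Collapse field[time][lat][lon] to result[time][lon].

    Each output value is the mean over the selected latitude rows.
    """
    result = []
    for time_slice in field:
        rows = [time_slice[j] for j in lat_indices]
        n_lon = len(rows[0])
        result.append([sum(row[i] for row in rows) / len(rows)
                       for i in range(n_lon)])
    return result


# Made-up field: 2 time steps, 2 latitude rows, 3 longitude columns.
example_field = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[10.0, 20.0, 30.0], [30.0, 40.0, 50.0]],
]
section = hovmoller(example_field, lat_indices=[0, 1])
```

Each row of the result is one time step and each column one longitude, which is the layout plotted in a Hovmoller diagram.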
It might be worth considering that in GALEON Phase 2 we specifically try to bring in new geoscientific data types. In point of fact, Phase 1 focused on Grid types because one of the objectives was to determine whether the WCS 1.0 specification was adequate to represent the "5D" data contained in the output of weather and climate forecast models. But the GMU WCS server has a large collection of Swath data from polar orbiting satellites. GMU is also likely to be one of the first sites to have a working version of WCS 1.1 for testing in Phase 2. Other GALEON participants have suggested that access to Point and Station datasets is important in many fields. Point data are a special challenge because the question arises as to whether such datasets should be served via the WFS or the WCS. To complicate the picture even further, the Sensor Web Enablement effort has a strong point data element in that the most basic of sensors usually serve sequences of observations from one point in space. And, from the point of view of a really practical use case where there are huge gains to be made via interoperable data systems, the hydrology community has a strong interest in radar data. Hydrology has traditionally used GIS tools that incorporate relational database technology on the storage end rather than file systems. Hence, there might be a strong motivation to work on data access methods for Level II radar datasets that fall into the Radial Geoscientific Data Feature Type. The full volume scan datasets output from Level II radars present a geometry that is nearly as complicated as that of the Swath datasets from polar orbiting satellites.
For each of these Geoscientific Data Feature Types, there would be a challenge to determine:
This is really shooting from the hip, but there is an aspect of SWE that can be viewed as an integration layer that insulates applications from the specifics of the underlying encoding forms and access services. Simon Cox's diagrams can be used here.