
Data Model Mapping Related to
Encoding of WCS Coverages in CF-NetCDF

Ben Domenico, Stefano Nativi, John Caron, Lorenzo Bigagli, Wenli Yang, John Evans
Last Modified: April 25, 2006

NetCDF Background

NetCDF (Network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. In brief, netCDF is a very general-purpose form for data storage and access. Thus it is not surprising that the general netCDF data model is quite different from that of the WCS Coverage, which was specifically conceived for geospatial data. But, among its many uses, netCDF is quite commonly and successfully used for geospatial data, especially data in the realms of atmospheric science and oceanography, the sciences concerned with the fluid portions of the globe (FES, or Fluid Earth Sciences). Many netCDF user communities have developed conventions for the use of netCDF with their types of data. The FES modeling community is converging on the Climate and Forecast (CF) conventions <http://www.cgd.ucar.edu/cms/eaton/cf-metadata/> in order to define geospatial coordinate systems and other semantics equivalent to the WCS data model.

NetCDF in the WCS Coverage Context

The salient characteristics of the WCS coverage are listed in the first column of the table below with the corresponding general characteristics of the netCDF dataset: the middle column for the generic netCDF and the right column for netCDF conforming to CF (Climate and Forecast) conventions.

Row | WCS Coverage | netCDF dataset (general) | netCDF dataset (CF conventions)
----|--------------|--------------------------|--------------------------------
1 | At least 2 spatial dimensions, optional 3rd spatial dimension, optional time dimension | Arbitrary number of dimensions (i.e., axes). One dimension can be unlimited. | No lower limit on dimensions. Up to 3 spatial dimensions, one time dimension.
2 | Coverage range - set of values representing one entity | Arbitrary number of scalar variables representing multiple entities | Arbitrary number of scalar variables representing multiple entities. A "standard_name" attribute maps variable names to a controlled set of names in the CF conventions name table.
3 | Coverage range has single unit of measure | Each variable can have a different unit of measure | Each variable can have a different unit of measure. Units of measure must conform to CF conventions.
4 | (x,y,z,t) domain shape | The dimensions do not have any pre-specified order, so the shape is arbitrary except that the unlimited dimension must be first. | The dimensions do not have any pre-specified order, so the shape is arbitrary. However, the CF conventions recommend that dimensions be ordered according to *TZYX (§2.4); also note that a time series of soundings depending on pressure and location, with time modeled as the unlimited dimension, does not follow this recommendation.
5 | Explicit geolocation metadata | Geolocation optional | Implicit geolocation metadata. Must conform to CF conventions.
6 | Grid geometry regularly spaced | Arbitrary grid geometry | Grid geometry can be irregularly spaced

More detailed explanation of table elements by Wenli Yang and John Evans

Row 1: WCS coverage model versus netCDF arrays

1) The basics of the WCS coverage model and netCDF arrays

WCS models a coverage as a spatial function that assigns tuples of values to locations in space and time, viz. f(x,y,z,t)={value1, value2, value3, ...}. Each “value” in the range of this function may be a scalar (e.g., brightness between 0 and 255), or an array with one or more dimensions (e.g., brightness values measured at different wavelengths). A scalar-valued coverage can be expressed as f(x,y,z,t)=V1; an array-valued coverage can be expressed as f(x,y,z,t)=V2(w), where w indicates wavelength. A coverage with both kinds of values might be expressed as f(x,y,z,t)={V1, V2(w)}.

It should be noted, however, that the above is just a conceptual view; the grid coverage data are still multidimensional arrays that include the spatial and temporal dimensions. Even the single value V1 is measured at different spatial (x,y,z) and time (t) positions, unless x, y, z, and t are each fixed at just one position.

Thus, the above grid coverage is actually equivalent to two data arrays: a 4-D array: V1(x,y,z,t) and a 5-D array: V2(x,y,z,t,w). These are not different from the netCDF data arrays. The only difference between WCS grid and netCDF array is that WCS distinguishes the spatial/temporal dimensions, (x,y,z,t) from other dimensions (e.g., w).
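As a minimal sketch of this equivalence, the following Python fragment stores V1 and V2 as flat row-major arrays and evaluates the conceptual coverage function from them. The class name `Array`, the axis sizes, and the stored values are illustrative assumptions; nothing here is part of WCS or of the netCDF library.

```python
from math import prod


class Array:
    """Tiny row-major N-D array backed by a flat list (illustrative only)."""

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        self.data = [fill] * prod(self.shape)

    def _offset(self, index):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        # Row-major layout: the last index varies fastest.
        offset = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
            offset = offset * n + i
        return offset

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value


# Assumed sizes for the x, y, z, t domain axes and the wavelength axis w.
NX, NY, NZ, NT, NW = 4, 3, 2, 5, 6

V1 = Array((NX, NY, NZ, NT))        # scalar-valued: V1(x,y,z,t)
V2 = Array((NX, NY, NZ, NT, NW))    # array-valued:  V2(x,y,z,t,w)


def coverage(x, y, z, t):
    """Conceptual view f(x,y,z,t) = {V1, V2(w)} built from the two arrays."""
    return V1[x, y, z, t], [V2[x, y, z, t, w] for w in range(NW)]


V1[1, 2, 0, 3] = 128.0
V2[1, 2, 0, 3, 4] = 0.75
scalar, spectrum = coverage(1, 2, 0, 3)
```

The only place where (x,y,z,t) is treated differently from w is the `coverage` function; the arrays themselves make no such distinction.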

2) Number of axes (dimensions)

Although WCS conceptually requires spatial/temporal dimensions to define the “domain” of a coverage, this does not mean that the data array representing the values of the coverage must have these dimensions explicitly present. For example, values of a variable may be measured at several, say 10, positions along the x-dimension but at only one position along the y-dimension, and at one specific time and elevation. The data array in this case is actually a 1-D array with 10 elements, but conceptually it can still be considered a 4-dimensional coverage in which the y, z, and t dimensions each span a single value (while the x-dimension runs from position 1 to 10). Therefore, the two required dimensions (i.e., the x and y dimensions) should not be an issue in comparing a WCS grid and a netCDF array.
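The 10-element example can be sketched as follows. The variable names and the zero-based indexing are assumptions made for illustration only.

```python
# Ten measurements along x; y, z, and t are each fixed at a single position.
values = [float(i) for i in range(10)]

# The stored array is 1-D ...
stored_shape = (len(values),)
# ... while the conceptual coverage domain is 4-D with three size-1 axes.
domain_shape = (len(values), 1, 1, 1)


def f(x, y, z, t):
    """Evaluate the coverage; only index 0 is valid on the degenerate axes."""
    if (y, z, t) != (0, 0, 0):
        raise IndexError("y, z and t each have a single position")
    return values[x]


sample = f(3, 0, 0, 0)
```

Both shapes describe the same ten values; the size-1 axes add no storage, only the conceptual link to the (x,y,z,t) domain.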

 

John Evans comment: Do NetCDFers worry about datatype differences (whereby a scalar value t0 isn't equivalent to the interval [t0,t0], and the region {(x,y) | xmin==xmax==x0} isn't a 1-dimensional line)?

 

3) The spatial/temporal axes in WCS coverage and in netCDF

Although the number of dimensions in a WCS coverage and in netCDF does not conflict from the viewpoint of multi-dimensional arrays, a problem arises when multiple multi-dimensional arrays (each representing one variable) are involved. In WCS, everything included in one coverage must have the same domain, i.e., the same spatial/temporal dimensions. In netCDF, there is no restriction on the dimensions of the different variables included in one netCDF file. For example, if a temperature measurement, T, and a pressure measurement, P, are included in a coverage, f(x,y)={T,P}, the (x,y) domain is the same for T and P. That is, the (x,y) in array T(x,y) and in array P(x,y) are the same. In netCDF, by contrast, if two variables T and P (both 2-D arrays: T(x,y) and P(x,y)) are included in one netCDF file, there is no requirement that the (x,y) associated with variable T be the same as the (x,y) associated with variable P.

 

John Evans comment: As I understand this paragraph, NetCDF files are a looser bundle than a coverage. But a NetCDF file could be equivalent to a group of coverages?
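One way to picture a netCDF file as a group of coverages is to group its variables by the dimensions they are defined on. The sketch below uses a hypothetical variable list and assumes that an identical dimension tuple means an identical domain, which ignores any differences in auxiliary coordinates.

```python
from collections import defaultdict

# Hypothetical netCDF-like file: each variable lists the dimensions it uses.
variables = {
    "T": ("time", "lat", "lon"),
    "P": ("time", "lat", "lon"),
    "station_height": ("station",),
    "salinity": ("time", "depth", "lat", "lon"),
}


def group_into_coverages(variables):
    """Group variables that share exactly the same dimensions.

    Each group could be offered as one coverage, because all of its
    members are defined on the same domain.
    """
    groups = defaultdict(list)
    for name, dims in variables.items():
        groups[dims].append(name)
    return dict(groups)


coverages = group_into_coverages(variables)
```

Here T and P fall into one group and could form the coverage f(time,lat,lon)={T,P}, while the other two variables would each need a coverage of their own.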

Row 2: WCS coverage range versus netCDF variables

As mentioned above, the WCS coverage range can contain more than one entity (called measures, attributes, properties, fields, variables, or parameters; the term is still under discussion), such as temperature and pressure in f(x,y)={T,P}, which correspond to different variables in netCDF. There is a name associated with each such WCS range entity. Thus netCDF variables and WCS range entities are equivalent. The biggest issue is that, as mentioned in item 3 of Row 1 above, WCS requires the spatial/temporal domain of all range entities to be the same, while netCDF has no such requirement.

Row 3: WCS coverage range unit versus netCDF variable unit

The WCS coverage range can have a different unit for each entity included in the range. For example, the T and P in f(x,y)={T,P} have different units.

Row 4: The shape of (x,y,z,t) in WCS

WCS does not require the x, y, z, and t dimensions to appear in any specific order. The (x,y,z,t) dimensions can be arranged in any order that corresponds to the specific spatial (and temporal) coordinate reference system. For example, the latitude axis may come before or after the longitude axis, depending on the coordinate reference system used.
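Because neither model fixes the axis order, software that exchanges data between them typically needs an explicit permutation. The sketch below computes one that rearranges an arbitrary dimension order into the T, Z, Y, X order recommended by the CF conventions. The dimension names and the name-to-axis mapping are assumptions for illustration.

```python
# Assumed mapping from dimension names to axis types.
AXIS_TYPE = {"time": "T", "lev": "Z", "lat": "Y", "lon": "X"}
CF_ORDER = "TZYX"


def tzyx_permutation(dims):
    """Return the positions of `dims` rearranged into T, Z, Y, X order."""
    return sorted(range(len(dims)),
                  key=lambda i: CF_ORDER.index(AXIS_TYPE[dims[i]]))


dims = ("lon", "lat", "time", "lev")
perm = tzyx_permutation(dims)
reordered = tuple(dims[i] for i in perm)
```

The same permutation would be applied to the array axes and to any per-axis metadata so that the two stay consistent.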

Row 5: Geolocation

WCS is intended for geospatial coverages, so the coverage data must carry some kind of geolocation information. The spatial part, (x,y,z), of (x,y,z,t) is used to define the geolocation of a coverage. It can be either explicit or implicit. In the explicit case, (x,y) (the z-dimension is dropped here for simplicity) is some kind of geo-coordinate system, such as latitude/longitude. In the implicit case, (x,y) is a non-geo coordinate, such as row/column, that is associated with a geo-coordinate. Additional information is then needed to describe how the non-geo coordinate (row/column) relates to the geo-coordinate so that geolocation can be done. A recent WCS design document written by Arliss Whiteside presents a use case for a non-georectified, non-georeferenced grid. Such a grid carries no georeferencing information, so grid data can only be requested by grid, or image, coordinate. But for geospatial applications, any data can eventually be associated with some kind of geo-information.
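For the implicit case, the additional information can be as small as an origin and a spacing when the grid is regularly spaced and aligned with the latitude and longitude axes. The function below is a sketch under that assumption; the function name, parameters, and example grid are hypothetical.

```python
def grid_to_geo(row, col, origin, spacing):
    """Map a (row, col) grid index to (lat, lon) for a rectified grid.

    origin  -- (lat, lon) of grid cell (0, 0)
    spacing -- (dlat, dlon) step per row and per column
    """
    lat0, lon0 = origin
    dlat, dlon = spacing
    return lat0 + row * dlat, lon0 + col * dlon


# Assumed example grid: cell (0, 0) at 50N, 10W, with 0.5 degree steps
# running southward by row and eastward by column.
lat, lon = grid_to_geo(4, 6, origin=(50.0, -10.0), spacing=(-0.5, 0.5))
```

A grid without such parameters, like the non-georeferenced use case mentioned above, can only be addressed by (row, col).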

Row 6: Grid geometry

The WCS grid geometry can be irregularly spaced. Orbital swath data is one example.

Additional Spatial and Temporal Dimensions

Besides the characteristics shown in the table, the FES community often employs specialized coordinate systems for specific datasets. For example, the vertical spatial dimension can be pressure in the atmosphere or density in the ocean. There may also be such a vertical dimension in addition to vertical height; likewise, a forecast-time axis may be used in addition to a real time axis. Consequently, the earth scientist thinks in a 4- or 5-dimensional space and requires a 4- or 5-dimensional data model, or an arbitrary n-dimensional data model. Each dimension is an independent scalar variable; we can think of a netCDF grid-based dataset as a 4-D, 5-D (or n-D) hypercube.

Dimension space geometry can be implicit (e.g., grids) or explicit (e.g., multi-points); semi-implicit geometry is possible as well, such as irregular grid geometries, i.e., geometries where the distance between adjacent points along one or more of the grid axes is not constant. The co-dimensions (i.e., the range set) can be numerous, defined on all the independent variables or on a subset of them. Each co-dimension is a dependent scalar variable. The netCDF data model was designed to encode such datasets in a straightforward and flexible way. Indeed, a netCDF dataset can contain different 3-, 4-, or 5-D coordinate systems, which can share common axes.
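Whether an axis can be described implicitly can be checked from its coordinate values. The following sketch tests for constant spacing; the function name, the tolerance, and the sample axes are assumptions.

```python
def is_regularly_spaced(coords, rel_tol=1e-9):
    """Return True if consecutive coordinate values differ by a constant step."""
    if len(coords) < 3:
        return True  # zero or one interval is trivially regular
    steps = [b - a for a, b in zip(coords, coords[1:])]
    first = steps[0]
    return all(abs(s - first) <= rel_tol * max(abs(first), 1.0) for s in steps)


# Regular axis: a start value and a step would be enough to describe it.
lon = [0.0, 2.5, 5.0, 7.5]
# Irregular axis (pressure levels): every value has to be listed.
pressure = [1000.0, 850.0, 700.0, 500.0, 300.0]

lon_regular = is_regularly_spaced(lon)
pressure_regular = is_regularly_spaced(pressure)
```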

Additional Detail on Spatial and Temporal Dimensions by Wenli Yang

2. Additional spatial and temporal dimensions

A WCS coverage has only one temporal dimension and three spatial dimensions. Additional height and time dimensions in FES data may need to be treated as other axes in the coverage. For example, a dataset, P, with two time dimensions, t1 and t2, and two height dimensions, z1 and z2, may be advertised by a WCS server as having an (x,y,z1,t1) domain and two axes, z2 and t2, which represent the additional height and time dimensions: f(x,y,z1,t1)={P(z2,t2)}. The data array itself is a 6-D array, P(x,y,z1,z2,t1,t2), although it is conceptually considered a WCS grid that has two axes and is defined on a 4-D spatial/temporal domain. A client will be able to subset such a coverage by specifying appropriate spatial/temporal and axis subsets. The server can decide which time and height dimensions should be treated as WCS domain dimensions and which should be treated as WCS axes.
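The split between domain dimensions and axes, and a subset request that touches both, can be sketched as follows for the array P(x,y,z1,z2,t1,t2). The function names, dimension sizes, and index ranges are hypothetical.

```python
def split_axes(dims, domain):
    """Split array dimensions into WCS domain dimensions and range axes."""
    domain_dims = [d for d in dims if d in domain]
    range_axes = [d for d in dims if d not in domain]
    return domain_dims, range_axes


def subset_shape(dims, selection, sizes):
    """Shape of the array returned for per-dimension (start, stop) ranges.

    Dimensions absent from `selection` are returned in full.
    """
    shape = []
    for d in dims:
        start, stop = selection.get(d, (0, sizes[d]))
        shape.append(stop - start)
    return tuple(shape)


# Assumed sizes for the array P(x, y, z1, z2, t1, t2).
dims = ("x", "y", "z1", "z2", "t1", "t2")
sizes = {"x": 100, "y": 80, "z1": 10, "z2": 5, "t1": 24, "t2": 8}

# The server's choice: z1 and t1 belong to the domain, z2 and t2 are axes.
domain_dims, range_axes = split_axes(dims, domain={"x", "y", "z1", "t1"})

# A request that subsets the domain (x, t1) and one range axis (z2).
shape = subset_shape(dims, {"x": (10, 30), "t1": (0, 6), "z2": (2, 3)}, sizes)
```

Passing a different `domain` set, for example one containing z2 and t2 instead, models a server that makes the opposite choice; the underlying array is unchanged.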

3. Co-dimensions

As mentioned in the comment on Row 1 (3) above, more than one variable can be contained in a WCS coverage, and different variables can share all or some of their dimensions. The difference is that the dimensions treated as domain dimensions in a WCS coverage (i.e., the x,y,z,t dimensions) must be the same for all variables, while this is not required in netCDF, in which different variables can have different coordinate reference systems.

Metadata Mapping

Much of the metadata for netCDF datasets is implicit in the conventions. The information in the netCDF itself combined with that implied in the CF conventions can be mapped into a corresponding set of GIS coverage concepts:

Additional Detail on Metadata Mapping by Wenli Yang and John Evans

1) Variables versus range

As mentioned above, a WCS coverage can have one or more variables with different units (the term, variable or some other term, has not been finalized). A coverage has one domain, (x,y,z,t), and one range, {variable(s)}. The latter can include more than one variable.

2) Domain geometry

Since WCS is designed for grid coverages, regular and irregular grids are the geometries that should be considered. Point data should be served through a Web Feature Service (WFS).

John Evans comment: At the moment, yes. But in the future, I hope WCS can do point-based coverages as well. (Compared with WFS, it would provide a different view of those features, not as unconnected points, but as representing a spatially-varying phenomenon.)

3) Scalar versus compound values in the range-set

A scalar value in a range set denotes a single value associated with each location in the domain, such as f(x,y,z,t)={T}. The corresponding netCDF variable has at most 4 dimensions, T(x,y,z,t).

A compound value in a range set denotes a set of values associated with each location in the domain, such as radiances, R(w), measured at different wavelengths, w. Thus the coverage is expressed as f(x,y,z,t)={R(w)}. The corresponding netCDF data array has 5 dimensions: R(x,y,z,t,w).

NetCDF as WCS Direct Binary Encoding Form

In the direct usage of netCDF as a WCS binary encoding format, the server, in response to the getCoverage request, simply returns a binary netCDF file object. [Lorenzo: I would recommend that the subsets of netCDF encoded coverages returned by a WCS be consistent netCDF files themselves (not just data chunks); a server may also prune unneeded dimensions, coordinate variables, etc.]

NetCDF encoded coverages returned by a WCS must be valid netCDF files (not just data chunks). The server must return the requested coverage(s) as a netCDF variable and its associated coordinate variables. If the WCS request asks for a subset, the variable and its coordinates must be appropriately subsetted. The server should remove dimensions, variables, etc. not needed by the requested coverages. The exact encoding must be documented and published as a netCDF Convention; we recommend the CF-1.0 Convention, or variants of it as recommended by the CF working groups. [John: The use of OPeNDAP URLs is a nice way to do it also. I'll have to look at that more, perhaps try to support it in our server. OPeNDAP Grids are a natural way to do this, but we are not currently doing grids. Stefano, do you use Grids?]

In essence this means that the client can write this object to disk and then manipulate it using the netCDF interface. The netCDF file has the coordinate system information encoded in a standard way. The netCDF libraries are currently being developed to automatically extract coordinate system information through a standard API. This is the approach used on the experimental Unidata WCS server:

http://motherlode.ucar.edu:8080/thredds/docs/WCS/index.html
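The subsetting and pruning rules described in this section can be sketched with plain Python dictionaries standing in for a netCDF file. This is not the Unidata server's implementation and does not call the netCDF library or write a file; the dataset layout, the function name, and the restriction to 2-D variables are assumptions made to keep the example short.

```python
def subset_coverage(dataset, name, ranges):
    """Return a new dataset holding only `name` and its coordinate variables.

    dataset -- {"coords": {dim: [values]}, "vars": {name: {"dims", "data"}}}
    ranges  -- {dim: (start, stop)} index ranges; omitted dims are kept whole
    Only 2-D variables are handled, to keep the sketch short.
    """
    var = dataset["vars"][name]
    dim0, dim1 = var["dims"]
    s0 = slice(*ranges.get(dim0, (None, None)))
    s1 = slice(*ranges.get(dim1, (None, None)))
    return {
        # Keep and subset only the coordinate variables the coverage uses.
        "coords": {dim0: dataset["coords"][dim0][s0],
                   dim1: dataset["coords"][dim1][s1]},
        # Drop every other variable; subset the requested one consistently.
        "vars": {name: {"dims": (dim0, dim1),
                        "data": [row[s1] for row in var["data"][s0]]}},
    }


# Hypothetical source dataset with one variable the request does not need.
dataset = {
    "coords": {"lat": [40.0, 41.0, 42.0],
               "lon": [-5.0, -4.0, -3.0, -2.0],
               "time": [0, 6, 12]},
    "vars": {
        "T": {"dims": ("lat", "lon"),
              "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]},
        "P": {"dims": ("time", "lat"),
              "data": [[0] * 3 for _ in range(3)]},
    },
}

result = subset_coverage(dataset, "T", {"lat": (1, 3), "lon": (0, 2)})
```

The result contains T with its lat and lon coordinates cut to the same index ranges, while P and the time coordinate are pruned, so the returned object is still a self-consistent dataset rather than a bare data chunk.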

Additional Detail on Direct Binary Encoding by Wenli Yang

It should not be difficult for a WCS server to generate a return coverage encoded as a valid netCDF file, including subsets along dimensions. The issue is how to follow the conventions used by the user community. I think that the starting point is to understand the conceptual models of the WCS coverage and of netCDF, and the relationship between the WCS grid and netCDF multi-dimensional arrays (in fact, the WCS coverage is entirely conceptual and does not prescribe the physical grid data array, but grid data are normally stored as multidimensional arrays, as in netCDF).

The most difficult part probably includes two aspects. First, how can the spatial/temporal dimensions in netCDF be completely represented in WCS? NetCDF is very flexible regarding these dimensions, while OGC has clearly defined Coordinate Reference System identifiers. A set of commonly used spatial/temporal dimensions from netCDF should be identified and added to the set of identifiers recognized by OGC. This may also involve deciding how to treat the additional time and height dimensions and other coordinates, such as atmospheric coordinates. Second, how can we identify a subset of netCDF, with a clear, specific, and commonly accepted convention, that is general enough to represent most netCDF user communities yet simple enough that discussion and implementation can start quickly?

I think that we can identify a few netCDF files that are commonly used in the netCDF communities and show what these files include, such as coordinate reference systems, geolocation methods, naming conventions, metadata, and common operations on those files (e.g., subsetting along certain dimensions). By comparing these with the WCS coverage model, the current WCS interface, and the available WCS resources (e.g., interpolation methods, subsetting methods, coordinate system identifiers), we will be able to identify what can easily be done by WCS and what is difficult. Thus, we will be able not only to specify a netCDF profile for WCS but also to provide input to the WCS specification revision.

NetCDF Encoded in ncML-GML with Binary Data via OPeNDAP

An alternative server response to a getCoverage request is an ncML-GML document that includes an OPeNDAP URL pointing to the binary-encoded data, which can be retrieved via the OPeNDAP protocol. This approach is used on the University of Florence WCS-G server:

http://athena.pin.unifi.it:8080/galeon/
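The structure of the ncML-GML document is not shown here. As a sketch of the OPeNDAP side only, the function below builds a DAP2 data URL whose constraint expression selects an index hyperslab of one variable, assuming the server accepts DAP2 constraint expressions and the `.dods` data suffix. The dataset URL and variable name are hypothetical, and this is not taken from the University of Florence server.

```python
def dap_hyperslab_url(dataset_url, variable, ranges):
    """Build a DAP2 data URL that selects an index hyperslab of one variable.

    ranges -- one (start, stride, stop) triple per dimension, stop inclusive
    """
    constraint = "".join(f"[{start}:{stride}:{stop}]"
                         for start, stride, stop in ranges)
    return f"{dataset_url}.dods?{variable}{constraint}"


# Hypothetical dataset URL and variable name.
url = dap_hyperslab_url("http://example.org/dods/model.nc", "T",
                        [(0, 1, 0), (10, 1, 19), (0, 2, 8)])
```

A client that receives such a URL in the response document can fetch only the requested subset, rather than the whole dataset.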

NetCDF Fully Encoded in ncML-GML

The University of Florence WCS-G server has an alternative option for requesting the data fully encoded into ncML-GML. In this case the entire requested dataset is returned as an ncML-GML document, so there is no need for a binary encoding. However, for large netCDF datasets, this alternative may be impractical.

<<<Need a reference to the latest version of the ncML-GML documentation>>>

Leveraging SOAP technology, the University of Florence WCS-G server implemented a third option for responding to a getCoverage SOAP request: the requested dataset metadata are returned as an ncML-GML document contained in the SOAP response body, while the corresponding binary netCDF file object is included as a SOAP attachment.

http://athena.pin.unifi.it:8080/galeon/

References