Projects and Technologies Related to Unidata's THREDDS
(THematic Real-time Environmental Distributed Data Services)
Catalog and Data Services

draft by Ben Domenico
last modified: March 14, 2007

Overview

An excellent one-page factsheet describing THREDDS is available at:
http://www.unidata.ucar.edu/publications/factsheets/2007AMSsheets/threddsFactSheet-1.doc

Because many institutions are developing technologies for providing access to georeferenced environmental datasets via the internet, it is not surprising that many of these initiatives are related in some way to THREDDS, which started out as a package for providing catalogs of datasets available via client/server protocols such as OPeNDAP. This document describes the THREDDS-related technologies and projects at a very high level and includes pointers to more detailed information about each.

THREDDS-related Technologies and Projects

The concise descriptions below include a brief statement indicating the relationship(s) to THREDDS.

THREDDS (Catalog and) Data Server, TDS

From its beginnings as a package for providing catalogs of datasets available via client/server protocols such as OPeNDAP, THREDDS has evolved to the point where the THREDDS Data Server (TDS) serves datasets via a number of different protocols. The TDS also automatically generates a hierarchical set of catalogs ranging from simple inventory lists with descriptions of data files to a complete hierarchy of classes and collections of data. The TDS can deliver subsets of files on remote servers in forms different from those in which the data are stored on the remote servers. More recently, the TDS has gained an aggregation capability that enables it to deliver "virtual datasets" that do not exist anywhere on server disks but are created "on the fly" at the time they are requested. The catalogs can reference datasets on many different, distributed services. THREDDS catalogs provide the framework for storing descriptions of the available datasets. These descriptions can range from simple, general overviews of collections of classes of data to full-blown, standards-conforming metadata.
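As a rough illustration of the catalog hierarchy described above, the following Python sketch walks a small catalog document and lists its collections, directly accessible datasets, and references to other catalogs. The sample catalog, its dataset names, and its paths are hypothetical. The element and attribute names (dataset, urlPath, catalogRef, xlink:href) and the namespace are assumed to follow the THREDDS InvCatalog 1.0 schema and should be checked against the catalog specification before use.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
DATASET_TAG = "{%s}dataset" % THREDDS_NS
CATALOG_REF_TAG = "{%s}catalogRef" % THREDDS_NS

# Hypothetical catalog: every name and path below is made up for illustration.
SAMPLE_CATALOG = """
<catalog name="Example catalog"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Model output">
    <dataset name="Forecast run A" urlPath="model/run_a.nc"/>
    <dataset name="Forecast run B" urlPath="model/run_b.nc"/>
  </dataset>
  <catalogRef xlink:title="Radar data" xlink:href="radar/catalog.xml"/>
</catalog>
"""


def walk(element, depth=0):
    """Return (depth, kind, name, target) tuples for a catalog subtree."""
    rows = []
    for child in element:
        if child.tag == DATASET_TAG:
            # A dataset with a urlPath is directly accessible; without one
            # it is treated here as a collection of nested datasets.
            url_path = child.get("urlPath")
            kind = "dataset" if url_path else "collection"
            rows.append((depth, kind, child.get("name"), url_path))
            rows.extend(walk(child, depth + 1))
        elif child.tag == CATALOG_REF_TAG:
            # A catalogRef points at another catalog, possibly on another server.
            rows.append((depth, "catalogRef",
                         child.get("{%s}title" % XLINK_NS),
                         child.get("{%s}href" % XLINK_NS)))
    return rows


def list_catalog(xml_text):
    """Parse catalog XML text and flatten its hierarchy into rows."""
    return walk(ET.fromstring(xml_text.strip()))


if __name__ == "__main__":
    for depth, kind, name, target in list_catalog(SAMPLE_CATALOG):
        print("  " * depth + "%s: %s -> %s" % (kind, name, target))
```

A client that follows each catalogRef and repeats the same walk on the referenced document would traverse catalogs spread across several servers.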

OPeNDAP

One way to think of OPeNDAP is as a protocol and software package that allows data analysis and display clients to access remote datasets as if they were on the disks of the machine on which the client software is running. Originally called DODS (Distributed Oceanographic Data System), OPeNDAP libraries allowed users of clients like IDL, Matlab, and the Unidata IDV to specify a WWW URL instead of a local file name, and the client would retrieve the data from a remote server directly into the client software system. The original idea for THREDDS sprang from OPeNDAP user needs for a means of finding collections of datasets available via the OPeNDAP clients.

THREDDS Connection: So the initial goal of THREDDS was to make it possible for data providers to create catalogs of datasets available on their servers.
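To make the "URL instead of a local file name" idea concrete, here is a minimal Python sketch that builds request URLs in the style of the DAP 2.0 protocol, where a suffix selects the response type and a constraint expression selects a subset of one variable. The server address, dataset path, and variable name are hypothetical, and the helper functions are illustrative rather than part of any OPeNDAP library.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2-style index range as [start:stride:stop]; stop is inclusive."""
    return "[%d:%d:%d]" % (start, stride, stop)


def opendap_url(dataset_url, suffix, variable=None, ranges=()):
    """Build a DAP2-style request URL.

    suffix: 'dds' (structure), 'das' (attributes), 'dods' (binary data)
            or 'ascii' (text data).
    ranges: one (start, stop) or (start, stop, stride) tuple per dimension.
    """
    url = "%s.%s" % (dataset_url, suffix)
    if variable is not None:
        # The constraint expression follows '?': variable name plus index ranges.
        url += "?" + variable + "".join(hyperslab(*r) for r in ranges)
    return url


if __name__ == "__main__":
    # Hypothetical dataset location on a hypothetical server.
    base = "http://example.org/thredds/dodsC/model/run_a.nc"
    print(opendap_url(base, "dds"))
    print(opendap_url(base, "dods", "temperature",
                      [(0, 0), (10, 20), (0, 99, 2)]))
```

Client libraries normally hide these URLs behind a file-like interface; the sketch only shows what travels over the wire. Some clients percent-encode the square brackets.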

GALEON/WCS

WCS (Web Coverage Service) is a specification of the Open Geospatial Consortium (OGC) that defines a standard protocol for client/server data access over the internet. It is part of a suite of such protocols (including WMS for Web Map Services, WFS for Web Feature Services, and a set of specifications associated with Sensor Web Enablement). Of the OGC data services protocols, the WCS comes closest to serving the types of gridded datasets found on many OPeNDAP servers and cataloged in THREDDS catalogs. The OGC GALEON (Geo-interface for Air Land Environmental and Oceans NetCDF) Interoperability Experiment is an effort to determine whether the WCS interface specification is adequate for serving datasets of the sort that are common in OPeNDAP/THREDDS servers employed by many data providers in the atmospheric and oceanographic science communities. GALEON Phase 1 showed that WCS is adequate for such service, but that many changes to the specification were needed to improve the service of these datasets. Those changes have been incorporated (along with many others) into the WCS 1.1 specification. GALEON Phase 2 will experiment with the new, improved WCS 1.1 to determine how well the modifications serve the purpose.

THREDDS Connection: From the point of view of the community involved in the THREDDS Data Server, WCS provides a standard protocol for accessing datasets available on the hundreds of THREDDS/OPeNDAP servers already deployed in the community.
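For readers unfamiliar with WCS, the following Python sketch assembles a GetCoverage request as key-value pairs in the style of WCS 1.0.0. The endpoint, coverage name, and format name are hypothetical, and the exact set of required parameters (for example WIDTH/HEIGHT or RESX/RESY) depends on the server and the WCS version; WCS 1.1 uses different parameter names.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage, bbox, time=None,
                         fmt="NetCDF3", crs="EPSG:4326"):
    """Build a WCS 1.0.0-style GetCoverage URL.

    bbox: (min_x, min_y, max_x, max_y) in the units of the given CRS.
    time: optional ISO 8601 time string.
    fmt:  output format name; 'NetCDF3' is an assumed value that a
          given server may or may not advertise.
    """
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": crs,
        "bbox": ",".join(str(v) for v in bbox),
        "format": fmt,
    }
    if time is not None:
        params["time"] = time
    # Keep commas and colons readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=",:")


if __name__ == "__main__":
    # Hypothetical endpoint and coverage name.
    print(wcs_get_coverage_url(
        "http://example.org/thredds/wcs/model/run_a.nc",
        "temperature",
        (-130, 20, -60, 55),
        time="2007-03-14T00:00:00Z",
    ))
```

A client would typically issue GetCapabilities and DescribeCoverage requests first to learn which coverages, formats, and coordinate systems a server actually offers.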

NASA/GMU ACCESS Geosciences Catalogs/CSW

Just as the GALEON project attempts to provide access to datasets on THREDDS servers via standards-based protocols (WCS), the objective of this project is to make the information in THREDDS catalogs available via an OGC standard protocol, the CSW (Catalog Service for the Web).

The ACCESS project addresses the interoperability of two data system infrastructures that are widely used by different segments of the Earth science research and applications community: 1) the Earth science community, which uses a family of geoscience protocols, including OPeNDAP, netCDF/http, and the THematic Real-time Environmental Distributed Data Services (THREDDS) data catalog, to access Earth science data collections; and 2) the geospatial community, which uses Open Geospatial Consortium (OGC) interoperable protocols to access geospatial data collections. The approach of the project is to develop a gateway that allows a user of a client component based on one of the infrastructures to have direct access to the collections of a data provider employing a server based on the other.

The OGC CSW specifies the interfaces, bindings, and a framework for defining application profiles required to publish and access digital catalogues for geospatial data and services. The CSW specification does not require the use of a specific catalogue schema. However, it encourages the adoption of standard schemas for maximum interoperability. Specifically, OGC developed two application profiles for CSW: the ISO19115/19119 profile and the ebRIM profile. The ISO19115/19119 profile explains how catalogue services based on the profile are organized and implemented for the discovery and management of geospatial data and service metadata that are compliant with the ISO19115 and 19119 standards. The ebRIM profile explains how services based on the more general OASIS ebXML Registry Information Model are organized and implemented.

There are several approaches to connecting THREDDS to CSW services. One is to include the metadata content of THREDDS in the ISO profile by establishing a mapping between THREDDS metadata and ISO metadata. It should be noted that only the content, or semantics, of the THREDDS metadata can be mapped to ISO, not the syntax, except where an exact equivalent, both semantically and syntactically, exists between a THREDDS metadata item and an ISO metadata item. Once such a mapping is established, the metadata in a THREDDS server can be converted to its ISO equivalent and made available to CSW clients compliant with the CSW ISO profile. A THREDDS catalogue can also be linked to CSW services by implementing the THREDDS information model directly in a CSW server using the CSW ebRIM profile. ebRIM is a general information model that defines what types of objects are stored in the registry and the relationships among the stored objects. It can be extended to include the THREDDS metadata information. There are several ways to extend the core ebRIM to include geospatial information. For example, two methods were used together in the GMU CSW server to include geospatial metadata information in its CSW: one was deriving new metadata classes from existing ebRIM classes, and the other was using Slots to extend an existing class.
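The distinction between semantic and syntactic mapping can be illustrated with geographic extent. In this Python sketch, a THREDDS-style coverage expressed as a start and a size along each axis is converted to an ISO 19115-style bounding box expressed as four bounds: the content is equivalent, but the structure is not. This is not the mapping developed by the ACCESS project; the field names are assumptions modeled on the THREDDS geospatialCoverage element and the ISO 19115 EX_GeographicBoundingBox class, and the sample values are made up.

```python
def thredds_to_iso_bbox(coverage):
    """Convert start/size ranges to west/east/south/north bounds.

    coverage: dict with 'northsouth' and 'eastwest' entries, each holding
              'start' and 'size' in degrees (assumed THREDDS-style layout).
    Returns a dict keyed by ISO 19115-style bounding box names.
    """
    north_south = coverage["northsouth"]
    east_west = coverage["eastwest"]
    return {
        "westBoundLongitude": east_west["start"],
        "eastBoundLongitude": east_west["start"] + east_west["size"],
        "southBoundLatitude": north_south["start"],
        "northBoundLatitude": north_south["start"] + north_south["size"],
    }


if __name__ == "__main__":
    # Hypothetical coverage for a regional dataset.
    sample = {
        "northsouth": {"start": 20.0, "size": 35.0},
        "eastwest": {"start": -130.0, "size": 70.0},
    }
    for name, value in thredds_to_iso_bbox(sample).items():
        print(name, value)
```

Items such as a title or abstract may map one-to-one, whereas items like this one need a conversion step, which is why the mapping is defined on content rather than on syntax.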

GMU previously implemented a CSW server with the ISO19115/19119 information models but, for historical reasons, based it on the ebRIM profile (note that the ISO models can be implemented using either the ISO profile or the ebRIM profile). GMU recently re-implemented the ISO models directly on the ISO application profile. Currently, in the NASA/GMU ACCESS Geoscience/Catalogs project, GMU is implementing a CSW that will provide THREDDS catalogue metadata to the OGC communities. As previously mentioned, a THREDDS catalogue can be provided through CSW using either the ISO profile or the ebRIM profile. During earlier project design document reviews, experts from both the geoscience and OGC communities suggested that a direct mapping from THREDDS to ISO and an ISO CSW would be more interoperable. Therefore, the project team decided to implement an ISO profile-compliant CSW with THREDDS metadata. The THREDDS-to-ISO mapping has been completed and the implementation has started.

Another difference between THREDDS and CSW is the form of the query results. THREDDS provides direct browsing of the catalogue. When a client accesses a root THREDDS catalogue, the server provides information about the immediate children of the root catalogue. Each of these immediate children can be a direct dataset, a collection of datasets, or a reference to another catalogue. The client can recursively browse the entire catalogue and view the hierarchy of datasets in it. CSW, by contrast, accepts client queries based on specific values for queryable attributes such as geospatial bounding box and subject keyword, and the server returns the datasets that meet the query conditions. The CSW specification and the two application profiles (ISO and ebRIM) do not specify how data hierarchy information should be presented to clients. It is possible, however, for a CSW server to provide hierarchical information to its clients. For example, a CSW server may include a queryable property named “ParentID”. A client can then send a request specifying “ParentID=OceanTemperatureData” and ask the server to return all datasets whose ParentID is OceanTemperatureData. Such an implementation, however, does not seem feasible in the CSW ISO profile because ISO19115 does not include metadata that can be used to express parent/child relationships among datasets (although it does include data scope level information indicating whether a dataset is a simple/direct dataset or a series/collection of datasets).
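The ParentID idea can be sketched in a few lines of Python. Here a flat, CSW-style query function matches records on property values, and a browse function rebuilds the THREDDS-style hierarchy by issuing one ParentID query per level. The records, identifiers, and function names are hypothetical, and an in-memory list stands in for a real catalogue server.

```python
# Hypothetical catalogue records; 'parent_id' plays the role of ParentID.
RECORDS = [
    {"id": "OceanData", "parent_id": None},
    {"id": "OceanTemperatureData", "parent_id": "OceanData"},
    {"id": "sst_2006", "parent_id": "OceanTemperatureData"},
    {"id": "sst_2007", "parent_id": "OceanTemperatureData"},
    {"id": "salinity_2007", "parent_id": "OceanData"},
]


def query(records, **conditions):
    """Flat query: return ids of records whose properties match every condition."""
    return [r["id"] for r in records
            if all(r.get(key) == value for key, value in conditions.items())]


def browse(records, parent_id=None, depth=0):
    """Rebuild the hierarchy as (depth, id) rows using repeated ParentID queries."""
    rows = []
    for record_id in query(records, parent_id=parent_id):
        rows.append((depth, record_id))
        rows.extend(browse(records, record_id, depth + 1))
    return rows


if __name__ == "__main__":
    # Equivalent of a request for ParentID=OceanTemperatureData.
    print(query(RECORDS, parent_id="OceanTemperatureData"))
    for depth, record_id in browse(RECORDS):
        print("  " * depth + record_id)
```

The sketch also shows the cost of this approach: browsing a hierarchy through a query interface takes one round trip per collection, and it only works if the catalogue schema carries a parent reference at all.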

GI-Go

THREDDS Connection: In the context of this document, GI-go is a client that provides access to a wide variety of distributed data catalogs and inventory lists, including THREDDS and those of the OGC.

"GI-go is a multi-platform solution developed by IMAA-CNR and University of Florence at Prato for geospatial data discovery and access (according to server capabilities) across distributed and heterogeneous data sources. With this Java Web Start-powered and friendly tool, user is able to access a federation of disparate servers through a uniform view based on a profile of ISO 19115 metadata standard.
GI-go supports user in discovering and browsing available datasets, retrieving and evaluating their description information and performing distributed queries according to any combination of the following criteria: geographic area, temporal interval, topic of interest and data source (i.e. where, when, what, who)."

netCDF

"NetCDF is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data."

THREDDS Connection: In some ways, netCDF (Network Common Data Form) is the heart of all this work. One of the original goals of OPeNDAP was to enable access to remote data files via clients using the netCDF interface on local machines. In many cases, netCDF files are "self-documenting" in that they carry much of their metadata with them internally. THREDDS catalogs allow access to some of that metadata and to many datasets in netCDF form, even when the data are stored in other forms on the server disks. The Common Data Model combines the best elements of the netCDF, HDF5, and OPeNDAP data models while maintaining as much as possible the simple netCDF interface. In conjunction with the CF conventions, netCDF has been incorporated into the WCS 1.1 specification as one of the binary encoding formats.

HDF

" At its lowest level, HDF is a physical file format for storing scientific data. At its highest level, HDF is a collection of utilities and applications for manipulating, viewing, and analyzing data in HDF files. Between these levels, HDF is a software library that provides high-level APIs and a low-level data interface. "

THREDDS Connection: In version 4.0 the netCDF API will be extended and implemented on top of the HDF5 data format. NetCDF users will be able to create HDF5 files with benefits not available with the netCDF format, such as much larger files and multiple unlimited dimensions. Backward compatibility in accessing old netCDF files will be supported. The combined library will preserve the desirable common characteristics of netCDF and HDF5 while taking advantage of their separate strengths: the widespread use and simplicity of netCDF and the generality and performance of HDF5.

ADDE

"OpenADDE is a free software package used to make satellite and NEXRAD data (in the supported formats) available to users at remote sites using visualization packages that support the ADDE client/server protocol, e.g., McIDAS-X, McIDAS-Lite, VisAD, IDV, MATLAB and IDL."

THREDDS Connection: Many users in the Unidata/THREDDS community use the IDV to access datasets on distributed servers via the ADDE protocol. ADDE-type access is being built into the THREDDS Data Server.

Common Data Model

" The Common Data Model is a unification of the data models of OpenDAP, netCDF, and HDF5."

The CDM is a core component of the THREDDS Data Server. Datasets that can be mapped into the CDM can be served directly by the TDS.

CF (and COARDS) Conventions for netCDF

"The CF conventions for climate and forecast metadata are designed to promote the processing and sharing of files created with the NetCDF API The conventions define metadata that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF conventions generalize and extend the COARDS conventions."

Live Access Server (LAS)

"The Live Access Server (LAS) is a highly configurable Web server designed to provide flexible access to geo-referenced scientific data. It can present distributed data sets as a unified virtual data base through the use of DODS networking."

THREDDS Connection: THREDDS catalog generation is part of the LAS distribution, so any data provider site that implements the LAS with the default settings will automatically generate and serve THREDDS catalogs for its holdings. It is estimated that there are several hundred such sites in operation.

GO-ESSP

"The Global Organization for Earth System Science Portal ( GO-ESSP ) is a collaboration designed to develop a new generation of software infrastructure that will provide distributed access to observed and simulated data from the climate and weather communities. GO-ESSP will achieve this goal by developing individual software components and by building a federation of frameworks that can work together using agreed-upon standards. The GO-ESSP portal frameworks will provide efficient mechanisms for data discovery, access, and analysis of the data."

THREDDS Connection: The GO-ESSP list of software packages includes LAS (Live Access Server), OPeNDAP (Open-source Project for a Network Data Access Protocol), and THREDDS (THematic Real-time Environmental Distributed Data Services).

Community Data Portal at NCAR

THREDDS Connection: The NCAR CDP employs most of the THREDDS-related tools described in this article to provide WWW portal-type access to a large collection of community datasets. A more typical hierarchical browse interface is available, but there is also a keyword search interface that provides access to the metadata in the catalogs via a more Google-like text query.

"The Community Data Portal (CDP) is a collection of earth science datasets from NCAR, UCAR, UOP, and participating organizations in the following research areas: oceanic, atmospheric, space weather,turbulence."

MMI

The Marine Metadata Interoperability (MMI) Project has multiple objectives. One is to work with the marine community to create the metadata content for their data collections. A second is to develop the technological framework needed to make that metadata available to and understandable by researchers who may not be experts in the specific field of those who collected the data. According to the web site, they are providing tools to facilitate working with metadata, workshops to show some approaches in action, ontologies where none existed, and prototype software showing metadata used in an interoperable framework.

THREDDS Connection: Among those tools are a set that assist in the development and use of ontologies related to descriptions of data collections. As they are developed, these ontologies will provide a mechanism for automated mapping among the specialized terms used in various disciplines and housed in facilities like THREDDS and CSW catalogs.

SWEET

"SWEET provides a common semantic framework for various Earth science initiatives. The semantic web is a transformation of the existing web that will enable software programs, applications, and agents to find meaning and understanding on web pages.  SWEET developed these capabilities in the context of finding and using Earth science data and information."

Among the SWEET capabilities is a conversion table for mapping CF-netCDF standard names into SWEET ontologies.

IOOS/DMAC

ESIP Federation Service Collaboration Demos

ESRI ArcGIS 9.2 netCDF Interface