On the Unidata THREDDS Initiative
and
Its Relationship to Other Unidata Efforts


Draft by Ben Domenico
Last Modified: March 14, 2007

(Note that this is a work in progress, but it is being made available in its current form with the idea that some readers will find parts of it useful.)

THREDDS (THematic Real-time Environmental Distributed Data Services)

An excellent one-page factsheet describing THREDDS is available at:

http://www.unidata.ucar.edu/publications/factsheets/2007AMSsheets/threddsFactSheet-1.doc

Brief History: Relationship to NSDL and DODS/OPeNDAP

THREDDS was initially funded as a National Science Digital Library (NSDL) Collections project. The idea was to develop a technology that would complement the client/server approach to data access that had been developed by the Distributed Oceanographic Data System (DODS at that time, now OPeNDAP). OPeNDAP is an internet client/server protocol that allows client application programs (like IDL, Matlab and the IDV) to access datasets on remote servers as if the datasets were on the local disk of the workstation. Where a client application program would normally take the name of a local file, the user simply supplies a URL pointing to a dataset on a remote OPeNDAP server. The client program then operates on the remote dataset as if it were a local file.
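As a rough illustration of what "supplying a URL instead of a file name" means at the protocol level, the sketch below builds the request URLs a DAP2 (OPeNDAP version 2) client would issue for a dataset. The server address, dataset path and variable name are hypothetical, and the helper `dap2_urls` is an illustrative name, not part of any OPeNDAP library. It assumes the usual DAP2 conventions: the `.dds`, `.das` and `.dods` suffixes and a constraint expression of the form `variable[start:stride:stop]`.

```python
from urllib.parse import quote


def dap2_urls(dataset_url, variable=None, index_ranges=()):
    """Build DAP2 request URLs for a remote dataset (illustrative helper).

    index_ranges holds one (start, stride, stop) tuple per array dimension.
    """
    constraint = ""
    if variable is not None:
        # Projection with a hyperslab, e.g. sst[0:1:0][10:2:20]
        hyperslab = "".join(
            f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
        )
        constraint = "?" + quote(variable + hyperslab, safe="[]:")
    return {
        "dds": dataset_url + ".dds" + constraint,    # structure of the dataset
        "das": dataset_url + ".das",                 # attributes (metadata)
        "data": dataset_url + ".dods" + constraint,  # binary data response
    }


# Hypothetical dataset URL; a real client would use the address of an actual server.
urls = dap2_urls("http://example.org/dods/sst.nc", "sst", [(0, 1, 0), (10, 2, 20)])
for kind, url in urls.items():
    print(kind, url)
```

An OPeNDAP-enabled client hides these requests behind its normal file-access calls, which is why the remote dataset can be treated like a local file.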

The initial role of THREDDS was to supply tools that would create catalogs of, and provide client access to, the collections of data on remote servers. These catalogs are machine-readable lists of datasets available on OPeNDAP servers with enough user-readable metadata to allow users of THREDDS-enabled clients to browse catalogs of remote datasets just as they browse the file system on their own workstations. The inventory level catalogs also supplied the use metadata required to enable client software to do reasonable things with the data once it was accessed. Early on in the project, it was recognized that the simple inventory list catalogs were not sufficient. In fact, a hierarchy of catalogs is needed so that groups of inventories can be catalogued at a higher level. For example, all the inventory catalogs for output from NCEP forecast models can be grouped into an NCEP model catalog. LEAD model output can be grouped into a LEAD model output catalog, etc. These catalogs of catalogs can be grouped at higher levels. To get a sense of how this works in practice, you can browse the THREDDS Top Level Catalog on Motherlode: http://motherlode.ucar.edu:8080/thredds/catalog.html
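To make the catalog hierarchy concrete, here is a minimal sketch that walks a small THREDDS catalog document and lists its datasets and its references to other catalogs. The sample catalog is invented for illustration (its names and paths are not taken from any real server); it assumes the InvCatalog 1.0 XML namespace and the `dataset` and `catalogRef` elements, and uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
DATASET_TAG = "{" + THREDDS_NS + "}dataset"
CATALOGREF_TAG = "{" + THREDDS_NS + "}catalogRef"

# Invented sample catalog: one nested dataset plus two references to sub-catalogs.
SAMPLE_CATALOG = """
<catalog name="Example top-level catalog"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example model output">
    <dataset name="Example run 0000Z" urlPath="model/example/run_0000.nc"/>
    <catalogRef xlink:title="NCEP model catalog" xlink:href="ncep/catalog.xml"/>
  </dataset>
  <catalogRef xlink:title="LEAD model output catalog" xlink:href="lead/catalog.xml"/>
</catalog>
"""


def walk(element, depth=0):
    """Yield (depth, kind, label, target) for each dataset and catalogRef."""
    for child in element:
        if child.tag == DATASET_TAG:
            # A dataset may be a plain container (no urlPath) or directly accessible.
            yield depth, "dataset", child.get("name"), child.get("urlPath")
            yield from walk(child, depth + 1)
        elif child.tag == CATALOGREF_TAG:
            # A catalogRef points at another catalog: the "catalog of catalogs" link.
            yield (depth, "catalogRef",
                   child.get("{" + XLINK_NS + "}title"),
                   child.get("{" + XLINK_NS + "}href"))


entries = list(walk(ET.fromstring(SAMPLE_CATALOG.strip())))
for depth, kind, label, target in entries:
    print("  " * depth + f"{kind}: {label} -> {target}")
```

A browsing client would follow each `catalogRef` target in turn, which is how a user descends from a top-level catalog to an individual dataset.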

Below is a screenshot of the top level THREDDS catalog on the motherlode server.



Another way to look at these catalogs is that they are textual documents that have special pointers to binary datasets that can be accessed via client application software using special protocols like OPeNDAP.  Two important THREDDS capabilities relate to this characteristic.
 

This use of the THREDDS technology can take the form of a web publication describing a scientific phenomenon with embedded pointers that initiate THREDDS-enabled client software such as the IDV and have it bring in data from remote THREDDS-enabled servers.  At present, one needs a properly configured workstation to take advantage of these "data interactive" or "compound" publications, but it can be done at least for java-based THREDDS clients.  Some examples of such data interactive documents are listed at http://www.unidata.ucar.edu/projects/THREDDS/DataPublications/

netCDF (Network Common Data Form)

From early on, OPeNDAP and THREDDS were closely tied to netCDF, which is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. It is by far the most widely used Unidata technology.

In its original implementation, OPeNDAP provided a special version of the netCDF library interface for applications like IDL and Matlab to link to. Because those desktop applications were already set up to access local netCDF files, the special OPeNDAP-enabled libraries allowed them to access remote files on OPeNDAP servers without any changes to the IDL or Matlab code. From the beginning, OPeNDAP leveraged the netCDF interface.

One thing to be aware of is that the Java implementation of the netCDF interface is a separate implementation which is used for experimenting with new features and facilities. So some capabilities are available in Java netCDF that are not yet incorporated into the other implementations.

NetCDF-Java 2.2 is a 100% Java library which includes a prototype implementation of the Common Data Model (CDM). This netCDF API supports access to several file formats:

and provides access to THREDDS catalogs.

HDF (Hierarchical Data Format)

The Hierarchical Data Format (HDF) came into being shortly after netCDF. Curiously, part of the reason for developing HDF was the mistaken impression that netCDF was going to be marketed as a commercial product to recover development costs -- as was the case with many software packages developed in the 80s. According to the HDF web site: The HDF software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. There are two HDF formats, HDF (4.x, generally known as HDF4, and previous releases) and HDF5. These formats are completely different and NOT compatible. (NOTE: There are no plans to drop support for HDF 4.x.)

HDF provides many functions similar to netCDF, but in broad-brush terms, it has many more features than netCDF and, as a consequence, it is a more complicated interface. For many years, there were many pleas from users (but no funding) to bring the two technologies together. In the end, NASA did fund a joint project between Unidata and the HDF Group to develop a netCDF4 that would enable access to data stored in HDF5 files. The netCDF4 components are complete but await the HDF5 read/write components.

CDM (Common Data Model)

Experiences with netCDF development and support within Unidata, with OPeNDAP as the support center and in conjunction with THREDDS, and with HDF as part of the netCDF4 project led Unidata to consider the advantages of various characteristics of the data models associated with each technology. According to John Caron, the primary CDM architect, at the data access level the CDM maintains as much as possible of the elegance of the netCDF-3 interface, but adds important features from OPeNDAP and HDF, most notably:


The CDM is implemented in Java netCDF. For those acquainted with UML diagram representations of data models:

Common Data Model (data access layer) UML Diagram




Standards-based Interfaces

Interoperability with the GIS (Geographic Information Systems) community has been a primary focus of the second generation THREDDS. The avenue THREDDS has taken is that of open standards web services protocols -- namely those developed by the Open Geospatial Consortium (OGC). Because of the need to make image and gridded forecast datasets available, the main thrust of the initial effort was on the Web Coverage Service (WCS) specification. Unidata spearheaded a specific OGC Interoperability Experiment, called GALEON (Geo-interface for Air, Land, Environmental, Oceans NetCDF). Most of the activity of the first phase of GALEON has focused on testing the interactions between WCS clients and servers for netCDF datasets, modifying those client and server implementations based on the testing, and recommending modifications and augmentations of the relevant OGC interfaces where appropriate. The status of these implementations is described on the GALEON wiki Implementation and Progress Page: http://galeon-wcs.jot.com/WikiHome/Implementation%20Progress%20Page.

The overall GALEON target is this general goal of interoperability via standards-based web services interfaces. But there is one rather more specific objective that involves using these interfaces as the basis for a gateway between traditional GIS applications and datasets available in existing servers in the FES community. These servers, which number in the hundreds, are based on a set of client-server protocols that have evolved in the FES community over the last decade. The basic building blocks are NetCDF, OPeNDAP, ADDE, and THREDDS technologies. But there are other services built on these, for example LAS, GDS, and INGRID. Together these servers make a wide variety and large volume of data available to existing client applications. So a key aim of GALEON is to expand the usefulness of these servers by adding a standards-based interface to provide a gateway so that WCS clients can access the datasets.
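For readers unfamiliar with what a WCS client actually sends to such a gateway, the following sketch assembles a WCS 1.0.0 GetCoverage request as a key-value URL. The endpoint, coverage name, bounding box and time value are hypothetical, and the `NetCDF3` format name is an assumption about what a particular server might advertise; a real client would take these values from the server's GetCapabilities and DescribeCoverage responses.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage, bbox, width, height,
                         time=None, fmt="NetCDF3"):
    """Build a WCS 1.0.0 GetCoverage URL (illustrative helper).

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    width and height give the size of the requested output grid.
    """
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": "EPSG:4326",
        "bbox": ",".join(str(value) for value in bbox),
        "width": str(width),
        "height": str(height),
        "format": fmt,  # assumed format name; servers advertise their own list
    }
    if time is not None:
        params["time"] = time
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, for illustration only.
url = wcs_get_coverage_url(
    "http://example.org/thredds/wcs/model/example/run_0000.nc",
    coverage="Temperature",
    bbox=(-110.0, 30.0, -90.0, 45.0),
    width=200,
    height=150,
    time="2007-03-14T00:00:00Z",
)
print(url)
```

The point of the gateway is that a GIS client only needs to speak this request vocabulary; the server translates it into reads against the underlying netCDF or OPeNDAP data.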


The diagram below is a schematic of this gateway implementation as it was envisioned at the time GALEON was initiated. Since that time, development work has integrated the underlying THREDDS/OPeNDAP services into a package called the THREDDS Data Server (TDS) which has a rudimentary WCS interface built in.


Initial Concept of WCS-interface as a Gateway to Existing FES Services


The GALEON experiments have resulted in many recommended changes to the OGC WCS specification. Among the most important to the users of atmospheric and oceanographic netCDF datasets are:



TDS (THREDDS Data Server)

The THREDDS Data Server (TDS) integrates many of the technologies described in the above sections into a distributable, supported software package. As the web page indicates, TDS is a web server that provides metadata and data access for scientific datasets, building on and extending a number of existing technologies:
  1. THREDDS Dataset Inventory Catalogs are used to provide virtual directories of available data and their associated metadata. These catalogs can be generated dynamically or statically.
  2. The Netcdf-Java library reads NetCDF, OpenDAP, and HDF5 datasets, as well as other binary formats such as GRIB and NEXRAD into a "Common Data Model" (CDM). This is an abstract data model that the netCDF (Unidata), HDF5 (NCSA) and OPeNDAP (University of Rhode Island) developers are using to converge their respective data models. The CDM also adds "Georeferencing Coordinate Systems" and specialized "Scientific Data Type" layers, which provides the semantics needed to convert datasets to other protocols and formats such as those required by GIS systems. The library adds this information by parsing well known "attribute conventions", and by using THREDDS metadata to add missing coordinate system information and other metadata.
  3. An integrated server provides OpenDAP access to any datasets that can be read through the Netcdf-Java library. OpenDAP is a widely used, subsetting data access method built on the HTTP (web) protocol.
  4. An integrated server provides bulk file access through the HTTP protocol.
  5. An integrated server provides data access through the OpenGIS Consortium (OGC) Web Coverage Service (WCS) protocol for any "gridded" dataset whose coordinate system information is complete. Users can add missing information to a dataset where needed, in order to make this work.

The THREDDS Data Server is implemented in 100% Java, and is contained in a single war file, which allows very easy installation into the open-source Tomcat web server. This means that users can implement the entire package on nearly any computing system.
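Because one server exposes the same dataset through several access methods, a client typically forms an access URL by combining the server address, a per-service base path, and the dataset's path from the catalog. The sketch below shows that composition. The service base paths are assumptions modeled on a common TDS layout, and the server address and dataset path are hypothetical; an actual client should read the bases from the `service` elements of the server's catalog.

```python
from urllib.parse import urljoin

# Assumed service base paths; a real client reads these from the catalog.
SERVICE_BASES = {
    "OPENDAP": "/thredds/dodsC/",
    "HTTPServer": "/thredds/fileServer/",
    "WCS": "/thredds/wcs/",
}


def access_urls(server_root, url_path):
    """Return one access URL per service type for a dataset (illustrative helper)."""
    return {
        service_type: urljoin(server_root, base + url_path)
        for service_type, base in SERVICE_BASES.items()
    }


# Hypothetical server and dataset path.
for service_type, url in access_urls(
    "http://example.org:8080", "model/example/run_0000.nc"
).items():
    print(f"{service_type:10s} {url}")
```

The same dataset path thus yields a subsetting OPeNDAP URL, a bulk download URL, and a WCS endpoint, and the user picks whichever suits the client tool at hand.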


THREDDS Data Server Schematic

Because the TDS enables data access via most formal and de facto standard interfaces, it allows users to access a wide variety of data using the tools with which they are familiar. These range from browser-based tools such as the Live Access Server (LAS) to powerful desktop applications such as the Unidata IDV described in a subsequent section. One set of client applications of particular interest in the interoperability context is the ESRI ArcGIS product line. In release 9.2 of ArcGIS, the tools have the ability to read and write netCDF files that conform to the CF (Climate and Forecast) conventions. Slated for the 9.3 release is remote data access via the WCS protocol. That means that traditional GIS users can access netCDF weather and oceanographic datasets right now if the data are local, and they will be able to access them remotely via WCS in the next release.

Integrated Data Viewer (IDV)

Unidata's Integrated Data Viewer (IDV) is a Java(TM)-based software framework for analyzing and visualizing geoscience data. The IDV brings together the ability to display and work with satellite imagery, gridded data, surface observations, balloon soundings, NWS WSR-88D Level II and Level III radar data, and NOAA National Profiler Network data, all within a unified interface. For the Unidata community (and many others as it turns out), it provides a powerful desktop analysis and display application that can access datasets that reside on remote THREDDS Data Servers.

THREDDS-related Projects

Another article lists a number of other projects related in some way to THREDDS.

 http://www.unidata.ucar.edu/projects/THREDDS/GALEON/Reports/RelatedTechnologies.html