(Note that this is a work in progress, but it is being made available in its
current form with the idea that some readers will find parts of it useful.)
An excellent one-page factsheet describing THREDDS is available at:
http://www.unidata.ucar.edu/publications/factsheets/2007AMSsheets/threddsFactSheet-1.doc
Brief History: Relationship to NSDL and DODS/OPeNDAP
THREDDS was initially funded as a National Science Digital Library (NSDL)
Collections project. The idea was to develop a technology that would
complement the client/server approach to data access that had been
developed by the Distributed Oceanographic Data System (DODS at that time,
now OPeNDAP). OPeNDAP is
an internet client/server protocol that allows client application programs
(like IDL, Matlab and the IDV) to access datasets on remote servers as if
the datasets were on the local disk of the workstation. Where a client
application program would normally take the name of a local file, the user
simply has to supply a URL pointing to a dataset on a remote OPeNDAP
server. The client program then operates on the remote dataset as if
it were a local file.
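As a rough illustration of what "supplying a URL" amounts to, the sketch below builds the kind of DAP2-style request URL that an OPeNDAP-enabled client issues behind the scenes. The server address, dataset path, and variable name are hypothetical, and the helper function is invented for this example; the response suffixes and the var[start:stride:stop] constraint syntax follow the DAP2 convention. Real client libraries construct these requests on the user's behalf.

```python
def dap2_url(dataset_url, suffix="dods", variable=None, slices=()):
    """Build a DAP2 request URL (illustrative helper, not a library API).

    dataset_url: URL of the dataset on the (hypothetical) server
    suffix:      response type, e.g. "dds", "das", "dods", or "ascii"
    variable:    optional variable name to request
    slices:      one (start, stride, stop) tuple per dimension; stop is inclusive
    """
    url = f"{dataset_url}.{suffix}"
    if variable is None:
        return url
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{url}?{variable}{hyperslab}"


# Hypothetical dataset and variable names, for illustration only.
BASE = "http://server.example.org/opendap/models/forecast.nc"
print(dap2_url(BASE, "dds"))
print(dap2_url(BASE, "ascii", "temperature", [(0, 1, 0), (10, 2, 20)]))
```

The first URL asks only for the dataset's structure; the second asks for a subset of one variable, which is how a client can work with a remote dataset without downloading the whole file.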
The initial role of THREDDS was to supply tools that would create catalogs
of, and provide client access to, the collections of data on remote
servers. These catalogs are machine readable lists of datasets
available on OPeNDAP servers with enough user-readable metadata to allow
users of THREDDS-enabled clients to browse catalogs of remote datasets just
as they browse the file system on their own workstations. The
inventory level catalogs also supplied the use metadata required to enable
client software to do reasonable things with the data once it was
accessed. Early on in the project, it was recognized that the
simple inventory list catalogs were not sufficient. In fact, a
hierarchy of catalogs is needed so that groups of inventories could be
catalogued at a higher level. For example, all the inventory catalogs
for output from NCEP forecast models can be grouped into an NCEP model
catalog. LEAD model output can be grouped into a LEAD model output
catalog, etc. These catalogs of catalogs can be grouped at higher
levels. To get a sense of how this works in practice, you can browse
the THREDDS Top Level Catalog on Motherlode:
http://motherlode.ucar.edu:8080/thredds/catalog.html
Below is a screenshot of the top level THREDDS catalog on the motherlode
server.
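To make the "catalog of catalogs" idea concrete, here is a minimal sketch that reads one small catalog and separates the references to lower-level catalogs from the datasets listed directly. The catalog content is invented for illustration (it is not the Motherlode catalog), and the summarize function is a hypothetical helper; the element names and namespaces are those of the THREDDS InvCatalog 1.0 XML format.

```python
import xml.etree.ElementTree as ET

NS = {
    "t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical top-level catalog: two references to lower-level catalogs
# and one dataset listed directly.
CATALOG_XML = """<catalog name="Example top-level catalog"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <catalogRef xlink:href="ncep/catalog.xml" xlink:title="NCEP model output" name=""/>
  <catalogRef xlink:href="lead/catalog.xml" xlink:title="LEAD model output" name=""/>
  <dataset name="Sample grid" urlPath="samples/grid.nc"/>
</catalog>
"""


def summarize(catalog_xml):
    """Return ([(title, href), ...] for child catalogs, [dataset names])."""
    root = ET.fromstring(catalog_xml)
    title_attr = "{%s}title" % NS["xlink"]
    href_attr = "{%s}href" % NS["xlink"]
    children = [(ref.get(title_attr), ref.get(href_attr))
                for ref in root.findall("t:catalogRef", NS)]
    datasets = [ds.get("name") for ds in root.findall("t:dataset", NS)]
    return children, datasets


children, datasets = summarize(CATALOG_XML)
for title, href in children:
    print("catalog:", title, "->", href)
for name in datasets:
    print("dataset:", name)
```

A browsing client would follow each catalog reference in turn and repeat the same step, which is what produces the file-system-like navigation described above.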
Another way to look at these catalogs is that they are textual documents
that have special pointers to binary datasets that can be accessed via
client application software using special protocols like OPeNDAP. Two
important THREDDS capabilities relate to this characteristic.
- First, as textual documents, the catalogs can be harvested and indexed
  just as other documents are indexed by Google and other search
  engines. Thus THREDDS catalogs can be included in the NSDL,
  DLESE, and other discovery centers such as the NASA Global Change Master
  Directory (GCMD). An interesting exercise is to visit the GCMD and
  search for "THREDDS".
- Second, THREDDS catalogs are such that they can be created by "third
  parties" -- groups other than the data providers who serve the
  data. The technology is such that one can create a THREDDS catalog
  that points to datasets on several different remote servers.
This use of the THREDDS technology can take the form of a web publication
describing a scientific phenomenon with embedded pointers that initiate
THREDDS-enabled client software such as the IDV and have it bring in data
from remote THREDDS-enabled servers. At present, one needs a properly
configured workstation to take advantage of these "data interactive" or
"compound" publications, but it can be done at least for Java-based THREDDS
clients. Some examples of such data interactive documents are listed at
http://www.unidata.ucar.edu/projects/THREDDS/DataPublications/
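The third-party catalog idea can be sketched in a few lines. The catalog below is invented for illustration: it declares two services whose base URLs sit on different (hypothetical) servers, and each dataset names the service it is reached through. The access_urls helper is likewise hypothetical; it only shows the rule that an access URL is the service base followed by the dataset's urlPath.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical third-party catalog: the two services live on different servers.
THIRD_PARTY_XML = """
<catalog name="Case study data"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="serverA" serviceType="OPENDAP" base="http://a.example.org/dods/"/>
  <service name="serverB" serviceType="OPENDAP" base="http://b.example.edu/dods/"/>
  <dataset name="Model forecast">
    <access serviceName="serverA" urlPath="model/run1.nc"/>
  </dataset>
  <dataset name="Radar mosaic">
    <access serviceName="serverB" urlPath="radar/mosaic.nc"/>
  </dataset>
</catalog>
"""


def access_urls(catalog_xml):
    """Map each dataset name to its access URL (service base + urlPath)."""
    root = ET.fromstring(catalog_xml.strip())
    bases = {svc.get("name"): svc.get("base")
             for svc in root.iter(THREDDS_NS + "service")}
    urls = {}
    for ds in root.iter(THREDDS_NS + "dataset"):
        for acc in ds.findall(THREDDS_NS + "access"):
            urls[ds.get("name")] = bases[acc.get("serviceName")] + acc.get("urlPath")
    return urls


for name, url in access_urls(THIRD_PARTY_XML).items():
    print(name, "->", url)
```

Because the catalog only holds pointers, its author needs no control over either server; a client that reads it simply opens each resolved URL.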
From early on, OPeNDAP and THREDDS were closely tied to netCDF, which is
an interface for array-oriented data access and a library that provides an
implementation of the interface. The netCDF library also defines a
machine-independent format for representing scientific data. Together, the
interface, library, and format support the creation, access, and sharing of
scientific data. NetCDF is by far the most widely used Unidata technology.
In its original implementation, OPeNDAP provided a special version of the
netCDF library interface for applications like IDL and Matlab to link
to. Because those desktop applications were already set up to access
local netCDF files, the special OPeNDAP-enabled libraries allowed
them to access remote files on OPeNDAP servers without any changes to the
IDL or Matlab code. From the beginning, OPeNDAP leveraged the netCDF
interface.
One thing to be aware of is that the Java implementation of the netCDF
interface is a separate implementation which is used for experimenting with
new features and facilities. So some capabilities are available in
Java netCDF that are not yet incorporated into the other implementations.
NetCDF-Java 2.2 is a 100% Java library which includes a prototype
implementation of the Common Data Model (CDM). This netCDF API supports
access to several file formats:
- General: NetCDF, HDF5, OPeNDAP
- Grids: GRIB1, GRIB2
- Radar: NEXRAD, NIDS, DORADE
- Satellite: DMSP, GINI
and provides access to THREDDS catalogs.
The Hierarchical Data Format (HDF) came into being shortly after netCDF.
Curiously, part of the reason for developing HDF was the mistaken
impression that netCDF was going to be marketed as a commercial product to
recover development costs -- as was the case with many software packages
developed in the 1980s. According to the HDF web site:
The HDF software includes I/O libraries and
tools for analyzing, visualizing, and converting scientific data. There are
two HDF formats, HDF (4.x, generally known as HDF4, and previous releases)
and HDF5. These formats are completely different and NOT compatible. (NOTE:
There are no plans to drop support for HDF 4.x.)
HDF provides many functions similar to netCDF, but in broad-brush
terms, it has many more features than netCDF and, as a consequence, it is a
more complicated interface. For many years, there were many pleas from
users (but no funding) to bring the two technologies together. In the
end, NASA did fund a joint project between Unidata and the HDF Group to
develop a netCDF4 that would enable access to data stored in HDF5 files.
The netCDF4 components are complete but await the HDF5 read/write
components.
Experience with netCDF development and support within Unidata, with OPeNDAP
as the support center and in conjunction with THREDDS, and with HDF as part
of the netCDF4 project led Unidata to consider the advantages of various
characteristics of the data models associated with each technology.
According to John Caron, the primary CDM architect, at the data access
level the CDM maintains as much as possible of the elegance of the netCDF-3
interface but adds important features from OPeNDAP and HDF, most notably:
- more low level data types -- including "string"
- structures
- groups
The CDM is implemented in Java netCDF. For those acquainted with UML
diagram representations of data models:
Common Data Model (data access layer) UML Diagram
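For readers who prefer code to UML, the following is a loose sketch of how the three additions listed above (a "string" type, structures, and groups) might fit together with netCDF-style variables and dimensions. The class and field names are invented for this illustration; they are not the netCDF-Java API, and the real data access layer is considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variable:
    name: str
    dtype: str  # e.g. "float", "string", or "structure"
    dimensions: List[str] = field(default_factory=list)
    # Member variables, used only when dtype == "structure".
    members: List["Variable"] = field(default_factory=list)


@dataclass
class Group:
    name: str
    dimensions: Dict[str, int] = field(default_factory=dict)
    variables: List[Variable] = field(default_factory=list)
    groups: List["Group"] = field(default_factory=list)  # nested groups

    def paths(self, prefix=""):
        """Yield the full path of every variable in this group and its subgroups."""
        here = f"{prefix}/{self.name}" if self.name else prefix
        for var in self.variables:
            yield f"{here}/{var.name}"
        for sub in self.groups:
            yield from sub.paths(here)


# Hypothetical dataset: a root group holding a string variable, plus a
# nested group holding a structure with two float members.
root = Group(
    name="",
    dimensions={"time": 4, "station": 10},
    variables=[Variable("station_id", "string", ["station"])],
    groups=[
        Group(
            name="obs",
            variables=[
                Variable("record", "structure", ["time"],
                         members=[Variable("temp", "float"),
                                  Variable("pressure", "float")])
            ],
        )
    ],
)

print(list(root.paths()))
```

A classic netCDF-3 file corresponds to the special case of a single root group whose variables are all simple numeric or character arrays.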
Standards-based Interfaces
Interoperability with the GIS (Geographic Information Systems) community has
been a primary focus of the second generation THREDDS. The avenue
THREDDS has taken is that of open standards web services protocols -- namely
those developed by the Open Geospatial Consortium (OGC). Because of the
need to make image and gridded forecast datasets available, the main thrust
of the initial effort was on the Web Coverage Service (WCS) specification.
Unidata spearheaded a specific OGC Interoperability Experiment, called
GALEON (Geo-interface for Air, Land, Environmental, Oceans NetCDF). Most
of the activity of the first phase of GALEON has focused on testing the
interactions between WCS clients and servers for netCDF datasets, modifying
those client and server implementations based on the testing, and
recommending modifications and augmentations of the relevant OGC interfaces
where appropriate. The status of these implementations is described on the
GALEON wiki Implementation and Progress Page:
http://galeon-wcs.jot.com/WikiHome/Implementation%20Progress%20Page
The overall GALEON target is this general goal of interoperability via
standards-based web services interfaces. But there is one rather more
specific objective that involves using these interfaces as the basis for a
gateway between traditional GIS applications and datasets available in
existing servers in the FES community. These servers, which number in the
hundreds, are based on a set of client-server protocols that have evolved
in the FES community over the last decade. The basic building blocks are
NetCDF, OPeNDAP, ADDE, and THREDDS technologies. But there are other
services built on these, for example LAS, GDS, and INGRID. There are
already several hundred of these servers making a wide variety and large
volume of data available to existing client applications. So a key aim of
GALEON is to expand the usefulness of these servers by adding a
standards-based interface to provide a gateway so that WCS clients can
access the datasets.
The diagram below is a schematic of this gateway implementation as it was
envisioned at the time GALEON was initiated. Since that time, development
work has integrated the underlying THREDDS/OPeNDAP services into a package
called the THREDDS Data Server (TDS) which has a rudimentary WCS interface
built in.

Initial Concept of WCS-interface as a Gateway to Existing FES Services
The GALEON experiments have resulted in many recommended changes to the OGC
WCS specification. Among the most important to the users of
atmospheric and oceanographic netCDF datasets are:
- WCS encoding “profiles” instead of a fixed list of encoding formats
- Multiple “variables” or “parameters” in a coverage (e.g., pressure,
  temperature, etc.)
- Coverages with 3 spatial dimensions
- Coverages with multiple time dimensions (e.g., forecast time in model
  output)
- Non-spatial “height” dimension (e.g., atmospheric pressure, ocean
  density)
- Irregularly-spaced grids
The THREDDS Data Server (TDS) integrates many of the technologies described
in the above sections into a distributable, supported software package. As
the TDS web page indicates, TDS is a web server that provides metadata and data
access for scientific datasets, building on and extending a number of
existing technologies:
- THREDDS Dataset Inventory Catalogs are used to provide virtual
  directories of available data and their associated metadata. These
  catalogs can be generated dynamically or statically.
- The Netcdf-Java library reads NetCDF, OpenDAP, and HDF5 datasets, as
  well as other binary formats such as GRIB and NEXRAD into a "Common Data
  Model" (CDM). This is an abstract data model that the netCDF (Unidata),
  HDF5 (NCSA) and OPeNDAP (University of Rhode Island) developers are using
  to converge their respective data models. The CDM also adds
  "Georeferencing Coordinate Systems" and specialized "Scientific Data
  Type" layers, which provides the semantics needed to convert datasets to
  other protocols and formats such as those required by GIS systems. The
  library adds this information by parsing well known "attribute
  conventions", and by using THREDDS metadata to add missing coordinate
  system information and other metadata.
- An integrated server provides OpenDAP access to any datasets that
  can be read through the Netcdf-Java library. OpenDAP is a widely used,
  subsetting data access method built on the HTTP (web) protocol.
- An integrated server provides bulk file access through the HTTP
  protocol.
- An integrated server provides data access through the OpenGIS Consortium
  (OGC) Web Coverage Service (WCS) protocol for any "gridded" dataset
  whose coordinate system information is complete. Users can add missing
  information to a dataset where needed, in order to make this work.
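To give a feel for the WCS access mentioned in the last item, the sketch below assembles a GetCoverage request as a plain HTTP query string. It assumes the WCS 1.0.0 key-value-pair encoding; the endpoint, coverage name, and bounding box are hypothetical, the helper function is invented for this example, and a complete request would normally also state the output grid size or resolution, which is omitted here for brevity.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage, bbox, fmt, crs="EPSG:4326", time=None):
    """Assemble a WCS 1.0.0 GetCoverage URL (illustrative helper).

    bbox is (min_x, min_y, max_x, max_y) in the units of the given CRS.
    """
    params = [
        ("service", "WCS"),
        ("version", "1.0.0"),
        ("request", "GetCoverage"),
        ("coverage", coverage),
        ("crs", crs),
        ("bbox", ",".join(str(v) for v in bbox)),
        ("format", fmt),
    ]
    if time is not None:
        params.append(("time", time))
    # Leave ':' and ',' unescaped so the URL stays readable.
    return endpoint + "?" + urlencode(params, safe=":,")


# Hypothetical server, dataset path, and coverage name.
ENDPOINT = "http://server.example.org/thredds/wcs/models/forecast.nc"
print(wcs_get_coverage_url(ENDPOINT, "Temperature", (-110, 30, -90, 45), "GeoTIFF"))
```

The point of the gateway approach is that a GIS client only needs to speak this request format; it does not need to know that the data behind the server are stored as netCDF.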
The THREDDS Data Server is implemented in 100% Java, and is contained in a
single war file, which allows very easy installation into the open-source
Tomcat web server. This means
that users can implement the entire package on nearly any computing system.
THREDDS Data Server Schematic
Because the TDS enables data access via most formal and de facto standard
interfaces, it allows users to access a wide variety of data using the tools
with which they are familiar. These range from browser-based tools
such as the Live Access Server (LAS) to powerful desktop applications such
as the Unidata IDV described in a subsequent section. Client
applications of particular interest in the interoperability context are
the ESRI ArcGIS products. In release 9.2 of ArcGIS, the
tools have the ability to read and write netCDF files that conform to the CF
(Climate and Forecast) conventions. Slated for the 9.3 release is
remote data access via the WCS protocol. That means that traditional
GIS users can access netCDF weather and oceanographic datasets right now if
the data are local, and they will be able to access them remotely via WCS in
the next release.
Integrated Data Viewer (IDV)
Unidata's Integrated Data Viewer (IDV) is a Java(TM)-based software
framework for analyzing and visualizing geoscience data. The IDV brings
together the ability to display and work with satellite imagery, gridded
data, surface observations, balloon soundings, NWS WSR-88D Level II and
Level III radar data, and NOAA National Profiler Network data, all within a
unified interface. For the Unidata community (and many others as it
turns out), it provides a powerful desktop analysis and display
application that can access datasets that reside on remote THREDDS Data
Servers.
Another article lists a number of other projects related in some way to
THREDDS.
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/Reports/RelatedTechnologies.html