THREDDS Technical Summary

Overview

THREDDS fundamentally provides middleware services to bridge the gap between data providers and data consumers. We are also involved in developing and enhancing some of the underlying data access software tools, libraries and protocols themselves, as well as influencing how data providers and clients use them.

THREDDS is a key element in support of the Unidata 2008 proposal's "Distributed, organized collections of digital material" (endeavor 5) and "Improved data access infrastructure" (endeavor 6).

Accomplishments

Dataset Inventory Catalogs are XML documents that allow a data provider to simply list available on-line datasets. The catalog creator can group datasets into a simple hierarchical classification scheme, which makes a catalog into a “logical data directory”. At a minimum, the catalog specifies the “human readable” dataset name, and how to access it.  The catalog also provides a place to add arbitrary metadata about the dataset. We are focusing on enhancing selected datasets by adding space and time bounding boxes, standard names, and data type information. Catalogs can be static XML files, or dynamically generated by Web servers to track continuously changing datasets.
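
To make the "logical data directory" idea concrete, here is a minimal sketch of a client walking a nested catalog and listing each dataset's human-readable name and access URL. The XML below is a simplified, illustrative stand-in rather than the actual catalog schema: the element and attribute names (catalog, service, dataset, base, urlPath) and the example.org URLs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative catalog only; names and URLs are assumptions, not the real schema.
CATALOG_XML = """
<catalog name="Example catalog">
  <service name="dods" serviceType="OPENDAP" base="http://example.org/dods/"/>
  <dataset name="Model output">
    <dataset name="Run 1" urlPath="model/run1.nc"/>
    <dataset name="Run 2" urlPath="model/run2.nc"/>
  </dataset>
</catalog>
"""

def list_datasets(xml_text):
    """Return (hierarchical name, access URL) pairs for every accessible dataset."""
    root = ET.fromstring(xml_text)
    base = root.find("service").get("base")
    found = []

    def walk(node, path):
        for child in node.findall("dataset"):
            child_path = path + [child.get("name")]
            url_path = child.get("urlPath")
            if url_path is not None:
                # A dataset with an access path is a leaf the user can open.
                found.append(("/".join(child_path), base + url_path))
            # Datasets may nest, which gives the catalog its directory-like shape.
            walk(child, child_path)

    walk(root, [])
    return found

datasets = list_datasets(CATALOG_XML)
```

Datasets without an access path act purely as grouping nodes, which is how the hierarchy is expressed in this sketch.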

Simple THREDDS Servers are data servers that have Dataset Inventory Catalogs associated with them. The primary focus of THREDDS has been developing these servers in collaboration with our data provider partners. Current servers include ones at IRI/LDEO (Columbia), SSEC (Madison), NOAA-CIRES Climate Diagnostics Center, Fleet Numerical Meteorology and Oceanography Center, and NCAR. 

The THREDDS/IDD Server makes much of the real-time data coming in on the Unidata IDD available on a THREDDS server. This includes the NCEP model data, satellite data from NOAAPORT and the Unidata/Wisconsin data streams, NEXRAD Radar, Profiler data from NOAA/FSL, as well as METAR, upper air, buoy, SAO and SHEF hydrology station data. The THREDDS/IDD Server will become part of an enhanced LDM that will be available to the Unidata community of 150 IDD users.

We have worked extensively with OpenDAP/DODS developers, and the next version of OpenDAP servers will have integrated THREDDS Catalogs, so the next generation of OpenDAP servers will automatically be THREDDS servers. We have also developed the THREDDS OpenDAP Aggregation Server, an OpenDAP data server that aggregates OpenDAP datasets, serves netCDF datasets, and already has THREDDS catalogs integrated. The Live Access Server from NOAA/PMEL is a Web server that provides access to and visualization of scientific data. It is currently being modified to provide THREDDS catalogs for its data.

Another key THREDDS component for data providers is the Catalog Generator, which scans file directories and generates THREDDS catalogs automatically. This is a highly configurable tool that gives users control over the arrangement and naming of their datasets, adding metadata, extracting information from the datasets, etc. The Catalog Validator provides XML and semantic validation of Catalogs, as well as verification of the datasets themselves.
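
The directory-scanning idea can be sketched in a few lines. This is not the Catalog Generator itself, only an illustration of the approach under simplifying assumptions: the function name build_catalog, the ".nc" suffix filter, and the simplified element and attribute names are all hypothetical.

```python
import os
import xml.etree.ElementTree as ET

def build_catalog(root_dir, suffix=".nc"):
    """Mirror a directory tree as nested <dataset> elements (illustrative schema)."""
    catalog = ET.Element("catalog", name=os.path.basename(os.path.abspath(root_dir)))

    def add_dir(parent, dir_path):
        for entry in sorted(os.listdir(dir_path)):
            full = os.path.join(dir_path, entry)
            if os.path.isdir(full):
                # Each subdirectory becomes a grouping dataset.
                group = ET.SubElement(parent, "dataset", name=entry)
                add_dir(group, full)
            elif entry.endswith(suffix):
                # Each matching file becomes a dataset with a relative access path.
                rel = os.path.relpath(full, root_dir).replace(os.sep, "/")
                ET.SubElement(parent, "dataset", name=entry, urlPath=rel)

    add_dir(catalog, root_dir)
    return catalog
```

Calling ET.tostring(build_catalog("/data"), encoding="unicode") would give the catalog text; the path "/data" is only a placeholder. A configurable tool would add naming rules and metadata extraction on top of this skeleton.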

The ADDE Cataloger is a middleware service that constructs Catalogs for ADDE/McIDAS data servers. It provides “virtual dataset” services, for example, a dataset named “latest” or “last 3 hours”, along with a resolver service to translate a virtual dataset into a list of actual datasets available on the ADDE server. This level of indirection is important for real-time and very large datasets, in order to provide users with the ability to choose datasets of the right granularity.
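
The resolver indirection can be illustrated with a small sketch. It assumes the server's inventory is available as a list of (dataset name, timestamp) pairs; the function name resolve and that inventory shape are hypothetical and do not describe the ADDE Cataloger's actual interface.

```python
from datetime import datetime, timedelta

def resolve(virtual_name, inventory, now):
    """Translate a virtual dataset name into a list of actual dataset names.

    inventory: list of (dataset_name, datetime) pairs (an assumed shape).
    """
    ordered = sorted(inventory, key=lambda item: item[1])
    if virtual_name == "latest":
        return [ordered[-1][0]] if ordered else []
    if virtual_name == "last 3 hours":
        cutoff = now - timedelta(hours=3)
        return [name for name, stamp in ordered if cutoff <= stamp <= now]
    raise KeyError("unknown virtual dataset: " + virtual_name)
```

Because the client only ever asks for the virtual name, the actual list can change every time new data arrives without the catalog itself changing.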

Dataset Query Capability XML documents are used by middleware services such as the ADDE Cataloger and the THREDDS/IDD Server to specify compactly what datasets are available from a data server. They allow data providers to specify the set of orthogonal choices (for example: station, field, time) that an end-user should make to select from a large and/or real-time collection of datasets. They also tell data clients how to present appropriate choices to their users in a user interface, without the client knowing anything specific about the server.
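
A rough sketch of how a generic client might use such a description: the selectors are treated as independent lists of allowed values, the user's choice is validated against them, and a query URL is assembled. The selector values, the example.org URL, and the query-string layout are assumptions for illustration, not the actual Dataset Query Capability format.

```python
from urllib.parse import urlencode

# Hypothetical selectors, as a client might hold them after reading the description.
SELECTORS = {
    "station": ["DEN", "BOU"],
    "field": ["temperature", "wind"],
    "time": ["latest", "last 3 hours"],
}

def build_query(base_url, selectors, choice):
    """Check one value per selector, then encode the choice as a query URL."""
    for name, allowed in selectors.items():
        if name not in choice:
            raise ValueError("missing choice for " + name)
        if choice[name] not in allowed:
            raise ValueError("invalid value for " + name + ": " + choice[name])
    # Keep parameters in selector order so the URL is predictable.
    params = [(name, choice[name]) for name in selectors]
    return base_url + "?" + urlencode(params)
```

The client code contains nothing specific to stations or fields; everything it needs to build the user interface comes from the selector description.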

Catalogs are read by the Dataset Searcher, which provides a programmatic interface for searching by space and time bounding boxes, standard names, data type and server type. People can also search for datasets through a web interface. This is a prototype system that will be developed further in the future. 
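
Searching by space and time bounding boxes reduces to interval-overlap tests on each axis. The sketch below shows that idea only; the Coverage record and search function are hypothetical names, and it assumes longitudes do not wrap across the dateline.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Coverage:
    """Bounding boxes harvested from a catalog entry (assumed record shape)."""
    name: str
    west: float
    east: float
    south: float
    north: float
    start: date
    end: date

def overlaps(lo1, hi1, lo2, hi2):
    """True when the closed intervals [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1

def search(entries, west, east, south, north, start, end):
    """Return names of entries whose space and time boxes intersect the query."""
    return [
        e.name
        for e in entries
        if overlaps(e.west, e.east, west, east)
        and overlaps(e.south, e.north, south, north)
        and overlaps(e.start, e.end, start, end)
    ]
```

Filters on standard name, data type, or server type would be simple equality checks added to the same loop.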

The THREDDS Dataset Exporter creates “resource records” appropriate to add to Digital Libraries such as DLESE, NSDL and GCMD. This prototype system uses special metadata records that are added to the datasets in a catalog, which specify the additional information needed by the DL, such as Dublin Core or DIF formats. The Dataset Exporter uses the Open Archives Initiative (OAI) protocol to send these records into the DLESE and NSDL databases.
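
As an illustration of what a Dublin Core resource record might look like, the sketch below maps a few dataset fields onto an oai_dc record, the Dublin Core container format used with OAI. The input dictionary keys and the function name to_dublin_core are hypothetical, and the sketch covers only building the record, not sending it to a digital library.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

def to_dublin_core(dataset):
    """Build an oai_dc record from a dict with optional title/identifier/description."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    record = ET.Element("{%s}dc" % OAI_DC_NS)
    for field in ("title", "identifier", "description"):
        value = dataset.get(field)
        if value:
            # Only fields that are actually present are exported.
            ET.SubElement(record, "{%s}%s" % (DC_NS, field)).text = value
    return ET.tostring(record, encoding="unicode")
```

A record in another format such as DIF would need its own mapping from the same catalog metadata.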

THREDDS clients are application programs that know how to read THREDDS Catalogs and know how to read data using some or all of the THREDDS data server types, such as OpenDAP, ADDE, netCDF, etc. The Integrated Data Viewer (IDV), also developed at Unidata, is a full-featured analysis program capable of advanced 3D visualization based on the VisAD library. VGEE is an educational content development system built on top of the IDV. New Media Studios is another educational content development framework which uses Macromedia Director and IDL, and is now in the process of being made THREDDS-capable. The THREDDS Data Viewer is a tool for debugging data servers and prototyping client software, using the Java client library user interface components and catalog and data access APIs.

A key to successful use of scientific datasets is providing use metadata, especially georeferencing metadata, which allows client software to manipulate and visualize datasets, and to overlay and compare data from different sources. We have helped develop and promulgate georeferencing metadata conventions for netCDF datasets, such as the CF Conventions for model data.  We have also developed extensions to the netCDF data model and implemented libraries which automatically recognize and extract georeferencing information in many of the important netCDF and OpenDAP datasets.
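
One way automatic recognition can work is to inspect each variable's units attribute. The sketch below uses the latitude and longitude unit strings from the CF Conventions and treats any units containing " since " as a time axis. It is a deliberately simplified illustration, not the logic of the libraries described above; the function name classify_axes and the input shape (variable name mapped to an attribute dictionary) are assumptions.

```python
# Unit strings that the CF Conventions accept for latitude and longitude.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}

def classify_axes(variables):
    """Map variable names to 'lat', 'lon' or 'time' based on their units attribute.

    variables: dict of variable name -> dict of attributes (an assumed shape).
    Variables that are not recognized are left out of the result.
    """
    axes = {}
    for name, attrs in variables.items():
        units = attrs.get("units", "")
        if units in LAT_UNITS:
            axes[name] = "lat"
        elif units in LON_UNITS:
            axes[name] = "lon"
        elif " since " in units:
            # CF time units look like "hours since 2005-09-12 00:00:00".
            axes[name] = "time"
    return axes
```

Once the axes are identified, a client can overlay datasets from different sources because it knows which dimensions carry position and time.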

We have also developed extensions to the NetCDF Markup Language (NcML) that allow metadata to be added, deleted or changed in netCDF and OpenDAP datasets, as well as to subset or aggregate netCDF files. This capability has been added to the OpenDAP aggregation server, providing a powerful tool for third-party metadata augmentation, which is in addition to the ability to add metadata into the Inventory Catalogs.
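
The idea of third-party metadata augmentation can be sketched as a list of edits applied on top of a dataset's attributes without touching the original. This is a conceptual illustration in Python, not NcML syntax; the edit tuples and the function name apply_edits are hypothetical.

```python
def apply_edits(attributes, edits):
    """Return a modified copy of the attributes; the original dict is left unchanged.

    edits: list of (action, name, value) tuples, where action is "set"
    (add a new attribute or change an existing one) or "remove".
    """
    result = dict(attributes)
    for action, name, value in edits:
        if action == "set":
            result[name] = value
        elif action == "remove":
            result.pop(name, None)
        else:
            raise ValueError("unknown action: " + action)
    return result
```

Because the edits live outside the data file, someone other than the data provider can correct or enrich the metadata that clients see.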


Status update 09/12/2005

  1. THREDDS Data Server (TDS): Most of our time has been spent getting the TDS ready for use. This includes securing it against web attacks, adding remote debugging and diagnostics, supporting catalog viewing via a web browser, and writing installation and configuration documentation. The TDS has an integrated OpenDAP server for subsetted file access, an HTTP server for bulk file access, and an experimental WCS server for gridded data, along with THREDDS catalog services. The TDS is being tested on Unidata's motherlode server, as well as LEAD servers and several servers outside of Unidata.
  2. The THREDDS/IDD Data Server integrates the TDS with the LDM, providing "pull" access to the IDD data. Data directories and file names have been standardized, and compatible TDS catalogs and LDM pqact files have been created. This allows us to support a standard TDS/IDD server (and maintain changes as the IDD data streams change) for LDM users who prefer to use a standard configuration.
  3. Automatic Catalog generation and metadata extraction: Catalog version 1.0.1 integrates TDS configuration elements into THREDDS catalogs, in order to make TDS configuration as easy and as powerful as possible. We are concentrating on automatically creating dynamic catalogs for the IDD data, as well as the automatic extraction of metadata. This work builds on and will eventually supersede the THREDDS Catalog Generator.
  4. Common Data Model (CDM) Access: We added the DORADE Radar formatted files, as well as improvements to the NIDS, GRIB1 and GRIB2 readers. We are slowly adding access to all the IDD data, so that these can be served through the TDS. See the nj22 web page for current file types that can be read. We are now working on completing GINI access for the IDD satellite data.
  5. GIS/Galeon: We are using the TDS WCS server to participate in the OGC Galeon experiment, which is exploring the use of NetCDF as one of the recommended data formats for data transport within a WCS server.
  6. IDV development: We are timing nj22/CDM releases to coincide with IDV releases. IDV version 1.2 is using NetCDF library version 2.2.09. New features in the next release of the IDV will include grid subsetting, and possibly more integration with THREDDS metadata.
  7. NetCDF Attribute Convention for Dataset Discovery is a proposed set of NetCDF attributes, to allow automatic extraction of THREDDS metadata and data discovery in discovery centers like GCMD and DLESE.
  8. Radar data formats: We are working with the radar community, including ATD, to investigate and propose a new radar file format, probably using NetCDF-4 files. This will tie into our CDM work on Radial data types.
  9. NetCDF Tools UI is for debugging nj22/CDM file reading and THREDDS data servers. It can be downloaded via webstart.