THREDDS Technical Summary

Overview

THREDDS fundamentally provides middleware services to bridge the gap between data providers and data consumers. We are also involved in developing and enhancing some of the underlying data access software tools, libraries and protocols themselves, as well as influencing how data providers and clients use them.

THREDDS is a key element in support of the Unidata 2008 proposal's "Distributed, organized collections of digital material" (endeavor 5) and "Improved data access infrastructure" (endeavor 6).

Accomplishments

Dataset Inventory Catalogs are XML documents that allow a data provider to simply list available on-line datasets. The catalog creator can group datasets into a simple hierarchical classification scheme, which makes a catalog into a “logical data directory”. At a minimum, the catalog specifies the “human readable” dataset name, and how to access it.  The catalog also provides a place to add arbitrary metadata about the dataset. We are focusing on enhancing selected datasets by adding space and time bounding boxes, standard names, and data type information. Catalogs can be static XML files, or dynamically generated by Web servers to track continuously changing datasets.
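As a minimal sketch of how a client might walk such a catalog, the Python below parses a small, simplified catalog and lists each accessible dataset with its hierarchical name and access path. The catalog text, dataset names, and paths are invented for illustration, and the XML namespace and most of the real catalog schema are left out.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative catalog. Real THREDDS catalogs use an XML namespace
# and a richer schema; the dataset names and paths below are made up.
CATALOG_XML = """
<catalog name="Example catalog">
  <service name="dods" serviceType="OPENDAP" base="/dods/"/>
  <dataset name="Model output">
    <dataset name="NCEP Eta 80 km" urlPath="model/eta_80km.nc"/>
    <dataset name="NCEP RUC 40 km" urlPath="model/ruc_40km.nc"/>
  </dataset>
  <dataset name="Surface observations">
    <dataset name="METAR" urlPath="station/metar.nc"/>
  </dataset>
</catalog>
"""


def list_datasets(xml_text):
    """Return (hierarchical name, access path) for each accessible dataset."""
    root = ET.fromstring(xml_text)
    service = root.find("service")
    base = service.get("base") if service is not None else ""
    found = []

    def walk(element, parents):
        # Datasets nest to form the "logical data directory".
        for child in element.findall("dataset"):
            path = parents + [child.get("name")]
            if child.get("urlPath") is not None:
                found.append((" / ".join(path), base + child.get("urlPath")))
            walk(child, path)

    walk(root, [])
    return found


if __name__ == "__main__":
    for name, url in list_datasets(CATALOG_XML):
        print(f"{name} -> {url}")
```

Container datasets without an access path act only as directory levels; only leaf entries with a path are reported.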

Simple THREDDS Servers are data servers that have Dataset Inventory Catalogs associated with them. The primary focus of THREDDS has been developing these servers in collaboration with our data provider partners. Current servers include ones at IRI/LDEO (Columbia), SSEC (Madison), NOAA-CIRES Climate Diagnostics Center, Fleet Numerical Meteorology and Oceanography Center, and NCAR. 

The THREDDS/IDD Server makes much of the real-time data coming in on the Unidata IDD available on a THREDDS server. This includes the NCEP model data, satellite data from NOAAPORT and the Unidata/Wisconsin data streams, NEXRAD Radar, Profiler data from NOAA/FSL, as well as METAR, upper air, buoy, SAO and SHEF hydrology station data. The THREDDS/IDD Server will become part of an enhanced LDM that will be available to the Unidata community of 150 IDD users.

We have worked extensively with OpenDAP/DODS developers, and the next version of OpenDAP servers will have integrated THREDDS Catalogs. We have also developed the THREDDS OpenDAP Aggregation Server which is an OpenDAP data server that aggregates OpenDAP datasets, as well as serving netCDF datasets, and has THREDDS catalogs already integrated.  This means that the next generation of OpenDAP servers will automatically be THREDDS servers.  The Live Access Server from NOAA/PMEL is a Web server that provides access and visualization of scientific data. It is currently being modified to provide THREDDS catalogs for its data.

Another key THREDDS component for data providers is the Catalog Generator, which scans file directories and generates THREDDS catalogs automatically. This is a highly configurable tool that gives users control over the arrangement and naming of their datasets, adding metadata, extracting information from the datasets, etc. The Catalog Validator provides XML and semantic validation of Catalogs, as well as verification of the datasets themselves.

The ADDE Cataloger is a middleware service that constructs Catalogs for ADDE/Mcidas data servers. It provides “virtual dataset” services, for example, a dataset named “latest” or “last 3 hours”, along with a resolver service to translate a virtual dataset into a list of actual datasets available on the ADDE  server. This level of indirection is important for realtime and very large datasets, in order to provide users with the ability to choose datasets of the right granularity.
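The resolver idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not the ADDE Cataloger's code: the `resolve` function, the inventory structure, and the dataset names are all hypothetical.

```python
from datetime import datetime, timedelta


def resolve(virtual_name, inventory, now):
    """Translate a virtual dataset name into a list of actual dataset names.

    inventory maps actual dataset names to their observation times.
    Both the function and the inventory layout are hypothetical.
    """
    if not inventory:
        return []
    if virtual_name == "latest":
        # The single most recent dataset.
        return [max(inventory, key=inventory.get)]
    if virtual_name == "last 3 hours":
        cutoff = now - timedelta(hours=3)
        recent = [name for name, t in inventory.items() if cutoff <= t <= now]
        return sorted(recent, key=inventory.get)
    raise ValueError(f"unknown virtual dataset: {virtual_name!r}")


if __name__ == "__main__":
    now = datetime(2004, 1, 27, 12, 0)
    inventory = {
        "sat_0800": datetime(2004, 1, 27, 8, 0),
        "sat_1000": datetime(2004, 1, 27, 10, 0),
        "sat_1130": datetime(2004, 1, 27, 11, 30),
    }
    print(resolve("latest", inventory, now))
    print(resolve("last 3 hours", inventory, now))
```

Because the client only ever names the virtual dataset, the server is free to change which actual datasets back it as new data arrives.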

Dataset Query Capability XML documents are used by middleware services such as the ADDE Cataloger and the THREDDS/IDD Server to specify compactly what datasets are available from a data server. These allow data providers to specify the set of orthogonal choices (for example: station, field, time) that an end-user should make to select from a large and/or real-time collection of datasets. It allows data clients to know how to present appropriate choices to their users in a user interface, without knowing anything specific about the server.
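A rough sketch of the client side of this idea follows. It assumes the capability document has already been parsed into a mapping from selector name to allowed values; the `CAPABILITIES` mapping, the `build_query` function, and the query-string layout are hypothetical and do not reflect the actual DQC schema.

```python
from urllib.parse import urlencode

# Hypothetical, already-parsed capability description: each selector is an
# independent (orthogonal) choice that the user must make.
CAPABILITIES = {
    "station": ["KDEN", "KBOU", "KORD"],
    "field": ["temperature", "dewpoint"],
    "time": ["latest", "last 3 hours"],
}


def build_query(capabilities, selection):
    """Validate one choice per selector and return a query string."""
    missing = [name for name in capabilities if name not in selection]
    if missing:
        raise ValueError(f"no choice made for: {', '.join(missing)}")
    for name, value in selection.items():
        if name not in capabilities:
            raise ValueError(f"unknown selector: {name}")
        if value not in capabilities[name]:
            raise ValueError(f"{value!r} is not offered for {name}")
    # Emit parameters in the order the server declared its selectors.
    return urlencode([(name, selection[name]) for name in capabilities])


if __name__ == "__main__":
    choice = {"time": "latest", "station": "KDEN", "field": "dewpoint"}
    print(build_query(CAPABILITIES, choice))
```

The client code contains nothing specific to stations or fields; a user interface could be generated from the selector names and value lists alone.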

Catalogs are read by the Dataset Searcher, which provides a programmatic interface for searching by space and time bounding boxes, standard names, data type and server type. People can also search for datasets through a web interface. This is a prototype system that will be developed further in the future. 
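Searching by space and time bounding boxes reduces to an interval-overlap test on each axis. The sketch below illustrates that test; the `Coverage` class and `search` function are hypothetical names, not the Dataset Searcher's interface, and boxes that cross the date line are not handled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Coverage:
    """Space and time bounding box (hypothetical structure)."""
    west: float
    east: float
    south: float
    north: float
    start: datetime
    end: datetime

    def intersects(self, other):
        # Two intervals overlap when each starts before the other ends.
        # Assumes west <= east, i.e. no box crosses the date line.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north
                and self.start <= other.end and other.start <= self.end)


def search(datasets, query):
    """Return names of datasets whose coverage overlaps the query box."""
    return [name for name, cov in datasets.items() if cov.intersects(query)]


if __name__ == "__main__":
    january = (datetime(2004, 1, 1), datetime(2004, 1, 31))
    datasets = {
        "conus": Coverage(-130, -60, 20, 55, *january),
        "europe": Coverage(-10, 40, 35, 70, *january),
    }
    query = Coverage(-110, -100, 35, 45,
                     datetime(2004, 1, 10), datetime(2004, 1, 11))
    print(search(datasets, query))
```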

The THREDDS Dataset Exporter creates “resource records” appropriate to add to Digital Libraries such as DLESE, NSDL and GCMD. This prototype system uses special metadata records that are added to the datasets in a catalog, which specify the additional information needed by the DL, such as Dublin Core or DIF formats. The Dataset Exporter uses the Open Archives Initiative (OAI) protocol to send these records into the DLESE and NSDL databases.

THREDDS clients are application programs that know how to read THREDDS Catalogs and know how to read data using some or all of the THREDDS data server types, such as OpenDAP, ADDE, netCDF, etc.  The Integrated Data Viewer (IDV), also developed at Unidata, is a full-featured analysis program capable of advanced 3D visualization based on the VisAD library. VGEE is an educational content development system built on top of the IDV. New Media Studios is another educational content development framework which uses Macromedia Director and IDL, and is now in the process of being made THREDDS capable. The THREDDS Data Viewer is a tool for debugging data servers and prototyping client software, using the Java client library user interface components and catalog and data access APIs.

A key to successful use of scientific datasets is providing use metadata, especially georeferencing metadata, which allows client software to manipulate and visualize datasets, and to overlay and compare data from different sources. We have helped develop and promulgate georeferencing metadata conventions for netCDF datasets, such as the CF Conventions for model data.  We have also developed extensions to the netCDF data model and implemented libraries which automatically recognize and extract georeferencing information in many of the important netCDF and OpenDAP datasets.

We have also developed extensions to the NetCDF Markup Language (NcML) that allow metadata to be added, deleted or changed in netCDF and OpenDAP datasets, and allow netCDF files to be subset or aggregated. This capability has been added to the OpenDAP aggregation server, providing a powerful tool for third-party metadata augmentation, in addition to the ability to add metadata into the Inventory Catalogs.
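The idea of third-party metadata augmentation can be illustrated without NcML syntax: a list of add, change, and delete operations is applied as an overlay, leaving the original dataset's attributes untouched. The sketch below is only a model of that idea; the `apply_edits` function and the tuple format are hypothetical and are not how NcML is written or processed.

```python
def apply_edits(attributes, edits):
    """Return a new attribute dict with the edits applied as an overlay.

    edits is a list of tuples: ("add", name, value), ("change", name, value)
    or ("delete", name). The input dict is never modified, mirroring how a
    third party can augment a dataset it does not own.
    """
    result = dict(attributes)
    for edit in edits:
        action, name = edit[0], edit[1]
        if action == "add":
            if name in result:
                raise KeyError(f"attribute already exists: {name}")
            result[name] = edit[2]
        elif action == "change":
            if name not in result:
                raise KeyError(f"no such attribute: {name}")
            result[name] = edit[2]
        elif action == "delete":
            result.pop(name, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return result


if __name__ == "__main__":
    original = {"units": "degC", "history": "created by model run"}
    edits = [
        ("change", "units", "K"),
        ("delete", "history"),
        ("add", "Conventions", "CF-1.0"),
    ]
    print(apply_edits(original, edits))
    print(original)  # unchanged
```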


Status update 04/26/2007

  1. THREDDS Data Server (TDS): We are about to release 3.16.
    1. New authorization/authentication capabilities allow servers to be configured with restricted dataset access.
    2. Forecast Model Run Collections : Automatically create 1D subsets of 2D time datasets. (Example)
    3. Runtime configuration completed, allows users to configure, load classes at runtime.
    4. Dataset Source plugin: allows users to create custom and virtual datasets.
    5. Refactor of dataset filter interface (CrawlableDatasetFilter) simplifies new implementations
    6. First draft of TDS Tutorial for 2-day workshop.
    7. OPeNDAP server: direct implementation of the ASCII response (not a recursive call)

  2. Common Data Model: We have released 2.2.20, last stable release using Java 1.4.
    1. Start joint development with opendap.org on OPeNDAP Java library
      • Add all necessary capabilities for HTTP authentication
      • Change OPeNDAP default to accept compressed responses
    2. Radial Datatypes can now be visualized in the IDV
    3. Starting to handle McIDAS AREA files : McIdasAreaProjections, prototype IOSP.
    4. NcML
      • Specify IOSP class, parameter, buffer_size.
      • New aggregation types: ForecastModelRunCollection, ForecastModelRunSingleCollection
    5. Runtime configuration completed, allows users to configure, load classes at runtime.
    6. GRIB files
      • can now handle thin grids
      • vertical coordinates with multiple values, considered as bounds.
    7. Allow bzip2 compression.
    8. First draft of Tutorial for 2-day workshop.

  3. International Standards:
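To illustrate the Forecast Model Run Collection item in the TDS list above: a collection of model runs has two time dimensions (the run time and the forecast offset within each run), and a 1D subset picks a single time axis out of that 2D structure. The sketch below shows two plausible selection rules. The function names and rules are assumptions made for illustration, not the TDS implementation.

```python
from datetime import datetime, timedelta


def best_time_series(runs):
    """For each valid time, keep the forecast from the most recent run.

    runs maps run time -> list of forecast offsets in hours.
    Returns a sorted list of (valid_time, run_time) pairs.
    """
    best = {}
    for run_time in sorted(runs):
        for offset in runs[run_time]:
            valid = run_time + timedelta(hours=offset)
            best[valid] = run_time  # later runs overwrite earlier ones
    return sorted(best.items())


def constant_offset(runs, offset):
    """Take the forecast with the same offset from every run that has it."""
    return sorted((run + timedelta(hours=offset), run)
                  for run in runs if offset in runs[run])


if __name__ == "__main__":
    runs = {
        datetime(2007, 4, 26, 0): [0, 6, 12],
        datetime(2007, 4, 26, 6): [0, 6],
    }
    for valid, run in best_time_series(runs):
        print("best", valid, "from run", run)
    for valid, run in constant_offset(runs, 6):
        print("6h", valid, "from run", run)
```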

Status update 08/15/2006

  1. THREDDS Data Server (TDS): We are at stable release 3.12.
    1. NetcdfServer: subset NCEP GRIB models and return NetCDF/CF files. (Example) : This is being tested/used by CUAHSI and others.
    2. A major rewrite (CrawlableDataset) allows parts of the TDS code to be used in the new OPeNDAP Server 4 to generate THREDDS catalogs as an integral part of OPeNDAP servers.
    3. OAI harvesting added, both DIF and ADN records improved. Motherlode records exported to GCMD, DLESE.
    4. Improvements on InvDatasetScan:
      1. Generates last modified dates in catalogs which allows HTML view to display the date.
      2. "Latest" dataset now determined by file name and last modified time to give incoming files time to finish arriving.
    5. Continue to test and improve TDS/NcML Aggregation along with partners such as Pacific Fisheries Environmental Laboratory (Roy Mendelson)
    6. WCS Server was successfully used in the GALEON experiment for "WCS gateways to netCDF datasets".
    7. HTML catalog view now correctly sets the Last Modified field in dataset listings.
  2. Common Data Model: We are at stable release 2.2.16
    1. BUFR files are being decoded into the CDM, ongoing work to improve and add tables.
    2. New Radar Datatype interface and implementations for DORADE, NEXRAD 2 and NEXRAD 3 integrated into IDV release 2.0.
    3. Users can now plug in their own coordinate transforms (CoordTransBuilder).
  3. Ongoing work:
    1. New kind of NcML Aggregation: Forecast Model Run Collection, for gridded data. GeoGrid will be extended to handle 2 time dimensions, and possibly also an ensemble dimension.
    2. GRIB files:
      1. handle ensemble and GRIB2 error variables
      2. Standardize coordinates across runs.
    3. Working to standardize "Dapper Conventions" for OPeNDAP sequences and nested Structures.
    4. The Unidata/LEAD project is integrating the TDR (THREDDS Data Repository) with the TDS.
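The "Latest" rule mentioned under InvDatasetScan above (file name plus last modified time) can be sketched as follows. It assumes file names embed a timestamp so that name order matches time order, and it uses an arbitrary five-minute settling period; the function, the file names, and that threshold are all hypothetical.

```python
def latest_dataset(files, now, settle_seconds=300):
    """Pick the newest file that has stopped changing.

    files maps file name -> last-modified time in seconds since the epoch.
    A file modified within the last settle_seconds is assumed to still be
    arriving and is skipped. Names are assumed to sort in time order.
    """
    settled = [name for name, mtime in files.items()
               if now - mtime >= settle_seconds]
    return max(settled) if settled else None


if __name__ == "__main__":
    files = {
        "NAM_20060815_0000.grib": 1000,
        "NAM_20060815_0600.grib": 2000,
        "NAM_20060815_1200.grib": 2950,  # still being written
    }
    print(latest_dataset(files, now=3000))
```

Without the settling check, a client asking for the latest dataset could be handed a file that is only partially written.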

Status update 04/01/2006

  1. THREDDS Data Server (TDS): We are at stable release 3.8. Major new features since last report:
    1. TDS now has the functionality of the old OPeNDAP Aggregation Server (AS), which is now officially retired. The aggregation is specified using NcML so that aggregation can also be done on local files and for non-NetCDF files.
    2. Common Data Model Coordinate System Validation: allows users to check online if their datasets can be correctly read in the IDV.
    3. Forecast Model Run Collection Inventory (Example) : we are now tracking detailed inventory from the IDD NCEP model grids. This allows users to know exactly which grids are in each model run, and allows us to monitor IDD problems.
    4. Improvements and bug fixes to the TDS/OPeNDAP server, including using session cookies for increased reliability and performance.
  2. Common Data Model:
    1. Can now handle OPeNDAP sequences and nested Structures.
    2. Many bug fixes and incremental improvements.
  3. Ongoing work:
    1. BUFR files are being decoded into the CDM, in anticipation of IDD datastreams moving to BUFR.
    2. New Radar Datatype interface and implementations for DORADE, NEXRAD 2 and 3; working to integrate these into IDV.
    3. Working closely with OPeNDAP to integrate THREDDS catalogs into their next generation of servers.
    4. The Unidata/LEAD project is integrating the TDR (THREDDS Data Repository) with the TDS.
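The inventory tracking described in the Forecast Model Run Collection Inventory item above amounts to comparing the grids that arrived against the grids expected for a run. A minimal sketch, with a hypothetical `missing_grids` function and invented variable names and forecast hours:

```python
def missing_grids(expected, received):
    """Report forecast hours that are expected but have not arrived.

    expected maps variable name -> iterable of forecast hours.
    received is an iterable of (variable, forecast_hour) pairs.
    Returns {variable: sorted list of missing hours}, omitting complete ones.
    """
    seen = {}
    for variable, hour in received:
        seen.setdefault(variable, set()).add(hour)
    report = {}
    for variable, hours in expected.items():
        gaps = sorted(set(hours) - seen.get(variable, set()))
        if gaps:
            report[variable] = gaps
    return report


if __name__ == "__main__":
    expected = {"Temperature": [0, 6, 12, 18], "Pressure": [0, 6, 12, 18]}
    received = [("Temperature", 0), ("Temperature", 6), ("Temperature", 12),
                ("Temperature", 18), ("Pressure", 0), ("Pressure", 12)]
    print(missing_grids(expected, received))
```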

Status update 10/18/2005

  1. THREDDS Data Server (TDS): Release 3.2 was completed and is available for anyone to use. Even-numbered releases (such as 3.2) are stable; odd-numbered releases (such as 3.3) are development versions.
  2. THREDDS IDD Server: Much of the IDD datastream data on motherlode is now made available through the TDS, release 3.3.02. Special pqact and corresponding catalogs are now standardized, so others can install a "standard" THREDDS IDD server.
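The even/odd release convention described above can be checked mechanically from the second component of the version number. A small sketch, with a hypothetical helper name:

```python
def release_kind(version):
    """Classify a TDS version string by the parity of its minor number.

    Even minor numbers (3.2) are stable; odd ones (3.3) are development.
    """
    minor = int(version.split(".")[1])
    return "stable" if minor % 2 == 0 else "development"


if __name__ == "__main__":
    for v in ("3.2", "3.3.02", "3.12", "3.16"):
        print(v, release_kind(v))
```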

Status update 09/12/2005

  1. THREDDS Data Server (TDS): Most of our time has been spent getting the TDS ready for use. This includes making it secure against web attacks, remote debugging and diagnostics, viewing catalogs via a web browser, and installation and configuration documentation. The TDS has an integrated OpenDAP server for subsetted file access, an HTTP server for bulk file access, and an experimental WCS server for gridded data, along with THREDDS catalog services. The TDS is being tested on Unidata's motherlode server, as well as LEAD servers and several servers outside of Unidata.
  2. The THREDDS/IDD Data Server integrates the TDS with the LDM, providing "pull" access to the IDD data. Data directories and file names have been standardized, and compatible TDS catalogs and LDM pqact files have been created. This allows us to support a standard TDS/IDD server (and maintain changes as the IDD data streams change) for LDM users who prefer to use a standard configuration.
  3. Automatic Catalog generation and metadata extraction: Catalog version 1.0.1 integrates TDS configuration elements into THREDDS catalogs, in order to make TDS configuration as easy and as powerful as possible. We are concentrating on automatically creating dynamic catalogs for the IDD data, as well as the automatic extraction of metadata. This work builds on and will eventually supersede the THREDDS Catalog Generator.
  4. Common Data Model (CDM) Access: We added the DORADE Radar formatted files, as well as improvements to the NIDS, GRIB1 and GRIB2 readers. We are slowly adding access to all the IDD data, so that these can be served through the TDS. See the nj22 web page for current file types that can be read. We are now working on completing GINI access for the IDD satellite data.
  5. GIS/Galeon: We are using the TDS WCS server to participate in the OGC Galeon experiment, which is experimenting with using NetCDF as one of the recommended data formats for data transport within a WCS server.
  6. IDV development: We are tying nj22/CDM releases to coincide with IDV releases. IDV version 1.2 is using NetCDF library version 2.2.09. New features in the next release of the IDV will include Grid subsetting, and possibly more integration with THREDDS metadata.
  7. NetCDF Attribute Convention for Dataset Discovery is a proposed set of NetCDF attributes, to allow automatic extraction of THREDDS metadata and data discovery in discovery centers like GCMD and DLESE.
  8. Radar data formats: We are working with the radar community, including ATD, to investigate and propose a new radar file format, probably using NetCDF-4 files. This will tie into our CDM work on Radial data types.
  9. NetCDF Tools UI is for debugging nj22/CDM file reading and THREDDS data servers. It can be downloaded via webstart.

Status update 03/30/2005

  1. THREDDS Data Server: This merges the THREDDS Catalog Server with the data access capabilities of the NetCDF Java 2.2 library. The goal is to create a 100% Java servlet-based data server. Initially the data serving will be through an integrated OpenDAP server, and an experimental WCS server. We are on the verge of releasing an alpha version of this. It will initially serve out the IDD Data needed in the LEAD project. We will extend it to serve out all of the IDD Data stream.
  2. Catalog XML 1.1 is currently being developed to integrate directory scanning into the Inventory Dataset Catalog specification. This will use the catalog generation library. The goal is to make data server configuration as easy and powerful as possible.
  3. OPeNDAP technology: We are providing our catalog-generation and configuration code to the next version of OPeNDAP library. We are continuing to work closely with James Gallagher et al to influence the development of the OPeNDAP specification.
  4. GIS: We have an experimental WCS server integrated into the THREDDS Data Server. We are working closely with Martin Daly at CADCORP and Frank Warmerdam, the developer of the GDAL library to ensure interoperability.
  5. Common Data Model Data Access: Common Data Model (CDM) is an abstract data model that the NetCDF (Unidata), HDF5 (NCSA) and OpenDAP (University of Rhode Island) developers are working to converge their respective data models towards. This is also intended to be the NetCDF-4 data model. The design of this is mostly stable, and the APIs and implementations in the NetCDF-4 and Java-NetCDF libraries are close to stable.
  6. CDM Coordinate Systems and Scientific Data Types: These are higher-layer abstractions of the Common Data Model, which provide the semantics needed to convert datasets to other protocols and formats such as those required by GIS systems. The Coordinate System abstract model is complete, and extensive work is ongoing to identify and map existing datasets and Conventions. A NetCDF _Coordinate attribute conventions encoding has been proposed and needs review. The Scientific Data Types layer is still undergoing extensive revisions, and specialized email groups such as for Radar datasets are being formed to provide input.
  7. NetCDF Java 2.2 (nj22) library: reads NetCDF, OpenDAP, and HDF5 datasets into the CDM. It provides an "I/O Service Provider" framework for reading other binary formats as if these were NetCDF files. So far we can read GRIB-1, GRIB-2, NEXRAD level 2, NEXRAD level 3 (NIDS), GINI, and DMSP satellite files. Currently working on DORADE Radar formatted files. The library also provides a framework for parsing well known "attribute conventions" to identify coordinate systems and scientific data types. Most of the work to date has been for gridded data, including ADAS, AWIPS, CF-1, COARDS, CSM, NUWG, and WRF. Current work is focusing on Radar, especially formats used by ATD, as well as station data, including AWIPS and MADIS. The nj22 library is integrated with the THREDDS library to allow metadata in THREDDS catalogs to add missing coordinate system information and other metadata. The library is currently in alpha testing, and should go into beta before summer.
  8. NetCDF Markup Language (NcML), an XML format for netCDF files. New work on NcML has focused on making it a powerful language for augmenting and modifying CDM datasets. The NcML Core and Dataset schemas have been merged into a common "NcML-2.2" schema. NcML Coordinate Systems may be deprecated in favor of using the _Coordinate attribute conventions. NcML can also now be used to define and generate new NetCDF-3 files, like ncgen. This work is still in alpha-testing.
  9. NetCDF Tools UI is a continuation of the THREDDS Data Viewer client for debugging NetCDF file reading and THREDDS data servers and testing client-side libraries. It can be downloaded via webstart.
  10. IDV development: IDV version 1.2 is using NetCDF library version 2.2, and so can take advantage of our framework for identifying and augmenting coordinate systems, as well as reading other file formats such as GRIB. We have worked hard to make sure that the NetCDF library development follows the needs and priorities of the IDV development, and they will continue to be a main driver for our scientific datatype design.
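The "I/O Service Provider" framework in the nj22 item above is, in outline, a registry of format readers, each of which can say whether it recognizes a file. The Python sketch below shows only that dispatch step, using the well-known leading bytes of NetCDF classic, HDF5, and GRIB files. The class and function names are hypothetical and do not correspond to the Java library's API, and a real provider would go on to read the data.

```python
class MagicProvider:
    """Recognizes a format by the bytes at the start of the file."""

    def __init__(self, name, magic):
        self.name = name
        self.magic = magic

    def is_valid_file(self, header):
        return header.startswith(self.magic)


# Registry of providers, tried in order. New formats are supported by
# appending another provider rather than changing the dispatch code.
PROVIDERS = [
    MagicProvider("NetCDF classic", b"CDF\x01"),
    MagicProvider("HDF5", b"\x89HDF\r\n\x1a\n"),
    # Simplification: assumes the GRIB indicator is the very first bytes.
    MagicProvider("GRIB", b"GRIB"),
]


def find_provider(header, providers=PROVIDERS):
    """Return the name of the first provider that accepts the header."""
    for provider in providers:
        if provider.is_valid_file(header):
            return provider.name
    return None


if __name__ == "__main__":
    print(find_provider(b"CDF\x01\x00\x00\x00\x00"))
    print(find_provider(b"GRIB\x00\x00\x10\x01"))
    print(find_provider(b"not a known format"))
```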

Status update 10/12/2004

 

Common Data Model

Work is ongoing to develop a "common data model" that can be used to read THREDDS datasets and automatically extract metadata. This is being prototyped in the NetCDF-Java 2.2 library, and will eventually be adapted into the VisAD data model and used by the IDV. An important aspect of this integration with protocol-specific data models is the ability to augment datasets' "use metadata" with THREDDS. This will give data providers a powerful reason to use THREDDS catalogs. Eventually it will give us the ability to build data servers that can translate from one data access protocol to another, such as WCS.

Work on the CDM includes:

  1. Data Access layer. We have an ongoing effort with the NetCDF, OpenDAP and HDF5 development groups to study those 3 data models and agree on how they are the same and how they differ, with an eye towards making them as compatible as possible in future development, including NetCDF-4.
  2. Data Access layer. We have alpha quality code that reads netCDF-3, HDF5 and OpenDAP into the CDM. We are currently working on GRIB (version 1 and 2), GINI and DMSP (satellite) to validate the feasibility of this approach. We are targeting file formats where we need local access, e.g., for case studies or the LEAD project.
  3. Coordinate Systems: improved access to WRF model output, and working to read ADAS formatted netCDF files.
  4. Datatypes. Have implemented a "Station" data type to go along with existing "Grid" datatype. We can currently read ADDE station data, as well as MADIS and "Unidata Station format" netCDF files. We can speed up ADDE station access by annotating the THREDDS catalog entry.

OpenDAP collaborations


Status update 6/04/2004

Catalog 1.0 Specification

In conjunction with the NCAR Community Data Portal Group (Luca Cinquini, Michael Burek), a significant revision of the catalog specification was completed. This created many new tags for "enhanced catalogs", especially for Digital Library entries.

DQC Specification 0.3

A new version of the Dataset Query Capabilities specification was completed, as part of our ongoing collaboration with the OpenDAP developers community. This will allow us to use a DQC as a front end and eventual replacement for the OpenDAP "File Server" functionality.

Digital Library

We are now writing DIF and ADN Digital Library entries from enhanced catalogs, and working to incorporate them into GCMD and DLESE discovery centers, respectively.

My World GIS Client

The MyWorld client, developed by Danny Edelson and others at the World Watcher Project at Northwestern University, is using the THREDDS UI widgets to access THREDDS catalogs.

THREDDS Server

The THREDDS Server is a servlet-based web application that provides various services, including catalog viewing, validation, and generation, as well as subsetting and annotation. The DODS Aggregation Server, Catalog Generator, and IDD Data server applications are now bundled in a single war file for easy installation of the entire suite of THREDDS services.

THREDDS Clients

The Client widgets are updated to take advantage of the catalog version 1.0 information. All the Client tools and viewer functionality are being packaged in a single webstart-downloadable application, now called the "THREDDS Toolset". Part of this is a new application called the "Catalog Enhancer" that allows remote editing of catalogs with a GUI for adding enhanced information to a catalog.


Status Update 1/27/04

OpenDAP coordination

We met with Peter Cornillon and the OpenDAP developers to develop a strategy for OpenDAP/THREDDS. We agreed to make THREDDS catalogs an automatic part of all OpenDAP servers, eventually replacing the "dods directory" service. We also will investigate using the DQC functionality to replace the OpenDAP "file server" and other functions.

NCAR Data Portal coordination

We are working with the NCAR Data Portal to define common metadata attributes for THREDDS catalogs in order to create Dublin Core and DIF (GCMD) digital library records. These will be included in the next version of the THREDDS Catalog specification.

Web Coverage Service

With the University of Florence, we developed a prototype OGC Web Coverage Service that serves netCDF files. Work is ongoing to use the latest WCS 1.0 spec and to integrate the service into the THREDDS server.

Ongoing technical work


Status Update 9/23/03

 
 