Components of THREDDS

DRAFT DRAFT DRAFT DRAFT

Ben Domenico

Last Modified: September 9, 2003

Overview

The overall goal is to create a set of data servers to serve the Earth system research and education community. While each server will have its own "theme" oriented around the type of data and expertise available at the server site, they will also have a set of common elements that will enable users to access them as a coherent whole. This document describes those common components.

Existing Tools Already in Use in Prototype Server

Unidata LDM and IDD

For delivering real-time data from a variety of sources, as well as for automatically exchanging data and metadata among the servers, the Unidata Local Data Manager (LDM) software and Internet Data Distribution (IDD) system will be used. This system has been in use at over 100 universities for about 5 years. It is also finding increasing use at government and commercial sites.

netCDF and Decoders

The Unidata LDM can be configured to run decoders that convert incoming data into specific formats for local storage. One set of decoders creates netCDF files, which are of particular interest because netCDF files are self-describing: they contain metadata that describes the data in the file. The ability to store metadata in the files themselves is useful for building automated tools that facilitate discovery and usage.

DODS and ADDE

Client/server interfaces make it possible to run application programs locally while accessing data on remote servers as if the data were stored on local disks. Two such interface packages, DODS and ADDE, are available for data in the Earth system community.

Tools That Need to Be Incorporated into the Prototype

NetCDF to XML wrapper creator (University of Florence and others)

Discovery and usage metadata can be stored directly in netCDF files because of their self-describing nature. However, it will also be useful to have XML wrappers that contain the metadata and a pointer to the file containing the data. Systems that work with the metadata can then access and manipulate these metadata-only files without having to deal with each of the data files. Moreover, such XML metadata files can be created for files that are not in netCDF (e.g., McIDAS files served by ADDE), where it is not convenient to store the metadata inside the data file. In this way, tools for generating catalogs of data on the server, or more general catalogs such as those maintained by the GCMD (Global Change Master Directory) or by DLESE (Digital Library for Earth System Education), can work with the collection of XML metadata files rather than having to extract the metadata from the datasets themselves.

On some servers, the metadata may actually be stored in a database on the server, but the same sort of tools will be needed for extracting the metadata whether it is ultimately stored in a database or in a collection of flat files. Finally, these tools may run as backends on decoders as the data arrive in real time, or they may run as a set of crawlers that periodically traverse the data collection and create (or update) the metadata.
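The wrapper and crawler ideas can be sketched as follows. This is a minimal illustration in Python, not the THREDDS design: the element names (dataset, dataFile, property), the .meta.xml suffix, the file extensions, and both helper functions are hypothetical. The crawler here records only file size and modification time, where a real tool would extract discovery metadata from the data itself.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

WRAPPER_SUFFIX = ".meta.xml"  # hypothetical naming convention for wrapper files


def build_wrapper(data_path, extra=None):
    """Return an XML element describing data_path and pointing back at it."""
    info = os.stat(data_path)
    root = ET.Element("dataset", name=os.path.basename(data_path))
    # Pointer to the file that holds the actual data.
    ET.SubElement(root, "dataFile", href=os.path.abspath(data_path))
    props = {
        "sizeBytes": str(info.st_size),
        "modified": datetime.fromtimestamp(info.st_mtime, timezone.utc).isoformat(),
    }
    props.update(extra or {})
    for key, value in sorted(props.items()):
        ET.SubElement(root, "property", name=key, value=value)
    return root


def crawl(top, extensions=(".nc", ".area")):
    """Walk a data collection, creating or updating wrappers as needed.

    Returns the list of wrapper paths written during this pass.
    """
    written = []
    for dirpath, _dirs, files in os.walk(top):
        for fname in files:
            if not fname.endswith(extensions):
                continue  # not a data file (this also skips existing wrappers)
            data_path = os.path.join(dirpath, fname)
            wrapper_path = data_path + WRAPPER_SUFFIX
            # Skip wrappers that are at least as new as the data file.
            if (os.path.exists(wrapper_path)
                    and os.path.getmtime(wrapper_path) >= os.path.getmtime(data_path)):
                continue
            ET.ElementTree(build_wrapper(data_path)).write(
                wrapper_path, encoding="utf-8", xml_declaration=True)
            written.append(wrapper_path)
    return written
```

Run periodically, crawl(top) touches only data files that are new or have changed since their wrapper was written, so catalog tools can read the small wrapper files without opening the data files.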

Human input metadata assistance tools (e.g., Metabot, DCBot, DLESE tool, and others)

On some servers, it will not be possible to create all the metadata automatically. For example, the creation of data collections related to case studies of classic examples of geoscience phenomena may require substantial human input. Tools to assist the person generating the metadata will be important at such sites. DLESE is creating such a tool to assist in creating metadata for collections of educational materials. Metabot and DCBot are also available as commercial and shareware products. Similar tools will be needed to assist in the preparation of metadata describing datasets.

XSLT translators

Even if all the THREDDS sites agree on XML as the form for serving metadata, it is unlikely that all sites and all collections will use the same set of conventions for describing the data in XML. Consequently, there will be a need for tools that convert metadata from one set of conventions to another. The W3C has a recommendation for XSL Transformations (XSLT), which defines the syntax and semantics of a language for transforming XML documents into other XML documents. XSL here is the Extensible Stylesheet Language. Note that XSLT is designed so that it can also be used independently of XSL.
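XSLT itself is the natural tool for this job. Purely to illustrate what such a translator does, the sketch below performs a simple element-renaming crosswalk in Python with the standard library rather than with XSLT. Both metadata conventions, all element names, and the sample record are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: element names in convention A -> convention B.
CROSSWALK = {
    "record": "dataset",
    "title": "name",
    "abstract": "summary",
    "startTime": "timeCoverageStart",
}


def translate(element, crosswalk=CROSSWALK):
    """Return a copy of element with tags renamed according to the crosswalk.

    Tags without a mapping are kept unchanged so that no metadata is dropped.
    """
    out = ET.Element(crosswalk.get(element.tag, element.tag), dict(element.attrib))
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(translate(child, crosswalk))
    return out


# Made-up record in convention A.
source = ET.fromstring(
    "<record>"
    "<title>Example radar collection</title>"
    "<abstract>Volume scans</abstract>"
    "<platform>radar</platform>"
    "</record>"
)
print(ET.tostring(translate(source), encoding="unicode"))
```

A real crosswalk usually needs more than renaming (restructuring, merging fields, changing units or vocabularies), which is where a full transformation language such as XSLT becomes worthwhile.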

SDLIP (Simple Digital Library Interoperability Protocol)

SDLIP is a middleware layer that allows for the exchange of metadata among distributed digital libraries.

 

Tools to Be Developed

Automated XML metadata creation tool for datasets other than netCDF

As noted above, many datasets on servers will be in formats that do not lend themselves to storing the metadata in the files themselves. In these cases, the metadata generation tools will have to store the metadata outside the data file.

GCMD DIF metadata creation tools

The DIF (Directory Interchange Format) of the GCMD (Global Change Master Directory) is a useful starting point for providing the metadata needed for data discovery systems. While it is not as complete as the exhaustive FGDC specification, it does offer a mapping to FGDC and also specifies a "required" subset that should be manageable for most collections. Adopting DIF as a starting point establishes a connection to one of the primary central catalog sites for Earth system data, so metadata describing the THREDDS servers' collections can presumably be included quickly in the GCMD master directory.

Note that in the case of self-describing data files (such as netCDF), the metadata will be stored in the files themselves as well as in XML wrappers. For data in other forms, on the other hand, the metadata will be stored only in separate XML files or in databases that serve the XML descriptions.

There are groups working on "crosswalks" between sets of metadata standards. Diane Hillman from Cornell has a couple of web pages describing some of this work.

NSDL is the National SMETE Digital Library, where SMETE stands for Science, Mathematics, Engineering, and Technology Education.

Pattern recognition metadata creation tools

This concept was introduced into THREDDS after a discussion regarding the CRAFT (Collaborative Radar Acquisition Field Test) project. An experimental element of that project is to incorporate automated detection algorithms into the data collection system. In this case, mesocyclone detection algorithms are to be applied to observations from NEXRAD radar systems. According to the project, "Experiments will be conducted on detection of the onset of rotation, extension of rotations through the storm, and (in severe cases) the appearance of tornado vortex signatures." The idea here is that tools like this could be used to generate metadata relating to specific scientific phenomena, which could then be incorporated into the THREDDS servers' metadata collections. This would eventually enable users to search for data containing "tornado vortex signatures" in addition to the traditional searches by location, time, observing platform, etc. The work at the University of Oklahoma is described in The Mesocyclone Climatology Project. NCAR's Research Applications Program (RAP) is initiating an investigation into automated pattern recognition in scientific imagery. Paul Herzegh is the contact there.
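To illustrate the kind of search this would enable, the sketch below filters a small in-memory list of metadata records by detected phenomenon as well as by time. The record layout, field names, station labels, and sample values are all hypothetical. They are not output from CRAFT or from any detection algorithm.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Record:
    """Hypothetical metadata record for one radar volume scan."""
    station: str
    time: datetime
    phenomena: set = field(default_factory=set)  # tags added by detection tools


def search(records, phenomenon=None, start=None, end=None):
    """Return the records that match every criterion that is not None."""
    hits = []
    for rec in records:
        if phenomenon is not None and phenomenon not in rec.phenomena:
            continue
        if start is not None and rec.time < start:
            continue
        if end is not None and rec.time > end:
            continue
        hits.append(rec)
    return hits


# Made-up sample catalog for illustration only.
catalog = [
    Record("STATION_A", datetime(2003, 1, 1, 22, 0),
           {"mesocyclone", "tornado vortex signature"}),
    Record("STATION_A", datetime(2003, 1, 1, 23, 0), {"mesocyclone"}),
    Record("STATION_B", datetime(2003, 1, 2, 1, 0)),
]

tvs_scans = search(catalog, phenomenon="tornado vortex signature")
print([(rec.station, rec.time.isoformat()) for rec in tvs_scans])
```

Keeping the phenomenon tags alongside the conventional fields means one query can combine both kinds of criteria, for example search(catalog, phenomenon="mesocyclone", start=datetime(2003, 1, 1, 22, 30)).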

 
 