
Thematic Realtime Environmental Distributed Data Services
THREDDS

Ben Domenico
Last Updated: August 5, 2003

Mission

For students, educators and researchers to publish, contribute, find, and interact with data relating to the Earth system in a convenient, effective, and integrated fashion. Just as the World Wide Web and digital-library technologies have simplified the process of publishing and accessing multimedia documents, THREDDS will provide needed infrastructure for publishing and accessing scientific data in a similarly convenient fashion.

Motivation

Understanding the environment we live in and how human activities and other natural changes affect it has always been regarded as one of the most important and challenging problems in science. The THREDDS initiative is a key infrastructure component needed to address this challenge. A more detailed description is provided in Environmental Data Challenges.

Vision

(This section is paraphrased from the THREDDS NSDL Collections Proposal.)

Data collections are a cornerstone of environmental research and education. New levels of accessing and using data are now achievable because of evolving technologies, even as the amount and variety of Earth system data are increasing daily. Recent parallel progress in the worlds of scientific data management and education-oriented digital libraries is highlighting a common need to discover widely distributed data sets, and to use unfamiliar data meaningfully with a comprehensive set of analysis tools for:

To address this issue, we envision a prototype scientific data web that will facilitate the publication, discovery, and use of environmental data, just as the World Wide Web has made the publication of and access to textual and multimedia documents simple and straightforward. To illustrate the point, the image below shows a prototype application for viewing gridded data. The popup window shows the contents of a catalog pointing to various datasets stored by Variable, by Model, and by Experiment. This catalog was published on one of several catalog servers shown in the drop-down menu. The idea is that the applications, catalogs, and datasets can all exist on different computers, but the user conveniently finds and analyzes the data as if it were on the local machine.

On the publication side, scientists who generate specialized data sets, including data created minute-to-minute by automated observing platforms, should be able to add to the web with minimal effort by contributing their data to servers using tools that generate the appropriate metadata for cataloging and data-access facilities.

On the discovery and usage side, broad access to data and analysis tools in this scientific data web will enable scientists to publish data sets and create online publications that point directly to them. It will enable educators to work with data in classrooms, faculty to examine and incorporate data from other disciplines, and students to explore and test their ideas using the data yardstick.  It will provide rich discovery mechanisms, designed to reflect concepts of importance to the education community being served, with data cross-indexed and cross-referenced by multiple themes. We are proposing to build a prototype of this scientific data web, which we are calling THREDDS (Thematic Real-time Environmental Distributed Data Services), as a first step toward achieving this vision.

NSDL Collections Proposal Project Summary

We propose the construction of a prototype system for Thematic Real-time Environmental Distributed Data Services (THREDDS) that will make it possible for educators and researchers to publish, locate, analyze, and visualize a wide variety of environmental data both in their classrooms and in their laboratories. Just as the World Wide Web and digital library technologies have simplified the process of publishing and accessing multimedia documents, THREDDS will provide needed infrastructure for publishing and accessing scientific data in a similarly convenient fashion.

THREDDS will establish both an organizational infrastructure and a software infrastructure.  A team of data providers, software tool developers, and metadata experts will work together to develop a software framework that allows users to publish, find, analyze, and display data residing on remote servers. The software framework, based on a concept of publishable data inventories and catalogs, will tie together a set of technologies already in use in existing, extensive collections of environmental data: client/server data-access protocols from the University of Rhode Island and the University of Wisconsin-Madison, Unidata’s real-time Internet Data Distribution system, the discovery system at the Digital Library for Earth System Education (DLESE), and an extensive set of client visualization tools.

The heart of THREDDS, however, is the metadata contained in the publishable inventories and catalogs (PICats). Because they are based on the eXtensible Markup Language (XML), PICats can be created in many different ways. Sites receiving real-time environmental data will instrument decoders to create PICats describing data products as they arrive. Crawlers will be implemented to create PICats by traversing existing retrospective data collections. Since PICats do not have to reside on the server with the data, researchers will be able to create PICats for research publications that point to datasets residing on several data servers. Educators will incorporate PICats of illustrative datasets into educational modules that also include tools for data analysis and visualization. Indeed, students will eventually be able to use PICats to point to datasets related to their research projects, just as they now use URLs to point to relevant documents. Since they are text-based, PICats can be “harvested” and indexed in digital libraries, both by specialized tools that make use of their internal structure and semantic content and by tools similar to those used by current document search engines.
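The crawler idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `catalog` and `dataset` element names, the `name` and `url` attributes, the `build_catalog` helper, and the `data.example.edu` base URL are all hypothetical and do not represent the actual PICat format, which is not defined in this document.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_catalog(root_dir, base_url):
    """Walk root_dir and return an XML catalog element listing every file.

    Element and attribute names are illustrative, not a real PICat schema.
    """
    catalog_name = os.path.basename(root_dir.rstrip(os.sep))
    catalog = ET.Element("catalog", {"name": catalog_name})
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # make the traversal order deterministic
        for filename in sorted(filenames):
            rel_path = os.path.relpath(os.path.join(dirpath, filename), root_dir)
            url_path = rel_path.replace(os.sep, "/")
            ET.SubElement(catalog, "dataset", {
                "name": filename,
                "url": base_url.rstrip("/") + "/" + url_path,
            })
    return catalog


def demo():
    """Create a tiny placeholder collection and return its catalog as XML text."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "model_a"))
        for rel in ("model_a/temperature.nc", "model_a/pressure.nc"):
            with open(os.path.join(tmp, *rel.split("/")), "w") as handle:
                handle.write("placeholder")
        # Hypothetical server address; the catalog need not live on that server.
        catalog = build_catalog(tmp, "http://data.example.edu/archive")
        return ET.tostring(catalog, encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

Because the resulting catalog is plain text that only points at the data, it could be stored and published on a different machine from the datasets it describes, which is the property the paragraph above relies on.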

A large set of committed collaborators will continue to work together on the development and integration of this technology, incorporating it into their data servers, client analysis and display applications, and, ultimately, into the NSDL through DLESE.

In brief, THREDDS represents a broad-based community effort, managed by the Unidata Program, to enable learners, educators, and researchers, regardless of their institution’s size, in-house computer expertise, or academic level, to publish, find, and use current and retrospective environmental data. THREDDS moves data publication, discovery, and usage from the arcane (where location, formats, and filename conventions must be known) to the mundane (where the underlying complexities are transparent to users).

Current Approaches to Data Distribution

Current data distribution systems use a variety of “push” and “pull” approaches to the dissemination of scientific data. These range from Web-based facilities that allow users to browse datasets on servers and download those of interest, to the real-time Unidata IDD, which allows users to “subscribe” to certain datastreams whose products are delivered to the users' sites as soon as they are available from the data source. More recently, client/server alternatives have been developed; these allow users running applications on their own machines to access data on remote servers as if the datasets were on local disks. These alternatives are described and compared in more detail in Current Approaches to Data Distribution.
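The difference between the two styles can be made concrete with a toy in-memory model. The sketch below is not the IDD or any real server interface; the `DataSource` class, its method names, and the product names are all invented for illustration. It shows only that in the pull style the user asks for a listing and then fetches, while in the push style the source delivers matching products as soon as they are published.

```python
class DataSource:
    """Toy data source supporting both pull (browse/fetch) and push (subscribe)."""

    def __init__(self):
        self._products = {}     # product name -> data
        self._subscribers = []  # (name prefix, callback) pairs

    def subscribe(self, prefix, callback):
        """Push style: register interest in products whose name starts with prefix."""
        self._subscribers.append((prefix, callback))

    def publish(self, name, data):
        """Store a new product and deliver it to matching subscribers immediately."""
        self._products[name] = data
        for prefix, callback in self._subscribers:
            if name.startswith(prefix):
                callback(name, data)

    def list_products(self):
        """Pull style, step 1: browse what the source currently holds."""
        return sorted(self._products)

    def fetch(self, name):
        """Pull style, step 2: download one product of interest."""
        return self._products[name]


if __name__ == "__main__":
    source = DataSource()
    delivered = []
    source.subscribe("radar/", lambda name, data: delivered.append(name))

    source.publish("radar/scan1", b"radar bytes")
    source.publish("model/run1", b"model bytes")

    print("pushed to subscriber:", delivered)
    print("available to pull:", source.list_products())
```

The subscriber never polls: the radar product reaches it at publish time, whereas the model product stays on the source until someone browses and fetches it.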

THREDDS Hybrid

The THREDDS approach builds on the strengths of traditional data servers and the Unidata realtime distribution system and provides coherence through the use of:

In THREDDS, the Unidata IDD is used to populate and exchange data and metadata among a number of thematic data servers. Traditional mechanisms for accessing the data are still available, but they are augmented by the more global discovery system of digital libraries (such as DLESE) and by remote access protocols such as ADDE and DODS, which enable users running applications on local computers to access data from remote servers as if the data were actually on local disks.

Fundamental Concepts

A portion of the community “pulls” data to:

Universities or centers with powerful systems and themes:

Key Components

Technical Approach

The THREDDS approach builds on the strengths of the community of data providers, visualization tool builders, digital libraries, and metadata experts. It also provides a mechanism for extending the DLESE discovery system (Sumner et al., 2001) to embrace the metadata in what we refer to as Publishable Inventories and Catalogs (PICats).

To accomplish this, we will build two new essential components: a formal definition for PICats and software to facilitate their use.   As described later, PICats will be built using XML transported via HTTP (i.e., on the Web), and will refer to data sets that are usable via DODS, ADDE, or other direct-access methods. Needed software includes tools to create standards-compliant PICats, plug-ins or server-side visualizers to enable the use of PICats in browsers and components that help developers incorporate PICats into applications.
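To make the client side of this concrete, the following Python sketch reads a small catalog and picks one access method per dataset. Everything about the catalog layout is an assumption made for illustration: the `catalog`, `dataset`, and `access` elements, the `protocol` and `url` attributes, the example server names, and the `resolve_access` helper are hypothetical, since the formal PICat definition is yet to be built. In practice the XML text would be retrieved over HTTP; it is embedded here so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each dataset may offer several access methods.
SAMPLE_CATALOG = """\
<catalog name="Example gridded data">
  <dataset name="Surface temperature">
    <access protocol="DODS" url="http://server-a.example.edu/dods/temperature"/>
    <access protocol="HTTP" url="http://server-a.example.edu/files/temperature.nc"/>
  </dataset>
  <dataset name="Satellite imagery">
    <access protocol="ADDE" url="adde://server-b.example.edu/imagedata"/>
  </dataset>
</catalog>
"""


def resolve_access(catalog_xml, preferred=("DODS", "ADDE", "HTTP")):
    """Map each dataset name to (protocol, url), using the first preferred protocol offered.

    Datasets that offer none of the preferred protocols are omitted.
    """
    root = ET.fromstring(catalog_xml)
    resolved = {}
    for dataset in root.iter("dataset"):
        offered = {a.get("protocol"): a.get("url") for a in dataset.findall("access")}
        for protocol in preferred:
            if protocol in offered:
                resolved[dataset.get("name")] = (protocol, offered[protocol])
                break
    return resolved


if __name__ == "__main__":
    for name, (protocol, url) in resolve_access(SAMPLE_CATALOG).items():
        print(f"{name}: {protocol} -> {url}")
```

Note that the two datasets in the sample live on different servers; the application only needs the catalog to find them.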

On the discovery side, we will propose and help construct mechanisms for extending the DLESE discovery system to embrace PICats as standard resources.  This will be done as an experiment guided by the DLESE Data Access Working Group (DAWG).  The experiment will add document types and other elements to effectively characterize PICats in the DLESE metadata framework, refine the PICat-creation tools for DLESE compatibility, implement automatic “harvesting” of PICat metadata, test the system from applications and browsers, and, after an iterative refinement, adopt the methodology.  Approval by the DAWG and DLESE Steering Committees will be needed before a recommendation can be made to the National SMETE Digital Library (NSDL) for adoption at that level.
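As a rough picture of what automatic harvesting might involve, the Python sketch below builds a simple keyword index from a set of catalogs. It is not the DLESE metadata framework or its harvesting mechanism; the catalog layout, the `keyword` element, the catalog identifiers, and the `harvest` function are hypothetical and chosen only to show how a harvester could exploit the XML structure rather than treating the catalog as undifferentiated text.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical catalogs keyed by an identifier such as the URL they were fetched from.
SAMPLE_CATALOGS = {
    "catalog-a": (
        '<catalog><dataset name="Surface Temperature">'
        "<keyword>climate</keyword></dataset></catalog>"
    ),
    "catalog-b": '<catalog><dataset name="Ocean temperature"/></catalog>',
}


def harvest(catalogs):
    """Build an inverted index: lowercase term -> set of (catalog id, dataset name)."""
    index = defaultdict(set)
    for catalog_id, xml_text in catalogs.items():
        root = ET.fromstring(xml_text)
        for dataset in root.iter("dataset"):
            name = dataset.get("name", "")
            # Index the words of the dataset name plus any explicit keyword elements.
            terms = re.findall(r"[a-z0-9]+", name.lower())
            terms += [k.text.strip().lower() for k in dataset.findall("keyword") if k.text]
            for term in terms:
                index[term].add((catalog_id, name))
    return index


if __name__ == "__main__":
    index = harvest(SAMPLE_CATALOGS)
    for hit in sorted(index["temperature"]):
        print(hit)
```

A search for a theme such as "temperature" then returns datasets from both catalogs, regardless of which server published them.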

The THREDDS strategy will facilitate the publication of data sets in a variety of forms, since metadata descriptors can be built anywhere to create “virtual aggregations” of data sets and to characterize data sets in meaningful ways (including metadata needed by visualization tools). This reduces the demand for constructing user-specified files of data and for disseminating metadata to new users. In addition to automated access via analysis/visualization tools, data users would gain multiple views of the same data collections, each tailored to specific education/research contexts.

For a more detailed description of the THREDDS technological approach, see THREDDS Technical Underpinnings.

THREDDS Collaborations

Many institutions have agreed to collaborate with the THREDDS initiative in the context of the NSF NSDL Collections proposal, and most of those institutions are interested in working on the project whether or not the proposal is successful. The partnerships fall into several broad categories:

Resources Needed

Central Development and Coordination

The THREDDS NSDL Collections proposal included funds for:

For Data Provider and Applications Development Collaborations

Many of the collaboration sites will be integrating PICats metadata software into their server systems and into their applications for accessing data and metadata on the servers. A student assistant will facilitate these software integration tasks at each site.

Timetable

The timetable depends critically on the resources required and obtained. The THREDDS NSDL Collections proposal includes a Statement of Work and Milestones Table covering a two-year time frame. If that proposal is not funded, the work will stretch over a longer period, but most of the collaborators are committed to finding a way to continue the effort in any case. Even if the proposal is funded, it is understood that the statement of work and milestones will be updated based on input obtained at each T3F and stakeholders meeting.

Status

Already in Place on UCAR Prototype Server

At this point, several THREDDS server components are operational on the SCD/Unidata server called motherlode.ucar.edu:

Integration with NSDL

THREDDS will watch closely as the NSDL architecture evolves to ensure that the THREDDS system for data publication and access is consistent with it.

A current draft schematic from the NSDL Technical Infrastructure Workgroup looks like this:

The THREDDS work fits into several facets of the NSDL architectural diagram as shown in the schematic below:

THREDDS provides a suite of service tools for data analysis and visualization. Thin clients like LAS and INGRID use browsers as the user interface, whereas full-blown applications like MetApps run on local machines but access catalog information and datasets from remote servers. Applets for interaction with data can be incorporated directly into educational modules. Publishable Inventories and Catalogs (PICats) are the entities that tie the datasets to the applications and to the Digital Library collections. As textual entities, the PICats can become part of the LD resource harvesting services, thereby effectively making the distributed datasets themselves part of the library.

References

See the THREDDS References document.
