Ben Domenico
Last Updated: August 5, 2003
The goal of THREDDS is to enable students, educators, and researchers to publish, contribute, find, and interact with data relating to the Earth system in a convenient, effective, and integrated fashion. Just as the World Wide Web and digital-library technologies have simplified the process of publishing and accessing multimedia documents, THREDDS will provide needed infrastructure for publishing and accessing scientific data in a similarly convenient fashion.
Understanding the environment we live in and how human activities and other natural changes affect it has always been regarded as one of the most important and challenging problems in science. The THREDDS initiative is a key infrastructure component needed to address this challenge. A more detailed description is provided in Environmental Data Challenges.
(This section is paraphrased from the THREDDS NSDL Collections Proposal.)
Data collections are a cornerstone of environmental research and education. New levels of accessing and using data are now achievable because of evolving technologies, even as the amount and variety of Earth system data are increasing daily. Recent parallel progress in the worlds of scientific data management and education-oriented digital libraries is highlighting a common need to discover widely distributed data sets, and to use unfamiliar data meaningfully with a comprehensive set of analysis tools for:
To address this issue, we envision a prototype scientific data web that will facilitate the publication, discovery, and use of environmental data, just as the World Wide Web has made the publication of and access to textual and multimedia documents simple and straightforward. To illustrate the point, the image below shows a prototype application for viewing gridded data. The popup window shows the contents of a catalog pointing to various datasets stored by Variable, by Model, and by Experiment. This catalog was published on one of several catalog servers shown in the drop-down menu. The idea is that the applications, catalogs, and datasets can all exist on different computers, but the user conveniently finds and analyzes the data as if it were on the local machine.

On the publication side, scientists who generate specialized data sets, including data created minute-to-minute by automated observing platforms, should be able to add to the web with minimal effort by contributing their data to servers using tools that generate the appropriate metadata for cataloging and data-access facilities.
On the discovery and usage side, broad access to data and analysis tools in this scientific data web will enable scientists to publish data sets and create online publications that point directly to them. It will enable educators to work with data in classrooms, faculty to examine and incorporate data from other disciplines, and students to explore and test their ideas using the data yardstick. It will provide rich discovery mechanisms, designed to reflect concepts of importance to the education community being served, with data cross-indexed and cross-referenced by multiple themes. We are proposing to build a prototype of this scientific data web, which we are calling THREDDS (Thematic Real-time Environmental Distributed Data Services), as a first step toward achieving this vision.
We propose the construction of a prototype system for Thematic Real-time Environmental Distributed Data Services (THREDDS) that will make it possible for educators and researchers to publish, locate, analyze, and visualize a wide variety of environmental data both in their classrooms and in their laboratories. Just as the World Wide Web and digital library technologies have simplified the process of publishing and accessing multimedia documents, THREDDS will provide needed infrastructure for publishing and accessing scientific data in a similarly convenient fashion.
THREDDS will establish both an organizational infrastructure and a software infrastructure. A team of data providers, software tool developers, and metadata experts will work together to develop a software framework that allows users to publish, find, analyze, and display data residing on remote servers. The software framework, based on a concept of publishable data inventories and catalogs, will tie together a set of technologies already in use in existing, extensive collections of environmental data: client/server data-access protocols from the University of Rhode Island and the University of Wisconsin-Madison, Unidata’s real-time Internet Data Distribution system, the discovery system at the Digital Library for Earth System Education (DLESE), and an extensive set of client visualization tools.
The heart of THREDDS, however, is metadata contained in the publishable inventories and catalogs (PICats). Based on the eXtensible Markup Language (XML), PICats can be created in many different ways. Sites receiving real-time environmental data will instrument decoders to create PICats describing data products as they arrive. Crawlers will be implemented to create PICats by traversing existing retrospective data collections. Since PICats do not have to reside on the server with the data, researchers will be able to create PICats for research publications that point to datasets residing on several data servers. Educators will incorporate PICats of illustrative datasets into educational modules that also include tools for data analysis and visualization. Indeed students will eventually be able to use PICats to point to datasets related to their research projects, just as they now use URLs to point to relevant documents. Since they are text-based, PICats can be “harvested” and indexed in digital libraries using specialized tools that make use of the internal structure and semantic content as well as by tools similar to those used by current document search engines.
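The crawler idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the THREDDS implementation: the element and attribute names (`catalog`, `dataset`, `name`, `url`), the `build_catalog` function, and the `example.org` server address are all hypothetical, since the formal PICat definition is not given here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_catalog(root_dir, server_url, catalog_name):
    """Walk root_dir and return an XML element listing every file found.

    The element and attribute names are hypothetical placeholders,
    not the actual PICat schema.
    """
    catalog = ET.Element("catalog", {"name": catalog_name})
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for filename in sorted(filenames):
            # Path of the file relative to the collection root, as a URL path.
            rel = os.path.relpath(os.path.join(dirpath, filename), root_dir)
            rel_url = rel.replace(os.sep, "/")
            ET.SubElement(catalog, "dataset", {
                "name": filename,
                "url": server_url.rstrip("/") + "/" + rel_url,
            })
    return catalog


if __name__ == "__main__":
    # Build a tiny throwaway collection and print its catalog.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "model"))
        open(os.path.join(tmp, "model", "temperature.nc"), "w").close()
        cat = build_catalog(tmp, "http://example.org/data", "demo")
        print(ET.tostring(cat, encoding="unicode"))
```

Because the catalog records URLs rather than local paths, the resulting text does not have to live on the server that holds the data, which is the property the paragraph above relies on.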
A large set of committed collaborators will continue to work together on the development and integration of this technology, incorporating it into their data servers, client analysis and display applications, and, ultimately, into the NSDL through DLESE.
In brief, THREDDS represents a broad-based community effort, managed by the Unidata Program, to enable learners, educators, and researchers, regardless of their institution’s size, in-house computer expertise, or academic level, to publish, find, and use current and retrospective environmental data. THREDDS moves data publication, discovery, and usage from the arcane (where location, formats, and filename conventions must be known) to the mundane, where the underlying complexities are transparent to the users.
The THREDDS approach builds on the strengths of traditional data servers and the Unidata real-time distribution system and provides coherence through the use of:
In THREDDS, the Unidata IDD is used to populate and exchange data and metadata among a number of thematic data servers. Traditional mechanisms for accessing the data are still available, but they are augmented by the more global discovery system of digital libraries (such as DLESE) and by remote-access protocols such as ADDE and DODS, which enable applications running on local computers to access data from remote servers as if the data were on local disks.

A portion of the community pulls data to:
Universities or centers with powerful systems and themes:

The THREDDS approach builds on the strengths of the community of data providers, visualization tool builders, digital libraries, and metadata experts. It also provides a mechanism for extending the DLESE discovery system (Sumner et al., 2001) to embrace the metadata in what we refer to as Publishable Inventories and Catalogs (PICats).
To accomplish this, we will build two new essential components: a formal definition for PICats and software to facilitate their use. As described later, PICats will be built using XML transported via HTTP (i.e., on the Web), and will refer to data sets that are usable via DODS, ADDE, or other direct-access methods. Needed software includes tools to create standards-compliant PICats, plug-ins or server-side visualizers to enable the use of PICats in browsers, and components that help developers incorporate PICats into applications.
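As a rough illustration of how an application component might consume such a catalog, the sketch below parses a small XML document and selects the datasets reachable through a given access method. The sample document, its element and attribute names (`catalog`, `dataset`, `access`, `url`), the `example.org` addresses, and the `datasets_by_access` helper are assumptions made for this example; they are not the formal PICat definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog text; a real PICat would follow the formal definition.
SAMPLE_PICAT = """\
<catalog name="Example model output">
  <dataset name="Surface temperature" access="DODS"
           url="http://example.org/dods/surface_temperature"/>
  <dataset name="Satellite imagery" access="ADDE"
           url="adde://example.org/imagery"/>
</catalog>
"""


def datasets_by_access(xml_text, access):
    """Return (name, url) pairs for datasets using the given access method."""
    root = ET.fromstring(xml_text)
    return [
        (d.get("name"), d.get("url"))
        for d in root.iter("dataset")
        if d.get("access") == access
    ]


if __name__ == "__main__":
    for name, url in datasets_by_access(SAMPLE_PICAT, "DODS"):
        print(name, url)
```

Since the catalogs are plain XML served over HTTP, the same function could be applied to text retrieved with a standard HTTP client such as `urllib.request.urlopen` instead of the inline sample.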
On the discovery side, we will propose and help construct mechanisms for extending the DLESE discovery system to embrace PICats as standard resources. This will be done as an experiment guided by the DLESE Data Access Working Group (DAWG). The experiment will add document types and other elements to effectively characterize PICats in the DLESE metadata framework, refine the PICat-creation tools for DLESE compatibility, implement automatic “harvesting” of PICat metadata, test the system from applications and browsers, and, after an iterative refinement, adopt the methodology. Approval by the DAWG and DLESE Steering Committees will be needed before a recommendation can be made to the National SMETE Digital Library (NSDL) for adoption at that level.
The THREDDS strategy will facilitate the publication of data sets in a variety of forms since metadata descriptors can be built anywhere to create “virtual aggregations” of data sets and to characterize data sets in meaningful ways (including metadata needed by visualization tools). This reduces the demand for constructing user-specified files of data, and for disseminating metadata to new users. In addition to automated access via analysis/visualization tools, data users would gain multiple views of the same data collections, each tailored to specific education/research contexts.
For a more detailed description of the THREDDS technological approach, see THREDDS Technical Underpinnings.
Many institutions have agreed to collaborate with the THREDDS initiative in the context of the NSF NSDL Collections proposal, and most of those institutions are interested in working on the project whether or not the proposal is successful. The partnerships fall into several broad categories:

The THREDDS NSDL Collections proposal included funds for:
Many of the collaboration sites will be integrating PICats metadata software into their server systems and into their applications for accessing data and metadata on the servers. A student assistant will facilitate these software integration tasks at each site.
At this point, several THREDDS server components are operational on the SCD/Unidata server called motherlode.ucar.edu:
THREDDS will watch closely as the NSDL architecture evolves to ensure that the THREDDS system for data publication and access is consistent with it.
A current draft schematic from the NSDL Technical Infrastructure Workgroup looks like this:

The THREDDS work fits into several facets of the NSDL architectural diagram as shown in the schematic below:

THREDDS provides a suite of service tools for data analysis and visualization. Thin clients like LAS and INGRID use browsers as the user interface, whereas full-blown applications like MetApps run on local machines but access catalog information and datasets from remote servers. Applets for interaction with data can be incorporated directly into educational modules. Publishable Inventories and Catalogs (PICats) are the entities that tie the datasets to the applications and to the Digital Library collections. As textual entities, the PICats can become part of the LD resource harvesting services, thereby effectively making the distributed datasets themselves part of the library.
See the THREDDS References document.