Unidata - To provide the data services, tools, and cyberinfrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation.
         
 

Thematic Real-time Environmental Distributed Data Services
(THREDDS)
Incorporating Real-time Environmental Data and Interactive Analysis Tools Into NSDL

Ben Domenico, John Caron, Ethan Davis, and Robb Kambic
Unidata Program Center
University Corporation for Atmospheric Research

P.O. Box 3000
Boulder, CO 80307
USA

Stefano Nativi
University of Florence - Polo di Prato
Piazza Ciardi, 25
59100 Prato
ITALY

1. Overview

In "The Absorbent Mind," Maria Montessori described education simply and elegantly: "It is not acquired by listening to words, but in virtue of experiences in which the child acts on his environment." At a more detailed level, the National Science Education Standards describe a learning process based on inquiry: "Inquiry is a multifaceted activity that involves ... using tools to gather, analyze, and interpret data; proposing answers, explanations, and predictions; and communicating the results." These quotes capture the essence of the interactive data environment that THREDDS will foster.

Each second of each day, observing systems around the globe are gathering data that provide snapshots of almost every measurable aspect of our environment: satellites monitor cloud movements, atmospheric constituents, and the temperature of the land and ocean surfaces. Lightning strikes are recorded as they occur throughout the country. Global positioning system and seismic sensors monitor tiny movements as well as major shifts of the planet's tectonic plates. Modeling programs are being developed that use the current data to forecast future evolution on scales ranging from short-term weather forecasts to very long-term climatic changes.

The goal of this work is to expand the means by which learners -- including students, educators, scientists, and the general public -- can use these vast resources to perform their own inquiries -- to "act on their environment." The figure below, a screen dump from a prototype of one of the THREDDS interactive data analysis and display applications, illustrates a few of the ways in which users can interact with environmental datasets that are accessed from remote servers as if they were on local disks. In this particular instance, the display is a 3D rendering of the jet stream as predicted by a supercomputer model dataset on a server at the National Center for Atmospheric Research (NCAR).

Figure 1: Interactive data analysis and display application. The screen image above was created by software engineer Stuart Wier of the Unidata Program Center MetApps project.

Data collections are a cornerstone of the scientific research and education environment. While the amount and variety of Earth system data are increasing daily, the systems for making these data readily available and useful to the academic community have not kept pace. We envision a framework -- a scientific data web -- that will allow faculty and students to search (in the vocabulary of their particular discipline) for available data and to find them, regardless of where the data reside. But just having the data is not enough. Even the many spectacular pictures generated from datasets available on the web present an essentially passive view of what is happening. To interact with the environmental phenomena represented by the data, users need specialized visualization and analysis tools that enable them to manipulate and examine the datasets themselves. They need to create their own visual images, and they must be able to manipulate those images in 3D space and perhaps even "fly" through and around them. It should be possible to move a probe around in the image to see how the temperature or pressure changes with depth in the ocean or height in the atmosphere at different points on the globe. Moreover, it is important to be able to overlay images of data from different sources. For example, during a severe thunderstorm, one might ask how the rainfall information from a nearby radar site correlates with measurements of stream flow in the local river basin. And, if those measurements indicate that a problem is arising, it would be valuable to overlay predictions from forecast (meteorological and hydrological) models. Ultimately it may be important to include demographic information about populations in threatened areas.

As a two-year project with limited resources, THREDDS clearly will not do all of this. However, our goal is to build key components that will make such a system possible and to incorporate them into a working prototype that involves a large number of data providers, a group of interactive tool builders, metadata experts, and representatives of the digital library community. The broad access to data and analysis tools envisioned in the prototype scientific data web will enable educators to work with data in classrooms, scientists to examine and incorporate data from other disciplines, and students to explore and test their ideas using the yardstick of data. Indeed, in the end, anyone with Internet access will be able to incorporate scientific data into their everyday lives more easily.

2. Strategy: A Variety of Tools and Data Sources Bound by Metadata Catalogs

2.1 Interactive Data Analysis and Display Tools

The strategic goal of THREDDS is to provide students, educators, and researchers with coherent access to a large collection of real-time and archived datasets from a variety of environmental data sources at a number of distributed server sites. The datasets will be conveniently accessible from a collection of THREDDS-enabled data analysis and display tools. The arsenal of tools includes web-based "thin" clients that allow the learner to browse and manipulate data using the processing power of the servers; interactive data analysis applets that can be embedded directly into HTML educational documents; and full "thick" client applications that harness the computing power and flexibility of the user's own workstation while accessing data from a collection of remote servers.

2.1.1 "Thin" Client, Browser-based Analysis and Display Systems

On a superficial level, the browser-accessible data analysis and display tools look similar to the more traditional web sites that offer images generated from data. There is one important difference: these thin clients enable the user to interact directly with the data by using a set of analysis tools that run on the server. An example of this powerful server-based approach resides at the Climate Data Library of the International Research Institute for Climate Prediction (IRI) at Lamont-Doherty Earth Observatory (LDEO). The Climate Data Library enables interactive analysis of datasets on the server via the INGRID system developed by Benno Blumenthal. A second example is the Live Access Server (LAS), which was developed at the Pacific Marine Environmental Laboratory (PMEL) under the direction of Steve Hankin.

2.1.2 Interactive Data Analysis Applets Embedded in Educational Materials

The screen shot below is part of a web page from the collection of interactive WeatherWise (WXWise for short) applets developed by a team led by Tom Whittaker and Steve Ackermann for use in courses at the University of Wisconsin-Madison. This particular applet accesses a current infrared satellite image and allows the learner to see how a portion of the image would change if the temperature were higher or lower than it actually is. The learner is then asked to respond to questions at the bottom of the page. It is an excellent illustration of an embedded Java applet that allows direct interaction with real-time environmental data stored on THREDDS servers.

You can bring up the WeatherWise applet itself in a Java-enabled browser by clicking on the image.

Figure 2: Interactive applet embedded in educational module web page.

2.1.3 Fully Interactive "Thick" Client Applications

The animated loop below is a series of screen dumps from a prototype application of the Unidata MetApps project. The loop shows how the user can interact with data on a remote server. The panels on the left show the parameters available in the dataset under investigation, along with a set of options for viewing the data. The data selected for the 3D rendering are views of the jet stream predicted by a supercomputer forecast model run at the National Centers for Environmental Prediction and delivered to a THREDDS server at NCAR via the IDD system. Using the Distributed Ocean Data Systems (DODS) client-server protocol, the application was able to bring across only the subset of the data that was needed for the visualization. The frames of the loop show several views generated as the user manipulated the 3D image with her mouse.
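To make the subsetting idea concrete, here is a minimal Python sketch of how a client might ask a DODS server for only part of a variable. DODS constraint expressions are appended to the dataset URL after a `?` and select index ranges with the hyperslab form `[start:stride:stop]`, where `stop` is inclusive. The server address, dataset path, variable name, and array shape are hypothetical placeholders, and the sketch only builds the request URL; it does not contact a server.

```python
import math


def hyperslab(ranges):
    """Format (start, stride, stop) tuples as DODS hyperslab brackets.

    Index ranges are zero-based and inclusive of `stop`.
    """
    return "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)


def subset_url(dataset_url, variable, ranges, suffix=".dods"):
    """Build a request URL for `variable` restricted to `ranges`.

    The `.dods` suffix asks for the binary data response; `.dds` and `.das`
    would ask for the dataset structure and attributes instead.
    """
    return f"{dataset_url}{suffix}?{variable}{hyperslab(ranges)}"


def count_values(ranges):
    """Number of array elements selected by inclusive (start, stride, stop) ranges."""
    return math.prod((stop - start) // stride + 1 for start, stride, stop in ranges)


if __name__ == "__main__":
    # Hypothetical forecast field with shape (time, level, lat, lon).
    full_shape = (41, 19, 181, 360)
    # One time step, one level, every second grid point over a regional box.
    ranges = [(0, 1, 0), (5, 1, 5), (100, 2, 160), (200, 2, 300)]
    # Hypothetical server and variable names.
    print(subset_url("http://server.example.edu/dods/model_run.nc", "wind_speed", ranges))
    print(f"{count_values(ranges)} of {math.prod(full_shape)} values requested")
```

With these placeholder numbers the request covers 1,581 values out of about 50.8 million in the full array, which illustrates why transferring only the needed subset matters for interactive use over a network.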

Figure 3: Fully interactive "thick" client application. The image above is another screen dump by Stuart Wier of the Unidata Program Center MetApps project.

2.1.4 Embedding Interactive Data Analysis Applications into Publications

In the long term, the intention is to develop THREDDS capabilities to the point where one can embed pointers to datasets and tools into online publications such as this one. In the meantime, it is still necessary to install some client-side software components on your own computer. If you are interested, this can be done for the current beta test version of at least one of the client applications. There are two approaches. One is to get the full Java application running on your own computer. The other is to use the Java application startup facility called Java Web Start. Both approaches are described in a web page by Stuart Wier: http://www.unidata.ucar.edu/staff/wier/index.html

2.2 Distributed Data Sources

The schematic below shows how a user running a THREDDS client on a local workstation can access data from a number of distributed servers -- each of which has its own emphasis or "theme." Many of the servers, in turn, are populated with environmental data in real time via the Unidata Internet Data Distribution (IDD) system, which has been delivering data to nearly a hundred universities for the last seven years. A few of these servers already exist, others are being built, and a couple (the streamflow and demographic data servers) are still in the formative idea stage.

Figure 4: Client data access from distributed data servers.

The figure below shows how data from a set of servers can be plotted together in an interactive application. Only the required portions of the datasets are transmitted over the network, and the application can accommodate the differing spatial and temporal resolutions of the individual data elements. This particular screen image is one frame from an animation showing the evolution of the data over time.
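When sources report at different time resolutions (for example, frequent radar scans alongside model output every few hours), an animation needs a rule for which record from each source belongs in each frame. The following is a minimal Python sketch assuming a simple nearest-in-time rule with a tolerance; the source names, times, and tolerance are hypothetical, and the MetApps prototype may well use a different matching strategy.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_time(times, target, tolerance):
    """Return the entry of sorted `times` closest to `target`,
    or None if no entry lies within `tolerance`."""
    i = bisect_left(times, target)
    candidates = times[max(i - 1, 0): i + 1]  # neighbours on either side of target
    if not candidates:
        return None
    best = min(candidates, key=lambda t: abs(t - target))
    return best if abs(best - target) <= tolerance else None


def match_frames(frame_times, sources, tolerance):
    """For each animation frame, choose the nearest record time from every source."""
    return {
        frame: {name: nearest_time(times, frame, tolerance) for name, times in sources.items()}
        for frame in frame_times
    }


if __name__ == "__main__":
    t0 = datetime(2002, 6, 1, 0, 0)
    sources = {
        # Hypothetical radar scans every 10 minutes for 6 hours.
        "radar": [t0 + timedelta(minutes=10 * k) for k in range(37)],
        # Hypothetical model output every 3 hours.
        "model": [t0 + timedelta(hours=3 * k) for k in range(3)],
    }
    frames = [t0 + timedelta(hours=h) for h in (0, 2, 4, 6)]
    for frame, picks in match_frames(frames, sources, timedelta(minutes=90)).items():
        print(frame.strftime("%H:%M"),
              {k: (v.strftime("%H:%M") if v else None) for k, v in picks.items()})
```

A nearest-time rule keeps every source visible in every frame as long as its data fall within the tolerance; interpolating between model times would be an alternative when smoother transitions are wanted.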

Figure 5: Interactive analysis and visualization of data from distributed servers.

The screen image above was created by Don Murray, lead software engineer on the Unidata Program Center MetApps project. The prototype application that generated the image was developed by Unidata in collaboration with the Atmospheric Technology Division at the National Center for Atmospheric Research.

2.3 Metadata Catalogs

At the heart of THREDDS is metadata contained in publishable inventories and catalogs. Based on the eXtensible Markup Language (XML), these inventories and catalogs can be created in many different ways. Data providers receiving real-time environmental data are instrumenting decoders to create entries describing data products as they arrive and become part of the data server inventory. Crawlers are being implemented to create inventories by traversing existing retrospective data collections. Since catalogs do not have to reside on the data servers, researchers will be able to create specialized or personal catalogs for research publications that point to datasets residing on several data servers. Educators will incorporate catalogs of illustrative datasets into educational modules that also include tools for data analysis and visualization. Just as they now use URLs to point to relevant documents, students will eventually be able to reference datasets and analysis tools related to their research projects. Since the inventories and catalogs are text-based, they can be "harvested" and indexed into the Digital Library for Earth System Education (DLESE) and, very likely, other digital libraries.
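Because catalogs are plain XML text, a harvester only needs to fetch them and pull out the descriptive fields. The following minimal Python sketch builds a keyword index across several harvested catalogs, standing in for the kind of indexing a digital library such as DLESE might perform. It assumes a simplified, hypothetical catalog layout in which each `dataset` element carries `name` and `url` attributes; this is not the actual THREDDS catalog schema, and the server names are placeholders.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical catalogs from different servers (simplified layout).
CATALOG_A = """<catalog name="Real-time model output">
  <dataset name="Eta model 500 mb temperature" url="http://serverA.example.edu/dods/eta_500t"/>
  <dataset name="AVN model surface pressure" url="http://serverA.example.edu/dods/avn_psfc"/>
</catalog>"""

CATALOG_B = """<catalog name="Satellite and ocean products">
  <dataset name="GOES infrared image" url="http://serverB.example.edu/dods/goes_ir"/>
  <dataset name="Sea surface temperature analysis" url="http://serverB.example.edu/dods/sst"/>
</catalog>"""


def harvest(catalog_xml):
    """Yield (name, url) for every dataset element that has a url attribute."""
    root = ET.fromstring(catalog_xml)
    for ds in root.iter("dataset"):
        url = ds.get("url")
        if url:
            yield ds.get("name", ""), url


def build_index(catalogs):
    """Map each lower-case word in a dataset name to the set of dataset URLs."""
    index = defaultdict(set)
    for xml_text in catalogs:
        for name, url in harvest(xml_text):
            for word in re.findall(r"[a-z0-9]+", name.lower()):
                index[word].add(url)
    return index


def search(index, query):
    """Return URLs whose dataset names contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = build_index([CATALOG_A, CATALOG_B])
    print(sorted(search(index, "temperature")))
    print(sorted(search(index, "model temperature")))
```

A real digital library would index richer fields (keywords, spatial and temporal coverage, discipline vocabulary), but the principle of harvesting text from catalogs that live on many servers is the same.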

 

Figure 6: Searching distributed data catalogs from within applications programs.

The screen shot above is also from a prototype client data analysis application that is part of the Unidata MetApps development project. The screen illustrates key aspects of THREDDS data catalog access from within a client application. First, the pop-up "Choose DODS Dataset" window enables access to several catalog servers on different machines on the Internet. The lower part of the pop-up window shows a menu of data items available on one of the servers. This particular catalog has dataset entries arranged in three different ways: by variable, by model, and by experiment. The details of the individual catalog entries are not important, but note that the words associated with each dataset or collection of datasets can be chosen by the creator of the catalog, and that the catalog itself can refer to datasets and collections of datasets on a variety of data servers.

The following image is a screen shot from another MetApps client which depicts a catalog that's automatically generated as real-time weather forecast model data arrives at the motherlode server at NCAR. In this case, the main menu items are the names of the various models and one of the model collections, SST-A, has been opened to show the individual datasets available on the server. In essence, the hierarchical list in this case comprises an inventory of the model output datasets available on the server at the time.
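An inventory like the one on motherlode can be regenerated whenever a decoder finishes writing a new file. The minimal Python sketch below groups model output files into one collection per model, newest run first. It assumes hypothetical file names of the form `MODEL_YYYYMMDD_HHMM.nc` (model names without underscores) and the same simplified XML layout used as a stand-in for the real catalog format; the base URL is a placeholder.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_inventory(filenames, base_url):
    """Group model output files by model name and return catalog XML text.

    Assumes hypothetical names like 'ETA_20020601_1200.nc'.
    """
    by_model = defaultdict(list)
    for fname in filenames:
        stem = fname.rsplit(".", 1)[0]
        model, date, hhmm = stem.split("_")
        by_model[model].append((date + hhmm, fname))

    catalog = ET.Element("catalog", name="Real-time model output")
    for model in sorted(by_model):
        collection = ET.SubElement(catalog, "dataset", name=model)
        # Newest run first, as a user browsing real-time data would expect.
        for run, fname in sorted(by_model[model], reverse=True):
            ET.SubElement(collection, "dataset",
                          name=f"{model} {run[:8]} {run[8:]}Z",
                          url=base_url + fname)
    return ET.tostring(catalog, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical files, as a decoder might report them on arrival.
    files = ["ETA_20020601_0000.nc", "AVN_20020601_0000.nc", "ETA_20020601_1200.nc"]
    print(build_inventory(files, "http://server.example.edu/dods/"))
```

Regenerating the whole inventory on each arrival keeps the sketch simple; a busy server might instead append entries incrementally.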

Figure 7: Data server inventory listing as seen in analysis and display tool.

The figure below shows a different view of the catalog in Figure 7. Whereas Figure 7 shows the catalog as it appears within an application, the view below shows the actual XML code for the catalog as displayed in the Internet Explorer browser. If you are viewing this page with a recent version of Internet Explorer, you should be able to look at the current version of the catalog by clicking on either image.

Figure 8: Data server catalog in native XML form.

3. The Teams

THREDDS is a highly collaborative project, and this section consists of lists of the partners working with us on the three main areas of THREDDS development:  a set of data provider sites; a group of software developers working on systems for data analysis and display; and a set of metadata experts relating to Earth system data collections.

3.1 Data Providers

The following institutions have agreed to be data-server partners:       

Note that NCAR and SSEC will serve as testbed sites for server-side software. As the project progresses and the common underpinnings are tested at the initial sites, additional sites will be added. Sites under consideration are:

3.2 Client Analysis and Display Tools

The THREDDS prototype will provide examples of a wide variety of working applications that use our metadata framework to find, analyze, and display data from server sites.  This will demonstrate an end-to-end system for data access and visualization. The following developers will incorporate our client-side data-access components (class libraries and metadata access) into their own data manipulation tools:

3.3 Metadata Expertise

As noted earlier, the technological core of this initiative, the crucial component being developed now, is a system for adding to scientific datasets the semantic descriptions necessary for data manipulation and discovery. It must interoperate with data providers, data servers, data clients, catalog servers, discovery systems, and other middleware components. Investigators will select key scientific datasets and develop semantic descriptions of them for an end-to-end demonstration of the utility of this approach. Unidata staff will work closely with DLESE to ensure that the resulting metadata system will interoperate effectively with the National STEM (Science, Technology, Engineering, and Mathematics) Digital Library (NSDL).

Partners with whom we will consult on matters of metadata and interoperability are:

4. Acknowledgements

The authors wish to thank the National Science Foundation Division of Undergraduate Education for making this work possible as part of the NSDL initiative under the direction of Lee Zia. Obviously THREDDS is a highly collaborative project, so thanks are in order to all the individuals and organizations who are working with us as collaborative partners. These partners have been cited individually in the article.

5. References

 
 
 
Unidata is a member of the UCAR Office of Programs, is managed by the University Corporation for Atmospheric Research, and is sponsored by the National Science Foundation.
P.O. Box 3000     Boulder, CO 80307-3000 USA     Tel: 303-497-8643     Fax: 303-497-8690