Unidata Internet Data Distribution:
An Overview of the IDD

Ben Domenico
Unidata Program Center
July 30, 2003 *

Overview

Universities across the nation are transforming their teaching and research efforts through increased use of a rapidly expanding menu of environmental data. With funding from the Division of Atmospheric Sciences (ATM) of the National Science Foundation (NSF), the Unidata Program is playing a central role in this transformation by enabling universities to employ innovative computing and networking technologies to acquire such data sets in real-time and use them routinely in their classrooms and research labs.

Real-time Data from Multiple Sources

The global montage on the left is a beautiful and dramatic illustration of the types of data the Unidata community works with. The Space Science and Engineering Center (SSEC) of the University of Wisconsin, Madison, uses real-time data from a variety of ground-based and satellite instruments to create an updated global montage every six hours.

More information about available data can be found in Unidata Data Sources.

Working with the Unidata Program Center (UPC), the Unidata community, comprising over 130 universities, has built a national Internet Data Distribution (IDD) system. The IDD allows users to "subscribe" to certain sets of data products; IDD servers then deliver the requested data to the subscribers' local servers as soon as the data are available from the source. With its initial national implementation in 1994, the IDD may have been the original example of Internet "push" technology. It now appears to provide the reliability, flexibility, and efficiency required by participating institutions. As the underlying Internet technology evolves, we anticipate dramatic increases in the volume of data delivered. We plan to augment the system to better serve disciplines beyond the atmospheric sciences and to incorporate anticipated new networking technologies that minimize the impact of the IDD on the underlying network.

Data Delivery as Soon as Possible

The Unidata community needs to access data as soon as possible after it becomes available. Professors

A Subscription System

A distinguishing characteristic of the IDD is that it allows users to specify in advance which data should be delivered to their local systems. The IDD then delivers the data as soon as they are available. One can think of the IDD as a data subscription service, implemented in such a way that delivery (and often processing) is triggered by external events. Further discussion of the IDD and the more typical data center and satellite broadcast approaches is found in Delivery Alternatives.
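The subscription idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IDD's implementation: it assumes each product carries a feed label and a text identifier, and that a standing request is a feed plus a regular expression tested against that identifier. The classes `Product`, `Subscription`, and `Relay` and the method `on_product_arrival` are hypothetical names. What the sketch shows is that requests are registered once, in advance, and that delivery is driven by the arrival of a product rather than by subscribers polling.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Product:
    feed: str   # hypothetical feed label, e.g. "SATELLITE" or "MODEL"
    ident: str  # hypothetical product identifier string


@dataclass
class Subscription:
    site: str
    feed: str
    pattern: str                        # regular expression tested against Product.ident
    deliver: Callable[[Product], None]  # hypothetical hook that hands data to the local server

    def matches(self, product: Product) -> bool:
        return self.feed == product.feed and re.search(self.pattern, product.ident) is not None


class Relay:
    """Holds standing requests and pushes each new product to every matching subscriber."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        # Requests are registered once, in advance, rather than issued per product.
        self.subscriptions.append(sub)

    def on_product_arrival(self, product: Product) -> int:
        # Delivery is triggered by the arrival event itself; subscribers never poll.
        delivered = 0
        for sub in self.subscriptions:
            if sub.matches(product):
                sub.deliver(product)
                delivered += 1
        return delivered


if __name__ == "__main__":
    inbox: List[Product] = []
    relay = Relay()
    relay.subscribe(Subscription("univ-a", "SATELLITE", r"^GOES.*IR", inbox.append))
    relay.on_product_arrival(Product("SATELLITE", "GOES8 IR 12Z"))
    relay.on_product_arrival(Product("SATELLITE", "GOES8 VIS 12Z"))
    print(len(inbox))  # prints 1
```

Only the infrared product matches the standing request, so only it reaches the subscriber's inbox; the visible image is never sent.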

A National Team

Since Unidata was founded in the early 1980s as a grassroots effort by universities to gain access to real-time weather data, there have been many examples of community members working together to help each other and solve problems. The IDD is arguably the best illustration of the Unidata universities acting as a true collaboratory. Without the cooperation and contributions of the more than 100 sites now actively participating in the IDD, there would not have been adequate resources to build this national system.

Community-Based Architecture

To make the overall system scalable to a large number of sites receiving large data products at nearly the same time, the IDD is based on a hierarchical, or fan-out, distribution scheme. The system depends on having enough sites willing and able to relay the data streams to a fixed number of others. So far we have been fortunate to have enough sites with adequate resources to act as relay nodes. The design is flexible enough to allow new data products to be introduced from any node in the system. Everyone can contribute.
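The scaling argument behind the fan-out scheme can be illustrated with a small simulation. This is a sketch under assumptions, not the IDD's actual topology, which is arranged among participating sites rather than computed: it places sites in a balanced tree where each relay feeds at most a fixed number of downstream sites (a fan-out of 5 is chosen purely for illustration), then floods one product outward from any chosen origin, with each site forwarding to neighbors that have not yet seen it. The functions `build_fanout_tree` and `propagate` are hypothetical names.

```python
from collections import deque
from typing import Dict, List


def build_fanout_tree(num_sites: int, fanout: int) -> Dict[int, List[int]]:
    """Undirected adjacency map of a tree rooted at site 0 in which each site
    relays to at most `fanout` downstream sites."""
    adj: Dict[int, List[int]] = {i: [] for i in range(num_sites)}
    for child in range(1, num_sites):
        parent = (child - 1) // fanout  # fill the tree level by level
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def propagate(adj: Dict[int, List[int]], origin: int) -> Dict[int, int]:
    """Flood one product from `origin` along tree links (breadth-first).
    Returns the number of relay hops needed to reach each site."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        site = queue.popleft()
        for neighbor in adj[site]:
            if neighbor not in hops:  # each site accepts a given product only once
                hops[neighbor] = hops[site] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    adj = build_fanout_tree(130, 5)
    print(max(propagate(adj, 0).values()))    # prints 3: injected at the top
    print(max(propagate(adj, 129).values()))  # prints 6: injected at a leaf site
```

With 130 sites and a fan-out of 5, no relay serves more than 5 downstream sites, yet every site is within 3 hops of the top node; because links are used in both directions, a product introduced at an outlying site still reaches everyone, here within 6 hops. Depth grows roughly with the logarithm of the number of sites, which is what keeps the load on each relay bounded as the community grows.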

The Current IDD System

At present, the IDD is delivering an aggregate total of about 50 gigabytes of data per day to over 130 universities. Generally, users seem satisfied with the performance of the system.
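As a rough scale check, assuming decimal gigabytes and that the 50-gigabyte figure is the daily total summed over all recipients, the average aggregate delivery rate is:

```latex
\[
\frac{50 \times 10^{9}\ \text{bytes/day} \times 8\ \text{bits/byte}}{86\,400\ \text{s/day}}
\approx 4.6 \times 10^{6}\ \text{bits/s} \approx 4.6\ \text{Mbit/s}
\]
```

Because products tend to arrive in bursts (for example, when a model run or a satellite image becomes available), peak rates are likely well above this daily average.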

Data Rates and Performance

The Unidata Program Center monitors how rapidly data are distributed to each subscriber. This information allows us to identify when reconfigurations may be needed. It also serves as a gauge of the performance of the underlying networks.
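The text does not describe how the UPC computes its statistics, so the following is only a sketch under assumptions: each product is taken to carry a creation timestamp, each site is assumed to log the product's arrival time, and clocks are assumed to be synchronized. The functions `latency_by_site` and `sites_needing_attention` and the 60-second threshold in the example are hypothetical. The sketch computes each site's median delivery latency and lists the sites that might be candidates for reconfiguration.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical log record: (site, product creation time, arrival time at site), in seconds.
Record = Tuple[str, float, float]


def latency_by_site(records: List[Record]) -> Dict[str, float]:
    """Median delivery latency (arrival minus creation) for each subscribing site."""
    per_site: Dict[str, List[float]] = {}
    for site, created, arrived in records:
        per_site.setdefault(site, []).append(arrived - created)
    return {site: median(latencies) for site, latencies in per_site.items()}


def sites_needing_attention(records: List[Record], threshold_s: float) -> List[str]:
    """Sites whose median latency exceeds `threshold_s`, slowest first."""
    medians = latency_by_site(records)
    slow = [site for site, m in medians.items() if m > threshold_s]
    return sorted(slow, key=lambda site: medians[site], reverse=True)


if __name__ == "__main__":
    records = [
        ("a", 0, 5), ("a", 10, 14), ("a", 20, 26),
        ("b", 0, 120), ("b", 10, 200),
        ("c", 0, 70), ("c", 5, 80),
    ]
    print(sites_needing_attention(records, 60.0))  # prints ['b', 'c']
```

The median is used instead of the mean so that a single delayed product does not flag an otherwise healthy site; a persistently high median is more likely to point to a relay or network problem.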

Future Directions

In a sense, we have created a monster in the IDD. The older satellite broadcast data delivery system had certain built-in limitations: adding to the data stream carried fixed costs, such as adding transponder space and delivering the data to the uplink. The costs of adding to the IDD data stream are not as explicit, so the imagination of the user community is not constrained. While Unidata was implementing the IDD, NASA, NOAA, and others were planning and implementing the next generation of data-gathering facilities. For example, NASA will be gathering data via its Earth Observing System (EOS) facilities. In addition, $4 billion has been spent to modernize the National Weather Service (NWS). The modernization includes new GOES satellites, now generating high-resolution images; new forecast models of ever-increasing resolution from the National Centers for Environmental Prediction (NCEP); various levels of data from the NEXRAD radars; and other observations to be made available on the NOAAPort system now being implemented by NOAA. Faculty want access to these data sets for both classroom and research purposes. For example, some academic researchers run regional atmospheric and hydrological models on computers at their own institutions; using higher-resolution data to set the initial conditions for these models is an important element in predicting local storms, tornadoes, and floods more accurately.

New Strategies

We are actively developing and investigating several related options:

Other Sources of Information


* Initial version of this paper presented at the Science Information Systems Interoperability Conference (SISIC), October 1995.