INTERNET DATA DISTRIBUTION (IDD)

Unidata Program Center 1
Boulder, Colorado
September 1993

1. INTRODUCTION

Universities across the nation are transforming their teaching and research efforts through increased use of a rapidly expanding menu of environmental data. With funding from the Atmospheric Sciences Division (ATM) of the National Science Foundation (NSF), the Unidata Program is playing, and will continue to play, a central role in this transformation by enabling universities to employ innovative computing and networking technologies to acquire such datasets in real-time and use them routinely in their classrooms and research labs.

The Unidata Program has embarked on another endeavor that promises to deepen and broaden this fundamental transformation. The new Internet Data Distribution (IDD) initiative addresses an issue facing the atmospheric sciences community in the immediate future: how to cope with the immense volume of data scheduled to become available as part of new initiatives in NOAA and other agencies. 2 As an example, the National Weather Service modernization will soon create a real-time NOAAPort data stream of 2 megabits per second. The concept further enables education-oriented institutions that thus far have lacked the requisite equipment and expertise to integrate the new technologies into their programs gradually. The Unidata Program Center (UPC) will continue to act as a catalyst and facilitator for outreach activities at its member universities.
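To give a sense of the scale involved, the arithmetic behind a 2 megabit-per-second stream can be worked through as in the short sketch below. It assumes, purely for illustration, that the stream runs continuously at its full rate and that one megabit is 1,000,000 bits; neither assumption comes from the NOAAPort specification.

```python
# Back-of-the-envelope volume of a continuous 2 megabit-per-second stream.
# Assumptions (for illustration only): the stream runs at full rate all day,
# and 1 megabit = 1,000,000 bits.

RATE_BITS_PER_SECOND = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = RATE_BITS_PER_SECOND // 8
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
gigabytes_per_day = bytes_per_day / 1e9

print(f"{bytes_per_second:,} bytes per second")       # 250,000 bytes per second
print(f"{gigabytes_per_day:.1f} gigabytes per day")   # 21.6 gigabytes per day
```

Under these assumptions a site capturing the whole stream would need to handle on the order of 20 gigabytes per day, which is why capturing only part of a stream is an important option.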

2. OVERALL GOAL

The concept behind the Unidata IDD is to develop a system for disseminating real-time scientific data which will build on Internet facilities as the underlying mechanism for data distribution and for broadening the community of users who can utilize the information. The system will:

3. CURRENT SYSTEM

At present Unidata uses a commercial satellite broadcast system to disseminate real-time weather data to more than 100 universities. The data all flow to a single point where a commercial vendor provides the uplink to the broadcast satellite. Unidata also distributes software for capturing, analyzing, and displaying the data. While the software distribution is done mainly via the Internet, some tape and diskette copying is still required. Nearly all consulting is done via electronic mail.

The diagram Current System Overview shows how data are disseminated via broadcast while the Internet is used to distribute and support the software and to communicate among community members.

4. NATIONAL SYSTEM

With guidance from the UPC and reassurance that it is participating in an ongoing national program, each Unidata department purchases and maintains its own network of personal computers and workstations. Sites purchase or lease the ground station equipment needed to receive the satellite broadcast and feed it to a local computer. They also subscribe to those data streams of interest to them, paying fees that are discounted through the Unidata contract.

Unidata software, called the Local Data Manager (LDM), which is built on the client-server architecture, allows the site to capture all or part of the subscription data stream and store the data anywhere on the local network. Professors and students in the atmospheric science department utilize a suite of application programs provided by Unidata to analyze and display the data in their instructional and research programs. The UPC has also developed a set of scripts that allow a site to automatically produce processed products, such as electronic weather maps and graphs, and make them available to users with personal computers and terminals elsewhere on campus or in the region. Originally called the Campus Weather Display, the system has been expanded at some sites to include other types of data and instructional materials. To reflect these new capabilities, the name has been changed to Integrated Earth Information Server (IEIS, as in eyes on the globe). For a description of a Unidata site with all these components in place, see Ramamurthy et al., 1992. 2

5. LEVERAGING THE INVESTMENT

The combination of the LDM, powerful analysis and display applications, and the IEIS has allowed institutions like Iowa State to produce value-added electronic weather maps using near-real-time weather data and to send these to local K-12 schools for science education. The University of Illinois and University of Michigan both use Unidata software and data as the basis for Internet-accessible weather information centers that provide menu-driven interactive access to electronic weather maps and reports to what appears to be a voracious audience. The Weather Underground at Michigan regularly serves over 250,000 user accesses per week, and the Weather Machine gopher server at Illinois handled over 100,000 accesses the day Hurricane Emily approached North Carolina. Michigan professor Perry Samson now has an NSF grant to extend his system for use in local K-12 schools. At the University of Colorado, education professor Nancy Songer, working with the Boulder Valley School District's Internet Project, plans to use an IEIS system in her "Kids as Global Scientists" program to introduce students to the excitement of real-time data and the power of network communications in science education. Very recently, the City College of New York received an NSF grant for Project Weatherwatch, which will set up a similar system involving the College of Science, the College of Education, and the New York City Schools system.

Thus, with modest incremental input from the Unidata Program Center, many Unidata universities are spinning off their own regional science education projects which have a major impact beyond the specific research and education activities supported by the Unidata systems in an atmospheric science department.

The schematic Unidata System at Site shows how the components fit together on a Local Area Network in a university department.

6. THE PROBLEM

In spite of Unidata's success in the atmospheric science community, it is still difficult to adapt current systems to provide new kinds of data to all educational institutions that need them. While commercial providers and government agencies are making important contributions in terms of making new data sources available, the current approach requires that raw data be transported to the satellite uplink site to be included in the broadcast. The IDD approach addresses the critical remaining need for a more flexible, affordable data delivery system for the education and research community. Given the need for automated real-time data dissemination on a national scale, existing network facilities (FTP, USENET News polling model, distributed file systems) are inadequate to solve the problem with the required degree of timeliness, automation, and reliability.

7. MODEL INTERNET DISTRIBUTION SYSTEM

With the IDD, the interactions of universities will change dramatically. The client-server architecture of the LDM makes it possible to run ingest and server functions on separate machines. An augmented version of the LDM (dubbed LDM4) is now being tested. It allows each LDM server to act as a data source to another LDM server. Thus data products are relayed from machine to machine, with each machine storing some or all of the stream on local disks and relaying the data on to machines "downstream." As the diagram Network Distribution Schematic illustrates, the new data distribution system will resemble the Internet network news system in terms of topology.
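The store-and-relay behavior described above can be sketched in a few lines. This is only an illustration of the idea, not the LDM4 implementation: the names RelayNode, wants, and receive, and the product identifiers, are hypothetical, and a real relay would pass data over network connections rather than by direct function calls.

```python
# Illustrative sketch of store-and-relay distribution. All names here
# (RelayNode, wants, receive) are hypothetical and are not the LDM API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def accept_all(product_id: str) -> bool:
    """Default selection rule: keep every product."""
    return True


@dataclass
class RelayNode:
    name: str
    # Predicate deciding which products this node keeps on local disk.
    wants: Callable[[str], bool] = accept_all
    downstream: List["RelayNode"] = field(default_factory=list)
    stored: Dict[str, bytes] = field(default_factory=dict)

    def receive(self, product_id: str, payload: bytes) -> None:
        """Store the product if wanted, then pass it to every downstream node."""
        if self.wants(product_id):
            self.stored[product_id] = payload
        for node in self.downstream:
            node.receive(product_id, payload)


# A three-machine chain: source -> relay -> leaf.
source = RelayNode("source")
relay = RelayNode("relay", wants=lambda pid: pid.startswith("SAT"))
leaf = RelayNode("leaf")
source.downstream.append(relay)
relay.downstream.append(leaf)

source.receive("SAT-001", b"image bytes")
source.receive("OBS-001", b"hourly report")

print(sorted(relay.stored))  # ['SAT-001']
print(sorted(leaf.stored))   # ['OBS-001', 'SAT-001']
```

In this sketch the relay keeps only part of the stream for local use but still passes everything on, so the downstream site can make its own selection.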

The Unidata system will focus primarily on weather-related datasets, including satellite images, radar scan images, hourly observations from international weather reporting stations, vertical atmospheric soundings from balloons and wind profilers, lightning reports, and the output of forecast models run on supercomputers at the National Meteorological Center (NMC) and the European Centre for Medium-Range Weather Forecasts (ECMWF). However, the system itself is designed to handle most major categories of data that will be available from other observing systems, such as NASA's Earth Observing System Data and Information System (EOSDIS) as well as seismic observations. Hence the system will provide a model for other communities.

Taking advantage of recent progress in wide-area networking, this approach offers a way to improve the ease and reliability of providing scientific data to universities and colleges. An innovative architecture using distributed servers on the Internet and event-driven data distribution mechanisms provides a practical and desirable solution to the problem. The mechanisms will also be useful outside of atmospheric science, since they are designed around a general notion of data products.

Among the important characteristics of this model are the following:

The fan out approach to distributing data from a given source is shown in Data Fan Out via LDM.
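As a rough illustration of why fan out keeps the load on any one machine small, the sketch below counts how many relay tiers are needed to reach a given number of sites. The figure of roughly 100 sites comes from the current Unidata community; the fan-out values of 4 and 5 are assumptions chosen only for the example, and the function name tiers_needed is hypothetical.

```python
def tiers_needed(total_sites: int, fan_out: int) -> int:
    """Return the number of relay tiers needed to reach total_sites.

    Tier 1 holds fan_out sites fed directly by the source; every site in a
    tier feeds fan_out sites in the next tier.
    """
    if total_sites < 0 or fan_out < 1:
        raise ValueError("total_sites must be >= 0 and fan_out must be >= 1")
    reached = 0
    tier_size = 1  # start from the single source
    tiers = 0
    while reached < total_sites:
        tier_size *= fan_out
        reached += tier_size
        tiers += 1
    return tiers


print(tiers_needed(100, 4))  # 4  (4 + 16 + 64 = 84 sites is not quite enough)
print(tiers_needed(100, 5))  # 3  (5 + 25 + 125 = 155 sites)
```

Under these assumptions no machine sends more than a handful of copies of the stream, yet every site is only a few hops from the source.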

Sites that can act as LDM relays--receiving and passing on data--also need to be identified. While it may be advantageous to have the first tier of relay sites located at Internet backbone sites, the overall architecture does not depend on that topology. The UPC has set up criteria for potential LDM relay sites, but it is relying on the development of NREN to provide the underlying reliable, high-speed, high-volume network facility.

The data distribution system we envision should have application in any arena requiring real-time data. The oceanography, global change, and seismological communities are examples. Since the data are captured and relayed on the basis of the short identifiers transmitted with the data, modifying the system to handle non-atmospheric data should be easy.
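The point about identifiers can be illustrated with a small selection table. The sketch assumes that products are selected by matching text patterns against their identifiers; the identifiers, the patterns, and the action names are all invented for the example and are not taken from the LDM.

```python
import re

# Hypothetical selection table: (pattern on the product identifier, action).
# Supporting a new kind of data means adding one more row; nothing in the
# selection logic needs to know what the product contains.
RULES = [
    (re.compile(r"^SAT\."), "store"),
    (re.compile(r"^SOUNDING\."), "store"),
    (re.compile(r"^SEISMIC\."), "store"),  # non-atmospheric data, same mechanism
    (re.compile(r"^MODEL\."), "relay-only"),
]


def select_actions(product_id: str) -> list:
    """Return the actions whose pattern matches the product identifier."""
    return [action for pattern, action in RULES if pattern.search(product_id)]


print(select_actions("SAT.IR.0001"))       # ['store']
print(select_actions("SEISMIC.EVENT.42"))  # ['store']
print(select_actions("UNKNOWN.THING"))     # []
```

Because the decision depends only on the identifier, the same capture and relay machinery applies whether the payload is a satellite image or a seismic record.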

8. THE WORK TO BE DONE

Building a model real-time scientific data dissemination system is a complicated endeavor. Among the most important tasks are:

Work with universities in their efforts to develop systems based on new standard interfaces such as Gopher and World Wide Web to provide data in a form that is useful to people outside the atmospheric science department.

9. CURRENT STATUS

9.1. Software Development

For several months, Unidata has been running a prototype version of the IDD, sending data from NOAA's Forecast Systems Laboratory in Boulder to the Unidata Program Center and to several NCAR/UCAR divisions, and from UCAR in Boulder to a donated Sun workstation at NSF headquarters in Washington, D.C. Based on the experience with the prototype, several improvements have been included in the LDM.

The LDM4 version of the software now has facilities for data access authentication as well as improvements for handling slow network links. For data recovery, a manual system will be implemented initially that will require some user intervention.

9.2. Network Management

In terms of overall management, we have received important guidance and consultation from networking experts at Bolt, Beranek, and Newman; Merit; and SURAnet. With current resource constraints, the network management will be handled from the UPC, but alternatives are being investigated. Similarly, for the period of the test, university sites will act as relays although there are some clear advantages to having the relays at regional network operations centers which have full-time support coverage and are well-situated in terms of the underlying network topology.

9.3. Field Test

As of this writing (September 1993), the initial field test is just now beginning with several volunteer university sites acting as relays. For the test, the initial topology for distributing the Domestic Data Plus is shown in the diagram IDD Test Topology for Domestic Data Plus. The data are injected into the Internet at the University of Illinois. In the case of the Family of Services data streams, there is no data source on the Internet, so Illinois uses its Alden satellite receiver as the data source.

A similar diagram, IDD Test Topology for the Unidata/Wisconsin Data, shows the test topology for that data stream. However, in this case the data originate on computers at the Space Science and Engineering Center at the University of Wisconsin-Madison.

10. FUNCTIONS OF IDD SITES

In the Unidata Internet Data Distribution system, there are four types of service that sites can provide to other sites on the Internet:

10.1. Source site

A source site injects data into the IDD system. For the Family of Services data streams, this function requires a midrange UNIX system with a substantial amount of memory.

10.2. Relay only site

These sites simply relay incoming data to other sites. The requirements for a relay site are similar to those for a source site. However, if they also store a copy of the data for local use, they will need enough disk space to hold it.

10.3. Backup data recovery

In the initial IDD system, a number of data recovery sites will store a copy of the incoming data for other sites to access via FTP in the event they miss some data due to computer or network outages. These sites will need additional disk space.

10.4. Full Integrated Earth Information Server (IEIS) system

The most ambitious use of the Unidata IDD is by sites that not only capture the raw data, relay it to others via the LDM, and decode it for local use, but also generate processed products for redistribution. Such sites provide easy access to text forecasts and reports as well as electronic weather maps and other environmental information for non-scientists on campus and in the region. Besides the environmental data, a number of sites are now beginning to integrate instructional materials into the server.

These sites require significant additional processing power and disk storage space for the processed products. However, one of the main advantages of the Unidata software systems is that they need not all run on the same system, so the computing and storage load can be conveniently spread among several workstations.

11. DEPLOYMENT

Once the major software components have been implemented, the UPC can begin to address deployment issues:

As soon as the system is reliably serving the Unidata base of roughly 100 universities, Unidata will re-focus some of its support resources from testing, troubleshooting, and consulting on the new system to begin serving new institutions. If adequate resources are available, this might include universities with a major emphasis on teacher training as well as two-year community colleges. In this endeavor, the UPC will continue to adhere to its informal credo: Undertake no function that can be performed effectively and economically by the universities themselves. One of the best examples of this is our "buddy system," where established sites help new sites get started. This is being expanded at sites that use their own systems to provide processed weather information to other institutions that don't have direct access to Unidata.

Given adequate resources, development and testing of the main software components are to be completed during the first year. Establishing the appropriate agreements and understandings with the data providers will have to proceed in parallel. The deployment phase will involve considerable incremental software development, with a significant shift to more emphasis on documentation, training, consulting, and overall network administration. Most of the existing Unidata sites will begin using the new system during the second year. Expansion of the new system to include new data sources and a broader community of users will take place in the following years.

12. REGIONAL REDISTRIBUTION

Many universities have expressed a strong interest in providing weather products and curricular materials to other local educational institutions. With Internet connections in place, smaller two-year or undergraduate institutions can join forces with an existing Unidata site; the Unidata site could set up an IEIS to provide them with weather information and instructional materials for use in atmospheric science or general science courses. UPC's experience to date shows that this is possible with a small amount of help from Unidata in adapting the IEIS scripts to local conditions. Secondary Distribution to Spoke Nodes is a schematic of the system.

This configuration illustrates how some Unidata sites will not only relay the raw data to their peer LDM sites, but will also function as "hubs" for processing the raw data and redistributing processed data in easy-to-understand forms to sites that do not have Unidata systems installed. The University of Michigan is already engaged in using Unidata systems in this fashion to help teachers prepare classroom materials for K-12 schools with their Weather Underground system. 3

12.1. Network Information Servers

The advent and rapid deployment of network information servers such as Gopher and World Wide Web has significantly aided the dissemination of processed products generated by Campus Weather Display/IEIS systems at universities. Gopher client and server software packages are now available for most common computing platforms, including UNIX X Windows, Macintosh, and PCs running DOS or Windows. A university with a Gopher server can use Unidata systems to automatically generate electronic weather maps and store them on the Gopher server. Then anyone with network access through a Gopher client can display the latest electronic weather maps on her personal workstation.

The Unidata IDD is an ideal complement to these evolving technologies. The IDD distributes raw data on an event-driven basis -- that is, the data are delivered to the interested sites as soon as they are available. The Unidata systems at those sites can then be used to generate value-added electronic weather maps. These in turn can be made available on a demand basis via a network information server.
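The difference between event-driven delivery and demand-based access can be sketched as follows. The classes ProductFeed and MapServer and their methods are hypothetical stand-ins, not part of the LDM or of any Gopher software, and the "map" produced here is only a placeholder string.

```python
from typing import Callable, Dict, List, Optional


class ProductFeed:
    """Hypothetical event-driven feed: subscribers are called on arrival."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, bytes], None]] = []

    def subscribe(self, callback: Callable[[str, bytes], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, product_id: str, payload: bytes) -> None:
        # Push the product to every interested party as soon as it exists.
        for callback in self._subscribers:
            callback(product_id, payload)


class MapServer:
    """Hypothetical information server holding the latest processed products."""

    def __init__(self) -> None:
        self.latest: Dict[str, str] = {}

    def on_arrival(self, product_id: str, payload: bytes) -> None:
        # Placeholder for the real analysis and plotting step.
        self.latest[product_id] = f"map built from {len(payload)} bytes"

    def fetch(self, product_id: str) -> Optional[str]:
        # Demand side: a client asks for whatever is current.
        return self.latest.get(product_id)


feed = ProductFeed()
server = MapServer()
feed.subscribe(server.on_arrival)

feed.publish("surface-obs", b"12345")  # raw data arrive (event-driven)
print(server.fetch("surface-obs"))     # map built from 5 bytes
```

The raw data are pushed once, when they become available, while the processed product is pulled by users only when they ask for it.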

13. CONTRIBUTIONS

With funding from the Atmospheric Sciences Division of the National Science Foundation, Unidata is an ongoing program at over 100 participating universities. A staff of 17 at the Unidata Program Center provides the training, support, software updates, and software development needed to maintain and enhance the infrastructure which serves its national community of users. This allows the universities to take advantage of advances in technology and to incorporate new sources of scientific data into their research and education programs. Current Unidata funding from the National Science Foundation Atmospheric Sciences Division supports the UPC, subcontracts to data providers, and offers periodic hardware grant opportunities for the participating universities. This paper provides an updated view of the Internet Data Distribution initiative described in the Unidata: 1993-1998 proposal to NSF ATM. 4 The IDD concept has gained rapid and enthusiastic acceptance among both the participating universities and the agencies providing the data. It represents an opportunity to significantly enhance the technological infrastructure supporting the dissemination of scientific data to the academic community. The speed at which the full system can be developed is very much resource-dependent.

Besides the ongoing support from NSF ATM and the continued contributions of invaluable time from our user community, Unidata has benefited from a contribution from the Forecast Systems Laboratory and from valuable network consulting provided by Bolt, Beranek and Newman as part of their NSF Network Information Center research effort.

_______________________________

1. The Unidata Program Center is sponsored by the National Science Foundation and managed by the University Corporation for Atmospheric Research. Mention of a commercial company or product does not constitute an endorsement by the Unidata Program Center.

2. Ramamurthy, M.K., K.P. Bowman, B.F. Jewett, J.G. Kemp, and C. Kline, A Networked Desktop Synoptic Laboratory. Bulletin of the American Meteorological Society, Vol. 73, No. 7, July, 1992.

3. Weather as the Paradigm for Instructional Technology. Annual Report of the University of Michigan Weather Underground, June 1993.

4. These data sources are described in Unidata: 1993 to 1998, A Proposal to the National Science Foundation, May 1992, pp. 31-32.