Current data distribution systems use a variety of "push" and "pull" approaches to the dissemination of scientific data. These range from Web-based facilities that allow users to browse datasets on the server and download those of interest, to the real-time Unidata IDD, which allows the user to "subscribe" to certain datastreams whose products are delivered to the user's site as soon as they are available from the data source. More recently, client/server alternatives have been developed; these allow a user running an application on a local machine to access data on a remote server as if the datasets were on the local disk. These alternatives are described and compared in more detail below. The THREDDS hybrid alternative is described in the THREDDS Overview.

Data Server (Pull) Request-based Approach

In this model, the user peruses a catalog of data available at the server site (usually via a Web interface). Having found data of interest, the user then either:

Synopsis

Most data servers now implement some form of catalog browse and search facility that allows users to peruse the holdings via the Web. Some of these servers (e.g., the climate data server at LDEO) also provide a suite of Web-accessible analysis and visualization tools that enable the user to work with the data via the Web. If a user wishes to perform analysis outside the repertoire of the toolset on the server, the alternative is to download the interesting portion of the data to a local system, where it can be converted into a form compatible with local analysis and display facilities. Some data servers provide only a modicum of discovery assistance (e.g., the OSO/NCEP server) and no analysis tools, whereas others offer a Web interface to a database for data discovery but only thumbnail views of parts of the data for visualization (e.g., the CODIAC server). In the latter cases, the user always has to download the data and perform the analysis and display on local systems.
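The browse, download, and convert steps described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: CatalogServer, convert_to_local_form, the dataset names, and the "station,value" text format are all hypothetical and do not correspond to any of the servers mentioned here.

```python
# Toy model of the request-based (pull) workflow: browse, download, convert.
# Every name and data value below is a hypothetical illustration.


class CatalogServer:
    """Stands in for a Web-accessible data server with a searchable catalog."""

    def __init__(self, holdings):
        # holdings maps dataset name -> (description, raw text payload)
        self._holdings = holdings

    def search(self, keyword):
        """Return the names of datasets whose description mentions keyword."""
        keyword = keyword.lower()
        return sorted(
            name
            for name, (description, _) in self._holdings.items()
            if keyword in description.lower()
        )

    def download(self, name):
        """Return the complete dataset in the server's own format."""
        return self._holdings[name][1]


def convert_to_local_form(raw):
    """User-supplied conversion: 'station,value' lines -> dict of floats."""
    records = {}
    for line in raw.strip().splitlines():
        station, value = line.split(",")
        records[station] = float(value)
    return records


server = CatalogServer({
    "sst_monthly": ("Monthly sea surface temperature", "A,14.5\nB,15.0\n"),
    "precip_daily": ("Daily precipitation totals", "A,0.0\nB,3.2\n"),
})

matches = server.search("temperature")   # step 1: browse the catalog
raw = server.download(matches[0])        # step 2: pull the whole dataset
local = convert_to_local_form(raw)       # step 3: convert for local tools
mean_value = sum(local.values()) / len(local)
```

Note that the conversion routine lives on the user's side: a second server with a different format would need its own converter, which is the burden discussed under Pros and Cons.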

Pros and Cons

This approach enables users to learn something about the datasets available on a server and even perform analysis and display from a remote computer on the Web. However, the user has to use the tools provided on the server or download the data and provide the conversion, analysis, and display tools for local use. This is often a time-consuming process. Moreover, if the user is interested in data from more than one server, she often has to learn several different interfaces and several different sets of tools, and/or develop several different conversion programs to work with the data locally.

Unidata (Push) Event-driven System

Unidata has traditionally focused on providing users with data in streams they subscribe to ahead of time, for delivery to their local networks as soon as possible after the data are available from the source. The Unidata system includes a set of decoders that transform the data into the appropriate forms as they arrive at the local system, where users can analyze and display them with applications supported by Unidata.

Synopsis

The Unidata Internet Data Distribution (IDD) system assumes that the user wants to perform the analysis and display on local systems and knows ahead of time which data will be of interest. Using Unidata-provided tools, the user in effect "subscribes" to certain datastreams from a variety of sources. The IDD then delivers the data products as soon as they are available from each source. The Unidata Local Data Manager (LDM) software package, which is responsible for relaying the data from the source to the end user sites, can be set up to run decoders automatically upon receipt of certain data products. These decoders convert the data products into a format suitable for use with analysis and display packages. Unidata also supplies and supports a number of such packages.
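The subscribe, deliver, and decode cycle described above can be illustrated with a small Python sketch. It is a toy model only and does not use or reproduce the LDM software or its configuration: the Relay and Site classes, the product identifiers, and the decoder are hypothetical names chosen for the illustration.

```python
# Toy model of event-driven (push) delivery with decoders run on receipt.
# This is NOT the LDM; every class and identifier here is hypothetical.
import re


class Relay:
    """Pushes each published product to subscribers whose pattern matches."""

    def __init__(self):
        self._subscriptions = []  # list of (compiled pattern, callback)

    def subscribe(self, pattern, callback):
        self._subscriptions.append((re.compile(pattern), callback))

    def publish(self, product_id, payload):
        """Deliver a product as soon as it appears; return delivery count."""
        delivered = 0
        for pattern, callback in self._subscriptions:
            if pattern.search(product_id):
                callback(product_id, payload)
                delivered += 1
        return delivered


class Site:
    """A receiving site that runs a decoder when a matching product arrives."""

    def __init__(self):
        self._decoders = []  # list of (compiled pattern, decoder function)
        self.decoded = {}    # product id -> decoded record

    def add_decoder(self, pattern, decoder):
        self._decoders.append((re.compile(pattern), decoder))

    def receive(self, product_id, payload):
        for pattern, decoder in self._decoders:
            if pattern.search(product_id):
                self.decoded[product_id] = decoder(payload)


def decode_surface_obs(payload):
    """Hypothetical decoder: 'STATION TEMP' text -> dictionary."""
    station, temp = payload.split()
    return {"station": station, "temp_c": float(temp)}


relay = Relay()
site = Site()
site.add_decoder(r"^SURFACE/", decode_surface_obs)
relay.subscribe(r"^SURFACE/", site.receive)   # subscription made ahead of time

n_surface = relay.publish("SURFACE/KDEN", "KDEN 21.5")  # delivered and decoded
n_radar = relay.publish("RADAR/FTG", "raw radar bytes")  # not subscribed
```

The site never asks for a product; it only declares its interest in advance, and anything published before the subscription exists is simply never seen, which mirrors the retrospective-data limitation of a purely push-based system.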

Pros and Cons

Once it is set up, the IDD delivers the data products to each participating site and decodes them into the desired formats with no additional effort on the part of the user. However, it does not help the user who needs retrospective data, whether to fill in data missed in the real-time delivery system or to study historical events. Moreover, the amount of data in each datastream is increasing rapidly and several large new streams are being added. This strains the network, computing, and data storage capacity at many sites. Finally, the system requires a significant amount of expertise (and Unix experience) to set up initially. Many sites that could use the data for a single course, which might last only a semester, are not able to cope with the administrative overhead, the learning curve, or the vast amount of data. We suspect that this is an impediment to use in introductory courses in two-year colleges, for example.

Client/Server Data Access

With this approach, the user runs analysis and display programs on her own computer, but accesses data from a remote data server (or several remote data servers).

Synopsis

Using protocols like those of the DODS (Distributed Oceanographic Data System) from the University of Rhode Island or the ADDE (Abstract Data Distribution Environment) from the SSEC at the University of Wisconsin, a user can run a suitably augmented application on her own local machine while accessing data on remote servers as if the data were on her own local computer. The DODS or ADDE server on the remote system supplies the requested data in the form needed by the application.
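The idea of accessing remote data "as if the data were on her own local computer" can be sketched as a client-side proxy object. The following Python is a toy illustration, not the DODS or ADDE protocol: RemoteServer, RemoteDataset, and the in-memory "network" between them are assumptions made for the example.

```python
# Toy model of client/server data access through a local-looking proxy.
# This is NOT DODS or ADDE; all classes and names are hypothetical.


class RemoteServer:
    """Stands in for a remote data server; counts the values it sends."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.values_sent = 0

    def length(self, name):
        return len(self._datasets[name])

    def fetch(self, name, start, stop):
        """Return only the requested range, not the whole dataset."""
        chunk = self._datasets[name][start:stop]
        self.values_sent += len(chunk)
        return chunk


class RemoteDataset:
    """Client-side proxy that the application indexes like a local list."""

    def __init__(self, server, name):
        self._server = server
        self._name = name

    def __len__(self):
        return self._server.length(self._name)

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("this sketch supports only contiguous slices")
            return self._server.fetch(self._name, start, stop)
        if key < 0:
            key += len(self)
        return self._server.fetch(self._name, key, key + 1)[0]


server = RemoteServer({"temperature": [float(i) for i in range(1000)]})
temps = RemoteDataset(server, "temperature")

# The application code below is unaware that the data are remote.
window = temps[10:15]
window_mean = sum(window) / len(window)
```

Only the five requested values cross the simulated network, while the remaining values stay on the server; this is the sense in which the user avoids moving all the data of interest onto the local area network.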

Pros and Cons

This has the obvious advantage that the user has control over the application running on the local system without having to move all the data of interest onto the local area network. At present, however, there are several disadvantages: