Approaches to Environmental Data Analysis and Display

Ben Domenico

Last Modified: February 10, 2004

The goal of this document is to explain web-based data analysis and display technology to an audience of interested, intelligent people who would rather not have to understand all the details of the middleware technology, but who need a conceptual understanding of the capabilities such systems can provide in order to get their work done. Think in terms of a scientist or science educator who wants to use environmental data in her research or education programs and would like to understand the concepts and options well enough to make intelligent suggestions for how the technology should evolve.

Traditional Monolithic "Stovepipe" Legacy Data Analysis and Display Applications

All component processes of these application programs ran on one computer: they accessed data stored on disks on that computer, performed the needed computations, created graphical output, and displayed it on the screen.

Program Components

Interprocess communication

Program components communicate with one another via function or subroutine calls. Data are passed from one component to another via parameters in these calls. A set of such routines defines an API (Application Programming Interface). The netCDF library, for example, provides an API for data access.
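As a minimal sketch of this idea, the following Python shows an analysis component that obtains its data only through function calls. It assumes a hypothetical in-memory dataset rather than the real netCDF library; the names open_dataset and read_variable and the sample values are illustrative assumptions, not part of any existing API.

```python
# Hypothetical data-access API: callers never touch the storage layout directly.
_STORE = {
    "station_obs": {"temperature": [271.5, 273.2, 275.0]},  # made-up values
}

def open_dataset(name):
    """Return a handle (here just the name) after checking the dataset exists."""
    if name not in _STORE:
        raise KeyError(f"unknown dataset: {name}")
    return name

def read_variable(handle, variable):
    """Return a copy of the values so callers cannot alter the stored data."""
    return list(_STORE[handle][variable])

def mean(values):
    """An 'analysis' component: it only sees what the API hands it."""
    return sum(values) / len(values)

handle = open_dataset("station_obs")
temps = read_variable(handle, "temperature")
average = mean(temps)
```

Because the analysis code depends only on the two access functions, the storage behind them could change without the analysis component noticing.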

Data Models and Data Formats

Since most legacy programs were written by small teams of programmers for a restricted community of practice, the forms of data they can handle are limited to those common to that community. Even then, the datasets have to be stored in a specific format to be useful. In the Unidata world, McIDAS and GEMPAK can both work with point observations from weather stations, vertical soundings, output from forecast models, and satellite imagery, but the data have to be stored in GEMPAK form for use in GEMPAK and in McIDAS form for use in McIDAS. In some cases, these programs have been augmented to make use of data in a common format (such as the McIDAS AREA format for imagery), but that is the exception rather than the rule with legacy applications.

A common complaint of researchers is that they spend more of their time getting data into forms useful in their analysis tools than they do on the actual research.

Even when the syntax of data access is taken care of by a common API such as netCDF, the user must have access to external information about the actual meaning (semantics) of the data within the datasets.
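The difference between syntax and semantics can be sketched in a few lines of Python. The variable name, the "units" attribute, and the values below are assumptions made for illustration: the numbers can be read without any trouble, but they can only be interpreted once the units are known.

```python
# Syntax: the API returns numbers. Semantics: metadata says what they mean.
variable = {
    "name": "air_temperature",   # hypothetical variable name
    "units": "K",                # without this, 273.15 is ambiguous
    "values": [273.15, 283.15],
}

def to_celsius(var):
    """Convert using the declared units; refuse to guess when they are unknown."""
    units = var["units"]
    if units == "K":
        return [round(v - 273.15, 2) for v in var["values"]]
    if units == "degC":
        return list(var["values"])
    raise ValueError(f"cannot interpret units: {units!r}")

celsius = to_celsius(variable)
```

A dataset that carries no units at all would leave the caller to find that information somewhere outside the file.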

How and Why Do Unidata Stovepipe Applications Work?

In the Unidata community, stovepipe applications work well in conjunction with an automatic, event-driven, subscription-based, "push type" data delivery system, the Internet Data Distribution (IDD) system with a set of decoders that automatically transform the data into formats compatible with the analysis and display systems used locally.

This is shown in the following diagram.

For more information about the workings of the IDD, there is a somewhat outdated IDD Overview. Detailed information about the Local Data Manager (LDM) software that is the heart of the community-run IDD is available in the LDM Documentation.
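The event-driven, subscription-based delivery described above can be sketched as a table that maps product-identifier patterns to decoders. This is only an illustration of the pattern under stated assumptions: the product identifiers, decoder functions, and output formats are hypothetical, and the sketch does not reproduce the LDM's actual configuration syntax.

```python
import re

# Hypothetical decoders: each turns a raw product into a local-format record.
def decode_surface(raw):
    return {"format": "local-surface", "payload": raw.upper()}

def decode_image(raw):
    return {"format": "local-image", "payload": raw[::-1]}

# Subscription table: product-identifier pattern -> decoder to run on arrival.
SUBSCRIPTIONS = [
    (re.compile(r"^SURFACE/"), decode_surface),
    (re.compile(r"^SATELLITE/"), decode_image),
]

def on_product_arrival(product_id, raw):
    """Called when a product is pushed to this site; runs every matching decoder."""
    return [decode(raw) for pattern, decode in SUBSCRIPTIONS
            if pattern.search(product_id)]

decoded = on_product_arrival("SURFACE/metar/KDEN", "obs text")
ignored = on_product_arrival("RADAR/level2/KFTG", "not subscribed")
```

The site receives and decodes only what it has subscribed to, so the local applications always find fresh data already in the format they expect.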

Web Access to Pictures Created by Legacy Analysis and Display Applications

With the advent of the World Wide Web, many organizations involved with data analysis and display applications quickly realized that the Web presented an opportunity to make the results of their analysis available to the world (or at least the portion of the world connected to the Internet at the time). Initially most of them accomplished this by programming their legacy applications to create static pictures of the data, which could be incorporated as GIF or JPEG images into web pages describing their content. Unidata worked with member universities on systems we then called IEIS, for Integrated Earth Information Systems. Many Unidata sites <<<EXAMPLES>>> and now several commercial sites still use this approach to put images depicting real-time weather data on the web. The simplicity of this approach has obvious advantages. The downside is that the end user has to be content with the images the server site decides to create. These are just pictures: they cannot be subjected to analysis or overlaid with other images to form composite displays.

In this scenario, the legacy analysis and display application components still typically run on one computer, but the application is programmed to run automatically -- triggered either by a scheduler or by the arrival of a particular dataset. The graphical output is stored in the Web server area, from which the pictures are downloaded as part of a set of web pages.

"Thin" Client, Browser-based Systems

This will describe systems like those at Plymouth State and Illinois that allow browser-based users to specify input parameters for applications like WXP, McIDAS, and GEMPAK. The application then runs on the server and returns the resulting graphics output to the user's browser.

Internet Client/Server Data Access

This will describe systems like the IDV accessing data on servers via OPeNDAP protocols or McIDAS accessing data on servers via ADDE protocols. The analysis and rendering are done on the local client workstation, but the server provides the data access and, in some cases, aggregation, subsetting, and transformation of the data into a form usable by the client. Historically this approach is based on a combination of Web protocols (maybe just complicated URLs via HTTP) and Internet protocols, e.g., sockets, remote procedure calls, java
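To give a feel for what a "complicated URL" used for server-side subsetting might look like, here is a small Python sketch that builds an OPeNDAP-style request. The server address, file name, and variable name are hypothetical, and the bracketed [start:stride:stop] index syntax is written as the DAP2 constraint-expression form is commonly documented; treat it as an illustration rather than a tested client.

```python
def hyperslab(start, stop, stride=1):
    """One dimension of the subset, as [start:stride:stop] index notation."""
    return f"[{start}:{stride}:{stop}]"

def subset_url(base_url, variable, ranges, suffix=".ascii"):
    """Append a response suffix and a constraint naming one variable and its index ranges."""
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{base_url}{suffix}?{constraint}"

# Hypothetical dataset: first time step, a band of latitudes and longitudes.
url = subset_url(
    "http://example.edu/dods/model_run.nc",
    "temperature",
    [(0, 0), (10, 20), (30, 40)],
)
```

The client asks only for the slice it needs, and the server does the subsetting before anything crosses the network.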

Unidata Scenario for Client/Server Usage

In the Unidata community, users are accustomed to having control over their own desktop applications and to doing integrated analysis and visualization of datasets that come from a variety of sources. The client/server approach enables these users to access datasets from a number of different remote servers while still using a local desktop application. The schematic below shows how the Unidata IDD (Internet Data Distribution) system can be used to populate a number of different servers with real-time data. These in turn can be accessed via client/server protocols for local analysis and display.

The Web Services Approach

Web services generally refers to programs that interact with one another by passing XML messages back and forth over the Internet using HTTP. These messages can contain:

Several groups are attempting to develop standard approaches to the exchange of data via web services. Even in the highly competitive business world, this is seen as a valuable area for collaboration because there is so much to be gained from having what is essentially a common language for sharing information without opening up internal systems to other businesses. In the world of scientific data, many XML-based markup languages (MLs) are under development in various disciplines. Examples are MathML for mathematical documents; CML, a suite of MLs for chemistry; ESML, the Earth System ML; NcML, the netCDF ML; and GML, the Geography ML.
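The exchange of XML messages between two programs can be sketched with Python's standard library. The element names (dataRequest, dataResponse, and so on) are invented for this illustration and do not come from any of the markup languages named above, and the HTTP transport is left out so that only the message handling is shown.

```python
import xml.etree.ElementTree as ET

def build_request(dataset, variable):
    """Client side: serialize a request as an XML message (hypothetical schema)."""
    root = ET.Element("dataRequest")
    ET.SubElement(root, "dataset").text = dataset
    ET.SubElement(root, "variable").text = variable
    return ET.tostring(root, encoding="unicode")

def handle_request(message):
    """Server side: parse the request and answer with another XML message."""
    request = ET.fromstring(message)
    reply = ET.Element("dataResponse")
    reply.set("dataset", request.findtext("dataset"))
    ET.SubElement(reply, "status").text = "ok"
    return ET.tostring(reply, encoding="unicode")

message = build_request("model_run", "temperature")
reply = handle_request(message)
status = ET.fromstring(reply).findtext("status")
```

Neither side needs to know how the other stores or processes its data; they only need to agree on the structure of the messages.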

Geographic Information Systems (GIS) and Web Services

At the same time, the Open GIS Consortium is developing a set of protocols for exchanging Earth-referenced information. Two of these have been around for some time. The Web Mapping Service (WMS) is mainly used for serving images which contain maps. This approach is most often seen on browser-based, thin-client servers where the user specifies a set of parameters for the map, and the server gathers the needed information from a database and returns a map in the browser window. The Web Feature Service (WFS), on the other hand, returns a set of GIS features: points, lines, and polygons representing geolocated features or objects on the surface of the Earth. These are the "vector" objects that traditional GIS systems work with, so a thick GIS client might access this sort of vector data from a remote WFS.
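As a rough illustration of the parameters a client passes to a map server, the following Python builds a WMS GetMap request. The parameter names follow the WMS 1.1.1 GetMap operation as commonly documented; the server address, layer names, and bounding box are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def getmap_url(base_url, layers, bbox, width, height):
    """Build a GetMap request; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                 # empty value requests the default style
        "SRS": "EPSG:4326",           # plain latitude/longitude coordinates
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"

url = getmap_url("http://example.org/wms", ["highways", "rivers"],
                 (-110, 35, -100, 45), 600, 600)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

The response to such a request is a picture, which is why WMS fits the thin-client, browser-based style of use so well.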

The GIS-generated map above displays several sets of features described in the legend. One can see how the feature classes lend themselves to storage in a relational database where each class (e.g., highways) is stored in a table whose records contain the attributes for each specific feature.
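A feature class stored as a relational table can be sketched with Python's built-in sqlite3 module. The column names, the sample highways, and the text encoding of the geometry are all assumptions made for illustration; a real GIS would use its own schema and geometry types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per feature class; one record per feature, with its attributes.
conn.execute(
    "CREATE TABLE highways ("
    "id INTEGER PRIMARY KEY, name TEXT, lanes INTEGER, geometry TEXT)"
)
conn.executemany(
    "INSERT INTO highways (name, lanes, geometry) VALUES (?, ?, ?)",
    [
        ("Route A", 4, "LINESTRING(-105.0 40.0, -104.9 39.7)"),  # made-up data
        ("Route B", 2, "LINESTRING(-105.3 40.0, -105.1 40.2)"),
        ("Route C", 6, "LINESTRING(-104.8 39.9, -104.6 39.8)"),
    ],
)

# Attribute query: which highways have at least four lanes?
rows = conn.execute(
    "SELECT name FROM highways WHERE lanes >= 4 ORDER BY name"
).fetchall()
wide_highways = [name for (name,) in rows]
conn.close()
```

Each discrete object maps naturally onto one row, which is what makes this data model such a good fit for feature-based datasets.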

This approach serves well for data in the solid Earth sciences and hydrology where most of the datasets are related to discrete objects: characteristics of land surface areas, streams, rivers, and human-built infrastructure. The various classes of features can be displayed as a set of "layers" which can be turned on and off in the manner that transparent mylar layers can be overlaid on a base map.

A Different Way of Thinking About Data -- Data Models in Atmospheric Science

However, scientists in physical oceanography and atmospheric science deal with fluids. Consequently they tend to think of data as discrete points in a continuous function space where many parameters (e.g., temperature, pressure, wind speed and direction) vary in three spatial dimensions and time. They also describe the behavior of these systems in terms of solutions to the equations of fluid dynamics and have developed elaborate forecast models that attempt to predict the behavior of the systems into the future. The voluminous datasets output by supercomputer forecast modeling programs represent what is arguably the extreme case of a data model distinct from that employed in GIS. The structure of these datasets is quite different from that of the GIS layers of features, and cannot be successfully modeled by the formal database structures.
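The contrast with the feature-based model can be made concrete with a short Python sketch of a gridded field: a continuous function sampled at discrete points in time, height, latitude, and longitude. The coordinate values and the temperature formula are synthetic stand-ins chosen for illustration, not real model output or real physics.

```python
# Coordinate axes of a small, hypothetical forecast grid.
times = [0, 6, 12]             # forecast hours
levels = [1000, 850, 500]      # pressure levels in hPa
lats = [30.0, 40.0, 50.0]      # degrees north
lons = [-110.0, -100.0]        # degrees east

def temperature(t, level, lat, lon):
    """Synthetic stand-in for model output (not real physics); lon is unused."""
    return 300.0 - 0.5 * (lat - 30.0) - 0.05 * (1000 - level) + 0.1 * t

# The dataset is the function sampled on every (time, level, lat, lon) point.
grid = [[[[temperature(t, p, la, lo) for lo in lons]
          for la in lats]
         for p in levels]
        for t in times]

shape = (len(grid), len(grid[0]), len(grid[0][0]), len(grid[0][0][0]))
value = grid[1][2][1][0]   # hour 6, 500 hPa, 40 N, 110 W
```

There are no discrete objects here to store as records; a value is found by its position along the four axes, not by looking up a feature.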

The diagram above shows the output of a supercomputer forecast model with a depiction of the evolution of the jet stream rendered with the Unidata Integrated Data Viewer (IDV). It illustrates a 3-D, time-varying isosurface of the wind speed with a vertical cross section shown as contours. Interestingly, the underlying base map was actually generated from a GIS-based shapefile of features. Pressure field contours are also shown on the base map. These are the sorts of datasets that are challenging to represent in the traditional GIS model and visualization system.

Bridging the Gap With the Web Services Approach

There are a host of situations in which it is of paramount importance to access, analyze, and display data from different sources in an integrated analysis and visualization system. Flash floods, landslides, fire weather, climate change impacts, and contaminant plume dispersion are just a few instances where datasets from the environmental sciences and GIS need to be analyzed together -- and often in a real-time setting where forecast information is crucial.

<<< Work with David Maidment and others on some "So what?" scenarios for this section.>>>>

The following schematic shows one way to accomplish this integrated approach:

Both the GIS and atmospheric science communities have developed client/server protocols, which enable them to share data within their own communities. If both communities can now move toward a common set of protocols for exchanging data, each can expand its reach to deliver and access datasets from the other community in a form useful in its own applications.

The Open GIS protocols running in a Web Services environment provide such an interchange mechanism. But, as noted above, just exchanging the datasets is not enough. The key to having interoperable systems is to develop a data model that encompasses the main characteristics of the datasets in the two communities. At the moment, the best approach for doing so is in the evolving OGC Web Coverage Service (WCS) specification. In OGC terminology, coverages are used to represent gridded data. Many of the datasets in the atmospheric sciences are represented as grids of data in three-space. The GIS community also has mechanisms for dealing with grids (although the traditional GIS community does not call them coverages). In the Unidata THREDDS project, the approach we are taking is shown schematically in the following diagram.

 

By building a WCS middleware wrapper around the OPeNDAP, ADDE, and THREDDS services, the data can be made available via standard protocols. Then, on the client side, desktop applications need only implement the WCS interface to gain access to the data on this variety of servers. If the GIS community also moves in the direction of these emerging standard protocols and establishes a WCS layer over proprietary protocols such as ArcIMS, the world of client-side data access would be further simplified.
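The wrapper idea can be sketched as a single coverage-service interface placed in front of several back ends. Everything in this Python example is hypothetical: the class names, the method names, and the returned records are illustrative stand-ins and do not reproduce the actual WCS operations or the OPeNDAP and ADDE protocols.

```python
class OpendapBackend:
    """Hypothetical stand-in for a server reached via OPeNDAP."""
    def fetch(self, name, bbox):
        return {"source": "opendap", "coverage": name, "bbox": bbox}

class AddeBackend:
    """Hypothetical stand-in for a server reached via ADDE."""
    def fetch(self, name, bbox):
        return {"source": "adde", "coverage": name, "bbox": bbox}

class CoverageService:
    """The wrapper: one interface for clients, whatever sits behind it."""
    def __init__(self):
        self._catalog = {}

    def register(self, coverage, backend):
        self._catalog[coverage] = backend

    def describe(self):
        """List the coverages a client may ask for."""
        return sorted(self._catalog)

    def get_coverage(self, coverage, bbox):
        """Route the request to whichever back end holds the coverage."""
        return self._catalog[coverage].fetch(coverage, bbox)

service = CoverageService()
service.register("model_temperature", OpendapBackend())
service.register("satellite_ir", AddeBackend())

available = service.describe()
result = service.get_coverage("satellite_ir", (-110, 35, -100, 45))
```

The client code calls the same two methods for every dataset, so adding another kind of server means writing one new back end rather than changing every client.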

Pros and Cons

There are several clear advantages to this approach:

On the other hand there remain significant challenges:

Our current approach is guided by the following thoughts:

<<<Put in a reference to Stefano's materials here for a more rigorous, in-depth treatment of the issues.>>>

The Grid

This section will describe the way the Grid is evolving to incorporate web services as soon as I learn how the Grid is evolving to incorporate web services. It is probably the best place to discuss orchestration or choreography of services.

 

 
 