Unidata - To provide the data services, tools, and cyberinfrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation. Unidata
         
 

LEAD Status Report

September 2005

Tom Baltzer
Brian Kelly
Doug Lindholm
Mohan Ramamurthy
Anne Wilson

LEAD 2 Year Site Review Held in July

During the first part of the summer, the LEAD project focused heavily on its two-year site review. This major review, held at NCSA in July, involved presentations to NSF representatives and a panel of reviewers.

Unidata's major efforts in preparation for this meeting included
The panel seemed pleased with LEAD's efforts. Indeed, one outcome is that NSF is providing additional funding for two students: one at CAPS to study dynamically adaptive Numerical Weather Prediction (NWP) and one at Indiana University to study issues pertaining to streaming data. The panel strongly recommended that LEAD focus on the use of streaming data and dynamic workflows.

Unidata LEAD Test Bed

The LEAD test bed is being built out to a 40 terabyte storage array that will be a primary storage repository for the LEAD effort and the Unidata Community, housing large quantities of IDD data and LEAD generated data products.

Hourly ARPS Data Assimilation System (ADAS) analysis output at 27 km resolution, generated by the University of Oklahoma in netCDF format, is now being transferred via FTP to the Unidata LEAD test bed, where it will be cataloged and made publicly available. Additionally, the compute infrastructure on the Unidata LEAD test bed is being built out to facilitate end-to-end NWP: running ADAS to assimilate local observations in the IDD streams, then making WRF model predictions using the ADAS-generated initial and boundary conditions. At present, WRF predictions on the Unidata test bed are made using an alternate procedure, namely the WRF Standard Initialization (WRFSI) system along with the 40-km operational NAM output.

Lastly, we are collaborating with Millersville University (MU), Howard University (HU), and the University of Alabama Huntsville to establish a three-part regional ensemble forecast. The Algorithm Development and Mining system (ADAM) will determine the location of the forecast, WRF will run at MU, HU, and UPC, and the results will be stored and cataloged on the UPC test bed.

Steered Forecasts

As a first step toward dynamic adaptivity, regional forecasts are now being generated four times daily using the Weather Research and Forecasting (WRF) model running in multiprocessor mode on the UPC LEAD test bed. The location of the regional model runs is steered dynamically using a center latitude and longitude provided in the IDD. That location is determined by an algorithm that processes NAM precipitation forecasts and finds the point of highest 24-hour cumulative predicted precipitation. The results, along with parallel runs of the Workstation Eta, are being served via OPeNDAP and cataloged using THREDDS. The top-level THREDDS catalog is found at: http://lead.unidata.ucar.edu:8080/thredds/topcatalog.xml.
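The steering step can be illustrated with a minimal sketch. It assumes the NAM precipitation forecast has already been decoded into plain Python lists on a regular latitude/longitude grid; the function name `steer_center` and the data layout are hypothetical and are not taken from the LEAD code.

```python
from typing import List, Tuple


def steer_center(
    lats: List[float],
    lons: List[float],
    precip_steps: List[List[List[float]]],
) -> Tuple[float, float, float]:
    """Return (lat, lon, total) for the grid point with the highest accumulation.

    precip_steps[t][i][j] is the forecast precipitation for time step t at
    latitude lats[i] and longitude lons[j]. The caller is assumed to pass
    exactly the time steps that make up the 24-hour window.
    """
    best = (lats[0], lons[0], float("-inf"))
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            # Accumulate precipitation at this grid point over all time steps.
            total = sum(step[i][j] for step in precip_steps)
            if total > best[2]:
                best = (lat, lon, total)
    return best


# Tiny illustrative grid: two latitudes, two longitudes, two time steps.
lats = [35.0, 40.0]
lons = [-100.0, -95.0]
precip = [
    [[1.0, 0.0], [2.0, 5.0]],
    [[0.5, 0.0], [1.0, 4.0]],
]
center_lat, center_lon, total = steer_center(lats, lons, precip)
```

In this sketch the returned latitude and longitude would become the center of the regional model domain. A real implementation would read the gridded forecast from the IDD feed rather than from literals.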

THREDDS Data Repository

LEAD orchestrations need a large, robust, and reliable storage back end with speedy access in order to stage data and store both intermediate and final results. Along similar lines, it became apparent that the Unidata community could benefit from a storage repository that allowed users to store and retrieve data that would otherwise be lost due to scouring.

Towards this end, the Unidata LEAD team is designing and building the THREDDS Data Repository (TDR).
The TDR is a modular framework for a repository that will:
  1. locate storage
  2. move data to that storage
  3. generate a unique ID, a handle to the data
  4. register the data in a name resolver that maps data handles to one or more physical locations
  5. generate metadata if none is provided
  6. crosswalk the metadata to another schema, if desired
  7. update one or more catalogs, if desired.
The goal of the framework is to support a variety of implementations of the modules. This way we hope to provide good functionality for both ends of the user spectrum: LEAD at one end, and a single Unidata community site at the other. For example, we hope to be able to support storage implemented via a mass storage system as well as a UNIX disk on a local area network.
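The seven steps and the idea of swappable modules can be sketched as follows. This is only an illustration of the design described above, not the TDR code: the class names `Storage`, `InMemoryStorage`, and `Repository`, the `ingest` method, and the metadata fields are all hypothetical, and an in-memory dictionary stands in for a real back end such as a UNIX disk or a mass storage system.

```python
import uuid
from typing import Callable, Dict, List, Optional, Protocol


class Storage(Protocol):
    """Hypothetical storage module: stores bytes, returns a physical location."""

    def store(self, name: str, data: bytes) -> str: ...


class InMemoryStorage:
    """Stand-in back end; a disk or mass-store module would expose the same method."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.blobs: Dict[str, bytes] = {}

    def store(self, name: str, data: bytes) -> str:
        location = f"{self.prefix}/{name}"
        self.blobs[location] = data
        return location


class Repository:
    """Runs the seven steps in order, delegating to whichever modules it is given."""

    def __init__(
        self,
        storage: Storage,
        crosswalk: Optional[Callable[[dict], dict]] = None,
    ) -> None:
        self.storage = storage
        self.crosswalk = crosswalk
        self.resolver: Dict[str, List[str]] = {}  # handle -> physical locations
        self.catalog: List[dict] = []

    def ingest(self, name: str, data: bytes, metadata: Optional[dict] = None) -> str:
        location = self.storage.store(name, data)              # steps 1 and 2
        handle = uuid.uuid4().hex                              # step 3
        self.resolver.setdefault(handle, []).append(location)  # step 4
        if metadata is None:                                   # step 5
            metadata = {"name": name, "size": len(data)}
        if self.crosswalk is not None:                         # step 6
            metadata = self.crosswalk(metadata)
        self.catalog.append({"id": handle, **metadata})        # step 7
        return handle
```

Because `Repository` depends only on the `store` method, a different storage module could be substituted without changing the ingest logic, which is the property the framework is aiming for.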

Unidata community users will be able to install this repository on their local file system. It will use THREDDS catalogs to support browsing and querying. Where possible it will use the Common Data Model to retrieve data.  We see this as a complement to the recently released THREDDS Data Server (TDS).

TDR development is following an agile model. We will be making frequent small releases. Initially the interface will support three functions: putData, getDataURL (which returns a URL to the data), and getData (which copies the data out of the repository to another location). Later, the framework will be expanded to handle aggregation and subsetting.

We are maintaining an evolving web page to describe the effort and also to communicate with other LEAD team members: http://www.unidata.ucar.edu/projects/LEAD/ThreddsDataRepository.html.

We are targeting the end of September to release Iteration 1 of the repository, a very simple implementation that will store and retrieve a file to a UNIX disk, generate a unique ID, use a simple table as a name resolver, and copy existing THREDDS metadata to a THREDDS catalog for the repository.
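A rough sketch of what such a first iteration might look like is given below. It is not the TDR implementation: the class name `SimpleDiskRepository` and the Python-style method names `put_data`, `get_data_url`, and `get_data` are hypothetical stand-ins for putData, getDataURL, and getData. A dictionary plays the role of the simple name-resolver table, and the THREDDS metadata copy is omitted.

```python
import shutil
import uuid
from pathlib import Path
from typing import Dict


class SimpleDiskRepository:
    """Stores files under a directory and resolves IDs through an in-memory table."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.table: Dict[str, Path] = {}  # name resolver: unique ID -> stored path

    def put_data(self, source: str) -> str:
        """Copy a file into the repository and return its unique ID."""
        data_id = uuid.uuid4().hex
        target = self.root / f"{data_id}_{Path(source).name}"
        shutil.copyfile(source, target)
        self.table[data_id] = target
        return data_id

    def get_data_url(self, data_id: str) -> str:
        """Return a file:// URL pointing at the stored copy."""
        return self.table[data_id].resolve().as_uri()

    def get_data(self, data_id: str, destination: str) -> str:
        """Copy the stored file out of the repository to another location."""
        shutil.copyfile(self.table[data_id], destination)
        return destination
```

A caller would keep the ID returned by `put_data` and later pass it to `get_data_url` or `get_data`. In this sketch the table lives only in memory, so a persistent table would be needed for the IDs to survive a restart.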

This will also involve development of code to crosswalk from the THREDDS schema to the LEAD schema.

To interface with a variety of module implementations, this effort also requires some understanding of existing relevant technology. Thus we are surveying technologies such as Storage Resource Broker (SRB), Storage Resource Manager (SRM), Replica Location Service (RLS), and Data Replication Service (DRS) to understand their functionality and interfaces. We are also working with NCSA to understand their data moving application, Trebuchet, and to understand the issues involved in using a mass store system. Other technologies will likely become known to us along the way.

Publications

Abstracts submitted to AMS:

Data Access and Storage in the LEAD Cyberinfrastructure, by Anne Wilson, Doug Lindholm, and Tom Baltzer

An Architecture for the LEAD Data Repository, by Doug Lindholm, Anne Wilson, and Tom Baltzer

Toward dynamic adaptivity: steering the WRF model on the Unidata LEAD test bed, by Tom Baltzer, Steven R. Chiswell, Ben Domenico, and Mohan Ramamurthy.

EarlyLEAD: A Non-Grid Application of LEAD Capability, by David Fitzgerald, Rahul Ramachandran, Ben Domenico, Richard Clark, Thomas Baltzer, Sen Chiao, and Everette Joseph.

Abstracts submitted to AGU:

Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR), by Anne Wilson, Doug Lindholm, and Tom Baltzer

Facilitating Interdisciplinary Geosciences and Societal Impacts Research and Education via Dynamically Adaptive, Interoperable Data and Forecast Systems, by Jeff Weber, Ben Domenico, Steve Chiswell, and Tom Baltzer


 

 
 
 
Unidata is a member of the UCAR Community Programs, is managed by the University Corporation for Atmospheric Research, and is sponsored by the National Science Foundation.
P.O. Box 3000     Boulder, CO 80307-3000 USA     Tel: 303-497-8643     Fax: 303-497-8690