LEAD Status Report

September 2005

Tom Baltzer
Brian Kelly
Doug Lindholm
Mohan Ramamurthy
Anne Wilson

LEAD Two-Year Site Review Held in July

During the first part of this summer, the LEAD project focused heavily on its two-year site review. This major review, held at NCSA in July, involved presentations to NSF representatives and a panel of reviewers.

Unidata's major efforts in preparation for this meeting included:

performing significant technical project management, including coordinating the overall LEAD effort on requirements and architecture definition and on prototype build-out,
providing material for the Annual Report, discussing past accomplishments and future plans,
soliciting input and providing functional requirements for the Architecture and Implementation Plan document, and
organizing and running meetings to support these efforts.

The panel seemed pleased with LEAD's efforts. Indeed, one outcome is that NSF is providing additional funding for two students: one at CAPS to study dynamically adaptive Numerical Weather Prediction (NWP) and one at Indiana University to study issues pertaining to streaming data. The panel strongly recommended that LEAD focus on the use of streaming data and dynamic workflows.

Unidata LEAD Test Bed

The LEAD test bed is being built out to a 40-terabyte storage array that will be a primary storage repository for the LEAD effort and the Unidata community, housing large quantities of IDD data and LEAD-generated data products.

Output from the ARPS Data Assimilation System (ADAS), in netCDF format, is now being transferred via FTP to the test bed, where it will be cataloged and made publicly available. Additionally, the compute infrastructure is being built out to support NWP by running WRF and data assimilation by running ADAS on the test bed. Leveraging this effort, we are also working on getting ADAS to assimilate IDD data to establish the initial conditions for the WRF runs occurring on the test bed.

Lastly, we are collaborating with Millersville University (MU), Howard University (HU), and the University of Alabama in Huntsville to establish a three-part regional ensemble forecast. The Algorithm Development and Mining system (ADAM) will determine the location of the forecast, WRF will run at MU, HU, and UPC, and the results will be stored and cataloged on the UPC test bed.

Steered Forecasts

As a first step toward dynamic adaptivity, regional forecasts are now being generated four times daily using the Weather Research and Forecasting (WRF) model running in multiprocessor mode on the UPC LEAD test bed. The location of each regional model run is steered dynamically: an algorithm processes NAM precipitation forecasts and determines the location of the highest 24-hour cumulative predicted precipitation, and the resulting center latitude and longitude are provided in the IDD. The results, along with parallel runs of the Workstation Eta, are being served via OPeNDAP and cataloged using THREDDS. The top-level THREDDS catalog is found at: http://lead.unidata.ucar.edu:8080/thredds/topcatalog.xml.
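The steering step can be illustrated with a small sketch. This is not the algorithm running on the test bed; it only shows the idea of summing precipitation over the forecast steps that span 24 hours and picking the grid point with the largest total. The function name, the nested-list grid layout, and the toy values are all assumptions made for illustration.

```python
def steering_center(precip_steps, lats, lons):
    """Return ((lat, lon), total) for the grid point with the highest
    accumulated precipitation.

    precip_steps: list of 2-D grids (one per forecast step), indexed [i][j]
                  to match lats[i] and lons[j] (hypothetical layout).
    """
    best_point = None
    best_total = float("-inf")
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            # Accumulate this grid point over all forecast steps.
            total = sum(step[i][j] for step in precip_steps)
            if total > best_total:
                best_total = total
                best_point = (lat, lon)
    return best_point, best_total


# Toy data: eight 3-hour steps (24 hours) on a 3 x 4 grid.
lats = [30.0, 35.0, 40.0]
lons = [-100.0, -95.0, -90.0, -85.0]
steps = [[[0.0] * len(lons) for _ in lats] for _ in range(8)]
for step in steps:
    step[1][2] = 2.5      # steady rain at 35N, 90W in every step
steps[0][2][0] = 10.0     # a single heavy burst at 40N, 100W

center, total = steering_center(steps, lats, lons)
print(center, total)
```

In this toy case the steady rain accumulates to a larger 24-hour total than the single burst, so the steady-rain point is chosen as the center of the regional domain.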

THREDDS Data Repository

LEAD orchestrations need a large, robust, and reliable storage back end with speedy access in order to stage data and store both intermediate and final results. Along similar lines, it became apparent that the Unidata community could benefit from a storage repository that allowed users to store and retrieve data that would otherwise be lost due to scouring.

To this end, the Unidata LEAD team is designing and building the THREDDS Data Repository (TDR). The TDR is a modular framework for a repository that will:

locate storage
move data to that storage
generate a unique ID, a handle to the data
register the data in a name resolver that maps data handles to one or more physical locations
generate metadata if none is provided
crosswalk the metadata to another schema, if desired
update one or more catalogs, if desired.
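The steps above can be sketched as a single ingest routine. This is a minimal illustration under stated assumptions, not the TDR design: the names `TableResolver` and `ingest`, the dictionary used as a catalog, and the generated metadata fields are all hypothetical, and the crosswalk is an optional caller-supplied function because the target schema is not defined here.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class TableResolver:
    """Hypothetical name resolver: maps a handle to one or more locations."""

    def __init__(self):
        self._table = {}

    def register(self, handle, location):
        self._table.setdefault(handle, []).append(location)

    def resolve(self, handle):
        return list(self._table.get(handle, []))


def ingest(source, storage_root, resolver, catalog, metadata=None, crosswalk=None):
    """Run the repository steps for one file and return its handle."""
    source = Path(source)
    storage_root = Path(storage_root)
    storage_root.mkdir(parents=True, exist_ok=True)    # locate storage
    handle = uuid.uuid4().hex                           # generate a unique ID
    target = storage_root / f"{handle}_{source.name}"
    shutil.copy2(source, target)                        # move data to storage
    resolver.register(handle, str(target))              # register in resolver
    if metadata is None:                                # generate metadata
        metadata = {"name": source.name, "size_bytes": target.stat().st_size}
    if crosswalk is not None:                           # crosswalk, if desired
        metadata = crosswalk(metadata)
    catalog[handle] = metadata                          # update a catalog
    return handle


# Usage against a temporary directory standing in for real storage.
payload = b"demo bytes, not a real netCDF file"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.nc"
    src.write_bytes(payload)
    resolver = TableResolver()
    catalog = {}
    handle = ingest(src, Path(tmp) / "store", resolver, catalog)
    locations = resolver.resolve(handle)
    stored_ok = Path(locations[0]).read_bytes() == payload
```

Keeping each step behind a small object or function is what would let a different storage locator, mover, or resolver be swapped in without changing the overall flow.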

The goal of the framework is to support a variety of implementations of the modules. This way we hope to provide good functionality for both ends of the user spectrum: LEAD at one end, and a single Unidata community site at the other. For example, we hope to be able to support storage implemented via a mass storage system as well as a UNIX disk on a local area network.

Unidata community users will be able to install this repository on their local file system. It will use THREDDS catalogs to support browsing and querying. Where possible it will use the Common Data Model to retrieve data. We see this as a complement to the recently released THREDDS Data Server (TDS).

TDR development is following an agile model. We will be making frequent small releases. Initially the interface will support three functions: putData, getDataURL (which returns a URL to the data), and getData (which copies the data out of the repository to another location). Later, the framework will be expanded to handle aggregation and subsetting.
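To make the three initial functions concrete, here is a hedged sketch of what such an interface might look like when backed by a local directory. The class name `LocalRepository`, the snake_case method names (standing in for putData, getDataURL, and getData), the in-memory handle table, and the use of `file://` URLs are assumptions for illustration only; the actual TDR interface may differ.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class LocalRepository:
    """Hypothetical repository storing files under one local directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)
        self._paths = {}  # handle -> stored file path

    def put_data(self, source):
        """Copy a file into the repository and return a unique handle."""
        handle = uuid.uuid4().hex
        stored = self.root / handle
        shutil.copy2(source, stored)
        self._paths[handle] = stored
        return handle

    def get_data_url(self, handle):
        """Return a URL pointing at the stored data."""
        return self._paths[handle].as_uri()

    def get_data(self, handle, destination):
        """Copy the data out of the repository to another location."""
        return Path(shutil.copy2(self._paths[handle], destination))


# Usage with temporary directories in place of real storage.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "forecast.nc"
    src.write_bytes(b"demo bytes")
    repo = LocalRepository(Path(tmp) / "repo")
    handle = repo.put_data(src)
    url = repo.get_data_url(handle)
    copy = repo.get_data(handle, Path(tmp) / "copy.nc")
    round_trip_ok = copy.read_bytes() == b"demo bytes"
```

Returning a URL from one call and a physical copy from another keeps the two access styles separate: a client that can read in place uses the URL, while one that needs its own file asks for a copy.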

We are maintaining an evolving web page to describe the effort and also to communicate with other LEAD team members: http://www.unidata.ucar.edu/projects/LEAD/ThreddsDataRepository.html.

We are targeting the end of September to release Iteration 1 of the repository, a very simple implementation that will store and retrieve a file to a UNIX disk, generate a unique ID, use a simple table as a name resolver, and copy existing THREDDS metadata to a THREDDS catalog for the repository.

This will also involve development of code to crosswalk from the THREDDS schema to the LEAD schema.

To interface with a variety of module implementations, this effort also requires some understanding of existing relevant technology. Thus we are surveying technologies such as Storage Resource Broker (SRB), Storage Resource Manager (SRM), Replica Location Service (RLS), and Data Replication Service (DRS) to understand their functionality and interfaces. We are also working with NCSA to understand their data moving application, Trebuchet, and to understand the issues involved in using a mass storage system. Other technologies will likely become known to us along the way.

Publications

Abstracts submitted to AMS:

Data Access and Storage in the LEAD Cyberinfrastructure, by Anne Wilson, Doug Lindholm, and Tom Baltzer

An Architecture for the LEAD Data Repository, by Doug Lindholm, Anne Wilson, and Tom Baltzer

Toward dynamic adaptivity: steering the WRF model on the Unidata LEAD test bed, by Tom Baltzer, Steven R. Chiswell, Ben Domenico, and Mohan Ramamurthy

EarlyLEAD: A Non-Grid Application of LEAD Capability, by David Fitzgerald, Rahul Ramachandran, Ben Domenico, Richard Clark, Thomas Baltzer, Sen Chiao, and Everette Joseph

Abstracts submitted to AGU:

Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR), by Anne Wilson, Doug Lindholm, and Tom Baltzer

Facilitating Interdisciplinary Geosciences and Societal Impacts Research and Education via Dynamically Adaptive, Interoperable Data and Forecast Systems, by Jeff Weber, Ben Domenico, Steve Chiswell, and Tom Baltzer