LEAD at Unidata

Status Update, April 10, 2008

Tom Baltzer, Mohan Ramamurthy, Anne Wilson


LEAD at AMS 2008, New Orleans

LEAD was represented this past January at the 88th AMS Annual Meeting (2008) in New Orleans by a workshop and a special IIPS session.


The workshop was titled "AMS Workshop on Linked Environments for Atmospheric Discovery (LEAD): An Emergent Information Technology Environment for On-Demand, Dynamically Adaptive Interaction with Weather for Research and Education". It was organized by Rich Clark and run by Tom, with assistance from Anne. In the workshop, participants invoked a mining orchestration that mined archival radar data covering Hurricane Katrina. Participants also ran WRF forecasts over a domain of their choice. Both sets of results were visualized with the IDV.


Sergio Mendez and David Ribes (University of Michigan School of Information) surveyed workshop participants before and after the workshop and wrote a paper assessing participants’ impressions and attitudes, which is viewable here. Participants were generally very positive about the power of LEAD, though they felt it was not quite ready for classroom use.


There was also an IIPS session devoted to LEAD. Tom gave a talk entitled “The LEAD testbed system at the Unidata program center: a medium term online repository of meteorological data”, and Anne gave a talk retitled “Programmatic Population of Scientific Data Repository”. Abstracts and recorded presentations from that session are available here.

LEAD and the TeraGrid

After the LEAD Fault Tolerance and Recovery service (FTR) was introduced in late 2007, the reliability of the LEAD system appeared to improve significantly. The FTR is designed to detect when a portion of a LEAD workflow fails and to restart that portion on another TeraGrid resource. In December 2007, the success rate of end-to-end real-time LEAD workflows rose to above 90%. FTR arrived in time for the LEAD workshop at the AMS meeting in New Orleans. Although workflow failure rates began to increase again as the workshop approached, the workshop was successful and we received many very positive comments from participants. Some participants saw workflow components fail and then be restarted via FTR to successful completion, which was a rewarding demonstration.
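The restart-elsewhere idea behind the FTR can be illustrated with a minimal failover sketch. This is not the FTR implementation: the function `run_with_failover`, the exception `AllResourcesFailed`, and the `step` callable are hypothetical names, and the sketch assumes a step signals failure by raising an exception.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllResourcesFailed(Exception):
    """Hypothetical error raised when a step fails on every candidate resource."""


def run_with_failover(step: Callable[[str], T], resources: Sequence[str]) -> tuple[str, T]:
    """Run one workflow step, moving to the next resource whenever it fails.

    `step` is a hypothetical callable that executes the step on the named
    resource and raises an exception on failure. Returns the resource that
    succeeded together with the step's result.
    """
    errors: dict[str, Exception] = {}
    for resource in resources:
        try:
            return resource, step(resource)
        except Exception as exc:  # record this failure, then try the next resource
            errors[resource] = exc
    # Every resource failed: report all collected errors to the caller.
    raise AllResourcesFailed(errors)
```

A retry strategy of this kind only helps when failures are independent across resources; if a shared component fails the same way everywhere, every attempt fails and the error is simply reported after the last one.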

Beginning in early January, problems increased with many of the TeraGrid resources that LEAD relies upon, particularly GridFTP and GRAM. This exposed underlying flaws in these elements of the TeraGrid software stack, causing LEAD workflows to fail at an increasing rate despite the FTR. By early February the problem had become endemic, motivating extensive cross-project collaboration to address it. The TeraGrid team has been very responsive, providing new releases to support LEAD and other TeraGrid users, but to date the LEAD workflow failure rate remains high (in excess of 50%).

LEAD and the WxChallenge 2008

Given LEAD's successful support of 10 universities in WxChallenge 2007, together with the increasing reliability of LEAD workflows in late 2007, the LEAD team agreed to support all participants of WxChallenge 2008. Unfortunately, the TeraGrid problems described above led most WxChallenge users to abandon LEAD for their forecasting efforts. With just a few weeks of the WxChallenge remaining at the time of this writing, it seems unlikely that LEAD will be able to benefit from this year's contest.

The THREDDS Data Repository (TDR)

We designed and implemented a service interface that provides programmatic access to the TDR. The interface is described here. A TDR installation was created on the Unidata LEAD test bed and provided to the LEAD developers at Indiana University, along with a client code package to help them write code that communicates with the TDR. They are in the process of integrating the TDR into the LEAD cyberinfrastructure as a place where users can publish content from their personal space.

The THREDDS to LEAD Crosswalk

The crosswalk code was updated to map special THREDDS keywords to LEAD metadata schema keywords. The immediate benefit is that adding entries to a THREDDS catalog describing which visualization tools apply to a dataset (for example, that it can be viewed with the IDV) tells the LEAD orchestration how to display that dataset. More generally, the code can handle arbitrary mappings from THREDDS keywords to LEAD keywords.
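A table-driven mapping is one simple way to picture an arbitrary keyword crosswalk. The sketch below is illustrative only: the keyword strings in `KEYWORD_MAP` and the function `crosswalk_keywords` are hypothetical and do not reflect the actual THREDDS or LEAD keyword vocabularies. It assumes unmapped keywords should pass through unchanged.

```python
from typing import Iterable, Mapping

# Hypothetical mapping table: THREDDS keyword -> LEAD metadata keyword.
# The real vocabularies used by the crosswalk are not reproduced here.
KEYWORD_MAP: Mapping[str, str] = {
    "viewer:IDV": "lead:visualization=IDV",
    "radar:level2": "lead:datatype=NEXRAD Level II",
}


def crosswalk_keywords(thredds_keywords: Iterable[str],
                       mapping: Mapping[str, str] = KEYWORD_MAP) -> list[str]:
    """Translate THREDDS keywords to LEAD keywords.

    Keywords with no entry in `mapping` are passed through unchanged.
    Duplicates are dropped while preserving the original order.
    """
    result: list[str] = []
    for kw in thredds_keywords:
        lead_kw = mapping.get(kw, kw)
        if lead_kw not in result:
            result.append(lead_kw)
    return result
```

Keeping the mapping in data rather than code means a new display hint only requires a new table entry, which is the sense in which arbitrary mappings can be handled.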

Bringing Radar Data into LEAD

We have successfully brought the most recent two days' worth of WSR-88D Level II radar data into the LEAD cataloging system. The THREDDS catalog for these data is at: http://lead.unidata.ucar.edu:8080/thredds/lead/leadradarsl2.html

As part of this effort, radar coverage information was derived for the Level II radar data and encoded into the THREDDS catalog. This allows users of the LEAD Geo-Gui to define a region of interest and obtain only the radar data that cover that region.
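Selecting only the radars that cover a region amounts to an overlap test between each radar's coverage and the region of interest. The following is a minimal sketch assuming coverage is stored as a latitude/longitude bounding box per radar; the report does not say how coverage is actually encoded in the catalog, and `BBox`, `radars_covering`, and the radar IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical lat/lon box in degrees; does not handle the dateline."""
    south: float
    north: float
    west: float
    east: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap when their latitude ranges and their
        # longitude ranges both overlap (touching edges count).
        return (self.south <= other.north and other.south <= self.north
                and self.west <= other.east and other.west <= self.east)


def radars_covering(region: BBox, coverage: dict[str, BBox]) -> list[str]:
    """Return IDs of radars whose coverage box overlaps the region, sorted."""
    return sorted(rid for rid, box in coverage.items() if box.intersects(region))
```

A bounding box overestimates a radar's roughly circular range, so a box test like this may return a few extra radars; that is usually an acceptable trade for a cheap catalog-side filter.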

The Unidata LEAD Test Bed Status

The Unidata LEAD test bed continues to be a primary source of data for LEAD workflows. This includes:

  • 300 days' worth of NAM model data that can be used for initial and/or boundary conditions for current and retrospective runs
  • 45 days' worth of 10-km ADAS data that can be used for initial conditions for current and retrospective runs
  • 6 months of Level II and Level III radar data that can be used for data assimilation once the crosswalk and workflows for these data are complete; these data can also be used to compare model results with observations
  • 6 months of surface, buoy, profiler, RAOB, and aircraft data in OU's proprietary LAPS format that can be used for ADAS data assimilation once the crosswalk and workflows for these data are complete

GRIB2 data are now fully supported.

We are exploring how our community can benefit to an even greater extent from this resource.

LEAD Phase 2

The LEAD PIs continue to develop strategies for securing funding for a continuing LEAD deployment facility. (Continued LEAD computer science research would be pursued under an OCI CDI initiative, though that would likely not involve Unidata.) Currently we are discussing the novel possibility of creating a consortium of projects with similar scientific goals and technical requirements, logically grouped as a TeraGrid Gateway Resource Provider.

Proposing such an RP facility essentially means proposing a Track II (mid-range high-performance computing) system along with the supporting management infrastructure to run it. This is a radical and ambitious idea. Because TeraGrid is currently planning its second phase and may undergo significant changes, the timing for this possibility is good. The LEAD PIs are writing a position paper to use in engaging other like-minded organizations in this idea.

In an effort funded by Microsoft, some LEAD PIs have been in discussions with Microsoft about deploying LEAD as a demonstration application on Microsoft's new multicore architectures. Another alternative would be simply to package LEAD as part of the Microsoft Weather workbench plan and give it away; such a package would be valuable to anybody with a 16-core server and a 16- to 60-core cluster. This would be a pure LEAD solution, not the consortium proposed above, and this effort is being led by Prof. Dennis Gannon at Indiana University.

NASA ROSES Solicitation

A notice of intent (NOI) has been submitted in response to the Hurricane Science Research solicitation (A.16) of the NASA Research Opportunities in Space and Earth Sciences (ROSES), with Mohan as PI and Craig Mattocks, Kelvin Droegemeier, Anne Wilson, and Tom Baltzer as Co-PIs.

The NOI is titled “A satellite data access and visualization system to support hurricane Research”. In it we have proposed to:

        Build a database that allows users to create case studies of hurricane data that include all Common Data Model (CDM) supported datasets; these data would be served using the Thematic Real-time Environmental Distributed Data Services (THREDDS) Data Server (TDS) and the Abstract Data Distribution Environment (ADDE).

        Extend the Common Data Model to provide access to the satellite datasets mentioned in the solicitation (TRMM, Aqua, QuikSCAT, Jason, GOES, NPOESS, etc.), providing a common framework for dissemination and, through the Integrated Data Viewer (IDV), a common framework for visualization.

        Leverage the Unidata Next Generation Case Studies project to upload and store these datasets, associated metadata, IDV bundles and documentation.

        Extend the MADIS WRF-Var3D DA structures to use the CDM interface so that these hurricane datasets can be used for data assimilation and numerical model initialization.

        Create a facility to crosswalk the datasets into the Linked Environments for Atmospheric Discovery (LEAD; see http://leadproject.org) so that these datasets are available to LEAD users, thus making the database developed by the proposed project available to researchers, educators, and students.

The proposal itself is due May 16.