Unidata - To provide the data services, tools, and cyberinfrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation.
         
 

Endeavor 5: Distributed, organized collections of digital material

No matter how efficient, reliable, and flexible the LDM/IDD push system becomes, a substantial and growing need for convenient access to data from remote servers will remain, because there are simply too many environmental datasets to disseminate via LDM/IDD. Even if the bandwidth were available, many of our sites do not have the capacity to receive and store the data locally; plus, many sites are interested only in a small subset of episodic data during limited periods.

The need for convenient access to historical data was articulated by the NSF review panel for the current award (Unidata 1998). This resulted in collaborative work with NCAR SCD on the Community Data Portal and in the THREDDS initiative to build working, usable prototypes for seamless access to distributed servers via client/server "pull" mechanisms. In short, THREDDS is rapidly becoming a vital component of our community, providing the middleware to support collaboration, data sharing, user-contributed datasets, case studies, and course materials, and will be the underpinning as we expand to a broader set of disciplines and needs.

 

Extending current activities

Description from the Proposal | Current Progress | Updated Objectives
The LDM/IDD delivers real-time data to local computers, where students and faculty can use that data with GEMPAK and McIDAS. The UPC has also collaborated with COMET, JOSS, and the NWS to broaden the use of the COMET Case Study Library and to develop and support tools for format conversions needed to make the current collection of 43 case studies more readily useful. As an indication of the widespread use of these case studies, during a recent five-month period 138 case study datasets, comprising 500 gigabytes of data, were ordered or downloaded by 40 Unidata sites.

Funding from NOAA has dried up. Work on COMET-type case studies is progressing as resources allow.

The new approach is a set of "mini case studies" that use THREDDS/IDV technology to document datasets.

In addition, Unidata has worked closely with the developers of OPeNDAP and ADDE so that datasets important to our community can be accessed efficiently via the Internet. These mechanisms for data delivery will continue to be enhanced, and we will continue to expand the number of datasets available. We will make these datasets useful to a broader community by enabling their archiving, along with mechanisms for cataloging.

Developers of the Live Access Server at PMEL have incorporated THREDDS catalog tools into their distribution (about 50 sites). OPeNDAP has committed to doing so as well.

Work with OPeNDAP to fuse OPeNDAP data access with THREDDS catalog services.

ADDE services are now available via IDV, but THREDDS catalog integration is not complete.

Bringing scientific data into collections is a prominent goal for two major educational digital library efforts, NSDL and DLESE, partly due to increasing recognition of the importance of data in the classroom (Manduca 2002). The UPC is already involved in the data side of digital libraries through THREDDS, funded under the NSDL "collection" track. Unidata will build on that foundation by expanding the breadth of the data collections, incorporating the metadata into digital library catalogs, and working with the community to create education modules and publications.

The NSDL THREDDS 2G, LEAD, and DLESE Data Services proposals were successful, and the projects are underway. The first two DLESE Data Access Working Group (DAWG) meetings have been held, as has the first DLESE Data Services Workshop.

Incorporating catalogs of scientific data into digital libraries effectively remains a challenge. The catalogs themselves do not contain enough information about the data for end users, and it is unclear whether they currently contain enough information for curriculum developers to use them effectively.

The IDV currently serves as a testbed for evaluating the usefulness of data server middleware, data and metadata representations, and platform-independent infrastructure for scientific data access. The IDV employs THREDDS metadata catalogs to create menus for the available data, and uses OPeNDAP, ADDE, and HTTP protocols to access subsets and aggregates of data on remote servers. Designers and developers of THREDDS have made rapid progress by working closely with IDV as its end-user application. Unidata has recently completed an OPeNDAP/netCDF aggregation server, created and refined XML-based data structures for data catalogs, automated catalog generators for ADDE servers and OPeNDAP servers, and added access to remote netCDF files through OPeNDAP and HTTP.

This paragraph in the proposal is essentially a statement of what was already underway at the time of the proposal. Subsequent to the proposal submission, the DLESE Data Services, THREDDS 2G, and LEAD projects have been funded.

One of the main things we have learned from our involvement in these initiatives is that the combination of IDD, Decoders, OPeNDAP, ADDE, THREDDS, and IDV is a very powerful but very complicated system. It is not yet at the point where one can present a coherent picture of how all the pieces fit together and how they can be used. As part of its reorganization, the UPC has created an Integrated Services group to address this issue and others.
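
As an illustration of how a client such as the IDV might turn a THREDDS catalog into a menu of remote datasets, the following minimal Python sketch parses a small catalog and resolves each dataset entry to an access URL. It is not the IDV's actual code. The sample catalog, the server name `http://example.edu`, the dataset names and paths, and the function `build_menu` are all hypothetical, and the sketch assumes the InvCatalog 1.0 XML namespace and a simplified layout in which each dataset names its service in a `serviceName` attribute.

```python
import xml.etree.ElementTree as ET

# Assumption: the catalog uses the THREDDS InvCatalog 1.0 namespace.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog; dataset names and paths are made up for illustration.
SAMPLE_CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="dods" serviceType="OPENDAP" base="/dods/"/>
  <dataset name="Model output">
    <dataset name="Eta 80 km" urlPath="model/eta_80km.nc" serviceName="dods"/>
    <dataset name="RUC 40 km" urlPath="model/ruc_40km.nc" serviceName="dods"/>
  </dataset>
</catalog>
"""


def build_menu(catalog_xml, server):
    """Return (menu label, access URL) pairs for every dataset with a urlPath."""
    root = ET.fromstring(catalog_xml)
    # Map each declared service name to its base path on the server.
    services = {s.get("name"): s.get("base") for s in root.findall("t:service", NS)}
    menu = []

    def walk(node, path):
        # Datasets can nest; containers without a urlPath only add a menu level.
        for ds in node.findall("t:dataset", NS):
            label = path + [ds.get("name")]
            url_path = ds.get("urlPath")
            if url_path is not None:
                base = services.get(ds.get("serviceName"), "")
                menu.append((" > ".join(label), server + base + url_path))
            walk(ds, label)

    walk(root, [])
    return menu


for label, url in build_menu(SAMPLE_CATALOG, "http://example.edu"):
    print(label, "->", url)
```

The nested `dataset` elements become menu levels, and only entries that carry a `urlPath` produce a URL the client can open. Real catalogs can also inherit service names through metadata elements and reference other catalogs, which this sketch ignores.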
     
     

Users value the result of such infrastructure development, as the community survey analysis (Clark 2002) makes clear:

Approximately 70-80% of both organizational contacts and individual users recognize the value of being able to access both real-time and historical data from a set of Internet data servers using the same analysis and display applications on your desktop computer. Of all current Unidata initiatives, THREDDS is the one seen by the community as having the most obvious and (probably) immediate benefit.

Unidata continues to engage traditional and non-traditional partners in discussions for the purpose of expanding THREDDS to include a large variety of data accessible to the user from the local desktop.

New activities augmenting and enhancing the program

Unidata has had a transforming effect on the technology and culture of real-time data access in the atmospheric sciences. Using our experience in this arena, we propose to contribute to the ongoing efforts of many other groups to develop shared, well-structured, on-line data collections that can be discovered, accessed, and interpreted easily. The following objectives underlie Unidata's goal of creating digital holdings that contain well-described data on the Earth system, structured and catalogued for effective remote access:

  • Cataloging the real-time datasets and retrospective data archives important to Unidata's community
  • Categorizing datasets and enabling searches by discovery centers such as digital libraries
  • Standardizing client/server access protocols to allow efficient subsetting, data reduction, and retrieval
  • Standardizing "use" metadata to allow full analysis and manipulation, visualization, and integration with GIS
  • Enabling creation of third-party themed collections and related metadata
  • Enabling aggregations and other logical views of datasets, to provide appropriate dataset granularity
  • Adapting to the evolution of data organization, data access, and data mining methodologies
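
To make the subsetting objective above concrete, the following minimal Python sketch builds an OPeNDAP (DAP2) request URL whose constraint expression asks the server for only part of one variable. It is an illustration under stated assumptions, not Unidata's implementation: the dataset URL, the variable name `T`, and the helper names `hyperslab` and `subset_url` are hypothetical. The sketch relies on the DAP2 convention that each dimension is constrained as `[start:stride:stop]` with an inclusive stop index, and that the `.dods` suffix requests the binary data response.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; the stop index is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, slices, response="dods"):
    """Build a request URL for a subset of one variable.

    `slices` holds one (start, stop) or (start, stop, stride) tuple per
    dimension of the variable, in the variable's own dimension order.
    """
    constraint = variable + "".join(hyperslab(*s) for s in slices)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset and variable: first time step, a 11 x 6 point window.
url = subset_url(
    "http://example.edu/dods/model/eta_80km.nc",
    "T",
    [(0, 0), (10, 20), (30, 40, 2)],
)
print(url)
```

Because the server evaluates the constraint, only the requested window crosses the network, which is the point of standardizing the access protocol. Some HTTP clients require the square brackets to be percent-encoded, a detail this sketch leaves out.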

See the technical status pages for an up-to-date summary of technical progress on the THREDDS project. More work is needed on enhanced catalog tool development.

Some of the basic technical advances needed for such collections are being pioneered in the THREDDS and OPeNDAP initiatives. Unidata proposes to continue developing data server middleware that allows clients access to archived and real-time data sources. Working with NSDL and DLESE, we also intend to help create metadata standards for the geosciences to allow data to be effectively located and used, especially for educational purposes.

The NSDL THREDDS 2G and DLESE Data Services proposals were successful, and the projects are underway. The first two DLESE Data Access Working Group (DAWG) meetings have been held, as has the first DLESE Data Services Workshop. Planning is underway for next year's DLESE Data Services Workshop, which will be hosted by Unidata.

At the same time, Unidata must take advantage of results of other "Data Grid" projects and middleware initiatives, including the effort to develop and standardize an Open Grid Services Architecture that extends Globus Toolkit Grid services and integrates them with web services technologies.

The LEAD project has been funded and is tackling this integration. Unidata is playing a key role in the real-time data, cataloging, and visualization aspects of LEAD. As noted elsewhere, participation in LEAD has brought to light the need for better integration of the many products and services Unidata is now involved in.

In the next five years, we propose to develop THREDDS as an exemplary component of NSDL and of CI for the geosciences. This will require supporting the providers of thematic data servers with catalog generation and maintenance software; supporting the developers of client software with catalog access libraries; using real-time flows for the automatic population of structured data collections; developing technology for publishing third-party metadata describing remote collections; and helping to define the metadata practices employed in the scientific digital library community.

Some of this is being undertaken in LEAD, which will also incorporate local forecast models and data assimilation into the mix of available services.

A major gap remains in giving users control over "case studies" collections in their own workspace. This issue needs to be addressed. <<Pointer to white paper.>>

When introducing new technology—even technology that shows considerable promise—developers sometimes encounter difficulty in convincing potential providers and users of the benefits of the technology. Unidata will have unique advantages in surmounting this obstacle. These include its close relations with NSDL and DLESE; established collaborations with major data providers in universities, NOAA and NCAR; and participation in consortia such as the Federation of Earth Science Information Partners.

<<Pointer to collaborators lists>>

Additional collaborators in GIS and CUAHSI communities.

Outreach to the GIS, Hydrology, and Oceanography communities is covered in the Community endeavor description.

It is important to note that the UPC initiated this effort in response to input from user workshops, the Policy and User Committees, and the reviewers of the previous NSF proposal. While it has been developed thus far as a proof of concept with funds from other sources, it has become a key element of the Unidata core. Wider applicability will make it a candidate for supplementary funding.

Regional user workshop focused on remote data access via IDV.

So far, Unidata has been successful in getting funding from sources other than NSF ATM, but there are differing opinions as to whether this is a good idea.

From what we are hearing now, in spite of unprecedented positive proposal reviews and encouragement of increased funding by the review panel, ATM will not be able to supplement our funding enough to cover these additional responsibility areas when the funding runs out for THREDDS 2G, DLESE Data Services, and so forth.

Consequently, where we go from here in terms of seeking additional funding from sources other than NSF/ATM is one of the most important decisions facing Unidata at this juncture.

     
     

 

 

 

 

 

 

 
 
 
Unidata is a member of the UCAR Office of Programs, is managed by the University Corporation for Atmospheric Research, and is sponsored by the National Science Foundation.
P.O. Box 3000     Boulder, CO 80307-3000 USA     Tel: 303-497-8643     Fax: 303-497-8690