DRAFT
Ben Domenico
Last Modified: September 9, 2003
The overall goal is to create a set of data servers to serve the Earth system research and education community. While each server will have its own "theme" oriented around the type of data and expertise available at the server site, they will also have a set of common elements that will enable users to access them as a coherent whole. This document describes those common components.
Unidata LDM and IDD
For delivering real-time data from a variety of sources, as well as for automatically exchanging data and metadata among the servers, the Unidata Local Data Manager (LDM) software and the Internet Data Distribution (IDD) system will be used. This system has been in use at over 100 universities for about five years, and it is finding increasing use at government and commercial sites as well.
netCDF and Decoders
The Unidata LDM can be configured to run decoders that convert incoming data into specific formats for local storage. One set of decoders, which creates netCDF files, is of particular interest because netCDF files are self-describing: they contain metadata that describes the data they hold. Storing metadata in the files themselves makes it easier to build automated tools that facilitate data discovery and use.
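As a rough sketch of how the LDM decides which decoder receives a product: the LDM's pqact program matches each incoming product's identifier against regular-expression patterns in a configuration file and runs the associated action, such as piping the product to a decoder. The Python below only mimics that pattern/action idea; it is not LDM code, and the feed name, product identifier, and decoder command (`metar2nc`) are hypothetical examples.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One pattern/action entry, loosely modeled on an LDM pqact.conf line."""
    feedtype: str  # feed the rule applies to (illustrative name, e.g. "WMO")
    pattern: str   # regular expression matched against the product identifier
    action: str    # what to do with a match, e.g. "PIPE" to a decoder
    args: str      # action arguments; \1, \2, ... refer to captured groups


def dispatch(rules, feedtype, product_id):
    """Return (action, expanded_args) for every rule matching the product."""
    matches = []
    for rule in rules:
        if rule.feedtype != feedtype:
            continue  # this rule is for a different feed
        m = re.search(rule.pattern, product_id)
        if m:
            # Fill in \1, \2, ... from the captured groups of the match.
            matches.append((rule.action, m.expand(rule.args)))
    return matches


# Hypothetical rule: pipe U.S. surface reports (SA/SP headers) to a decoder
# that writes one netCDF file per originating center and day of month.
RULES = [
    Rule("WMO", r"^(SA|SP)US\d\d (K\w{3}) (\d{2})", "PIPE",
         r"metar2nc data/surface/\2_\3.nc"),
]
```

Here a product with the header `SAUS70 KWBC 091200` would be piped to the decoder with the argument string `metar2nc data/surface/KWBC_09.nc`. The real pqact uses POSIX extended regular expressions rather than Python's `re`, so patterns would need checking against the LDM documentation.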
DODS and ADDE
Client/server interfaces make it possible to run application programs locally while using them to access data on remote servers as if the data were stored on local disks. Two such interface packages are available for data in the Earth system community.
- The Distributed Oceanographic Data System (DODS) was created at the University of Rhode Island and is supported at Unidata.
- The Abstract Distributed Data Environment (ADDE) <<< Get reference to ADDE page >>> was developed at the Space Science and Engineering Center (SSEC) at the University of Wisconsin-Madison.
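DODS clients reach remote data through URLs. In the DODS protocol, appending `.dds`, `.das`, or `.dods` to a dataset URL requests the dataset's structure, its attributes, or the data itself, and an optional constraint expression after `?` selects a subset using `[start:stride:stop]` hyperslabs. The helper below is a minimal sketch that only builds such URLs; the server address and variable name are hypothetical, and it does not contact any server.

```python
def hyperslab(var, *dims):
    """Format a projection such as sst[0:1:0][10:2:20].

    Each dim is a (start, stride, stop) tuple; in DODS the stop index
    is inclusive.
    """
    return var + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in dims)


def dods_url(dataset_url, response, projections=()):
    """Build a request URL for the dds, das, or dods response of a dataset.

    Only constructs the string; no network access happens here.
    """
    if response not in ("dds", "das", "dods"):
        raise ValueError(f"unknown DODS response type: {response!r}")
    url = f"{dataset_url}.{response}"
    if projections:
        # Several projections are joined with commas after a single "?".
        # Some HTTP clients may need the brackets percent-encoded.
        url += "?" + ",".join(projections)
    return url


# Hypothetical server and variable name, for illustration only.
url = dods_url("http://example.edu/dods/sst.nc", "dods",
               [hyperslab("sst", (0, 1, 0), (10, 2, 20))])
```

With these inputs, `url` is `http://example.edu/dods/sst.nc.dods?sst[0:1:0][10:2:20]`, that is, the first time step and every second point from index 10 through 20 along the second dimension.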
NetCDF to XML wrapper creator (University of Florence and others)
Because netCDF files are self-describing, discovery and usage metadata can be stored directly in them. However, it will also be useful to have XML wrappers that contain the metadata and a pointer to the file containing the data. Systems that manipulate metadata can then work with these metadata-only files without having to deal with each of the data files. Moreover, such XML metadata files can be created for data that are not in netCDF (e.g., McIDAS files served by ADDE), where it is not convenient to store the metadata inside the data file. In this way, tools for generating catalogs of the data on a server, or more general catalogs such as those maintained by the GCMD (Global Change Master Directory) or DLESE (Digital Library for Earth System Education), can work with the collection of XML metadata files rather than having to extract the metadata from the datasets themselves. Note that on some servers the metadata may actually be stored in a database, but the same sort of tools will be needed to extract the metadata whether it is ultimately stored in a database or in a collection of flat files. Finally, these tools may run as back ends to decoders as the data arrive in real time, or as crawlers that periodically traverse the data collection and create (or update) the metadata.
Human input metadata assistance tools (e.g., Metabot, DCBot, DLESE tool, and others)
On some servers, it will not be possible to create all the metadata automatically. For example, assembling data collections for case studies of classic geoscience phenomena may require substantial human input. Tools that assist the person generating the metadata will be important at such sites. DLESE is creating such a tool to help describe collections of educational materials, and Metabot and DCBot are available as commercial and shareware products. Similar tools will be needed to help prepare metadata describing datasets.
XSLT translators
Even if all the THREDDS sites agree on XML as the form for serving metadata, it is unlikely that all sites and collections will use the same conventions for describing their data in XML. Consequently, tools will be needed to convert metadata from one set of conventions to another. The W3C XSL Transformations (XSLT) Recommendation defines the syntax and semantics of a language for transforming XML documents into other XML documents, where XSL is the Extensible Stylesheet Language. Although XSLT was designed as part of XSL, it can also be used independently of XSL.
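An XSLT processor would normally apply a stylesheet to perform such a conversion. Because Python's standard library has no XSLT engine, the sketch below substitutes ElementTree to show the same kind of element-to-element crosswalk in plain code. The source convention (`dataset`, `id`, `title`, `abstract`) is hypothetical; `Entry_ID`, `Entry_Title`, and `Summary` are DIF field names, but the mapping itself is illustrative rather than an agreed crosswalk.

```python
import xml.etree.ElementTree as ET

# Illustrative crosswalk from a hypothetical local convention to DIF field
# names; a real mapping would be agreed on among the participating sites.
CROSSWALK = {
    "id": "Entry_ID",
    "title": "Entry_Title",
    "abstract": "Summary",
}


def crosswalk(src_xml, mapping, root_tag="DIF"):
    """Rename mapped child elements of src_xml into a new root element.

    Returns the converted XML text and the tags that had no mapping, so a
    person can decide how to handle them.
    """
    src = ET.fromstring(src_xml)
    dst = ET.Element(root_tag)
    unmapped = []
    for child in src:
        new_tag = mapping.get(child.tag)
        if new_tag is None:
            unmapped.append(child.tag)
            continue
        ET.SubElement(dst, new_tag).text = child.text
    return ET.tostring(dst, encoding="unicode"), unmapped


SOURCE = ("<dataset><id>sst-001</id><title>Sea surface temperature</title>"
          "<platform>buoy</platform></dataset>")
converted, leftover = crosswalk(SOURCE, CROSSWALK)
```

Reporting unmapped elements, rather than silently dropping them, reflects the expectation that crosswalks between conventions will rarely be complete.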
SDLIP (Simple Digital Library Interchange Protocol)
SDLIP is a middleware layer that allows metadata to be exchanged among distributed digital libraries.
Automated XML metadata creation tool for datasets other than netCDF
As noted above, many datasets on the servers will be in formats that do not lend themselves to storing the metadata in the files themselves. For such datasets, the metadata generation tools will have to store the metadata outside the data file.
GCMD DIF metadata creation tools
The DIF (Directory Interchange Format) of the GCMD (Global Change Master Directory) is a useful starting point for providing the metadata needed by data discovery systems. While it is not as complete as the exhaustive FGDC specification, it does offer a mapping to FGDC and specifies a "required" subset that should be manageable for most collections. Adopting DIF as a starting point also establishes a connection to one of the primary central catalog sites for Earth system data, so metadata describing the THREDDS servers' collections can presumably be included quickly in the GCMD master directory.
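A DIF creation tool might begin by checking a candidate record against the required subset and reporting what is still missing; the same report tells a human editor what remains to be filled in. In this sketch the required field list is an assumption for illustration only; the authoritative list is defined by the GCMD DIF specification.

```python
# Assumed required subset, for illustration only; the authoritative list
# is defined in the GCMD DIF documentation.
REQUIRED_DIF_FIELDS = ("Entry_ID", "Entry_Title", "Parameters",
                       "Data_Center", "Summary")


def is_blank(value):
    """Treat None, whitespace-only strings, and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_required(record, required=REQUIRED_DIF_FIELDS):
    """Return required fields absent or blank in record, in required order."""
    return [field for field in required if is_blank(record.get(field))]
```

Keeping the required list as a parameter lets a site tighten or relax the check as its own collections and the DIF specification evolve.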
Note that for self-describing data files (such as netCDF), the metadata will be stored both in the files themselves and in XML wrappers. For data in other formats, the metadata will be stored only in separate XML files or in databases that serve the XML descriptions.
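The XML wrapper idea, combined with the crawler approach described earlier, can be sketched as follows: for each data file, write a sidecar XML file holding extracted metadata and a pointer to the data, and rewrite it only when the data file is newer than its wrapper. The element names (`dataset`, `dataLocation`, `property`), the `.xml` naming convention, and the `extract` callback are all hypothetical; a real site would supply its own format-specific extractor (for netCDF, one that reads the file's attributes).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_wrapper(data_path, metadata, wrapper_path):
    """Write an XML wrapper holding metadata plus a pointer to the data file."""
    root = ET.Element("dataset")
    # Pointer to the data: the wrapper never embeds the data itself.
    ET.SubElement(root, "dataLocation").text = str(data_path)
    for name, value in metadata.items():
        ET.SubElement(root, "property", name=name, value=str(value))
    ET.ElementTree(root).write(wrapper_path, encoding="utf-8",
                               xml_declaration=True)


def crawl(data_dir, extract, suffix=".xml"):
    """Create or refresh a sidecar wrapper for each data file under data_dir.

    extract(path) returns a dict of metadata for one file. A wrapper is
    rewritten only when it is missing or older than its data file. Files
    already ending in suffix are treated as wrappers and skipped.
    Returns the wrapper paths that were written.
    """
    written = []
    # sorted() snapshots the listing, so new wrappers are not revisited.
    for data_path in sorted(Path(data_dir).rglob("*")):
        if not data_path.is_file() or data_path.suffix == suffix:
            continue
        wrapper_path = data_path.with_name(data_path.name + suffix)
        if (wrapper_path.exists()
                and wrapper_path.stat().st_mtime >= data_path.stat().st_mtime):
            continue  # wrapper is up to date
        write_wrapper(data_path, extract(data_path), wrapper_path)
        written.append(wrapper_path)
    return written
```

The same `write_wrapper` function could instead be called from a decoder's back end as each file arrives; the modification-time check is what lets a periodic crawler update only what has changed.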
Several groups are working on "crosswalks" between sets of metadata standards. Diane Hillman of Cornell has a couple of web pages describing some of this work.
NSDL is the National SMETE Digital Library, where SMETE stands for Science, Mathematics, Engineering, and Technology Education.
Pattern recognition metadata creation tools
This concept was introduced into THREDDS after a discussion about the CRAFT (Collaborative Radar Acquisition Field Test) project. An experimental element of that project is to incorporate automated detection algorithms into the data collection system; in this case, mesocyclone detection algorithms are to be applied to observations from NEXRAD radar systems. According to the project description, "Experiments will be conducted on detection of the onset of rotation, extension of rotations through the storm, and (in severe cases) the appearance of tornado vortex signatures." The idea is that tools like this could generate metadata about specific scientific phenomena, which would then be incorporated into the THREDDS servers' metadata collection. This would eventually enable users to search for data containing "tornado vortex signatures" as well as to perform the traditional searches by location, time, observing platform, and so on. The work at the University of Oklahoma is described in The Mesocyclone Climatology Project. NCAR's Research Applications Program (RAP) is initiating an investigation into automated pattern recognition in scientific imagery; Paul Herzegh is the contact there.
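A minimal sketch of how detection output could become searchable metadata, assuming each detection reports a radar site, a time, a phenomenon label, and the data file it came from (all hypothetical field names, not the CRAFT output format). Detections are grouped into one record per data file, and the search combines the phenomenon with the traditional time and platform filters. It further assumes each file holds a single volume scan, so a record's time is the scan time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Detection:
    """One algorithm detection; all field names are hypothetical."""
    radar: str        # radar site identifier, e.g. "KTLX"
    time: datetime    # time of the volume scan
    phenomenon: str   # e.g. "mesocyclone", "tornado vortex signature"
    data_file: str    # pointer to the data the detection came from


def build_records(detections):
    """Group detections into one metadata record per data file.

    Assumes each file holds a single volume scan, so every detection in a
    file shares the same radar and time.
    """
    records = {}
    for d in detections:
        rec = records.setdefault(d.data_file, {
            "data_file": d.data_file, "radar": d.radar,
            "time": d.time, "phenomena": set(),
        })
        rec["phenomena"].add(d.phenomenon)
    return list(records.values())


def search(records, phenomenon, start, end, radar=None):
    """Find files showing a phenomenon, combined with time/platform filters."""
    return sorted(
        r["data_file"] for r in records
        if phenomenon in r["phenomena"]
        and start <= r["time"] <= end
        and (radar is None or r["radar"] == radar)
    )
```

Storing phenomena as one more field in the same record keeps the phenomenon search and the traditional searches in a single query rather than in a separate catalog.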