Summary of reviewer questions (indented) relating to NSDL THREDDS proposal #0121623

With answers composed by Ben Domenico in consultation with collaborating partners

July 2, 2001

A. COMMENTS/QUESTIONS

As you have seen from the reviews, the panelists saw your proposal in a favorable light and this agrees with our general assessment. What I'd like to ask you and your team to do is to elaborate on the following comments/questions which are derived from the individual reviews and our analysis. In some cases these can be viewed as "points of clarification" and in other cases they are somewhat more substantive. Either way, your addressing them here will help us in our documentation. In no particular order...

1) While panelists acknowledged the importance of the technical infrastructure that would be developed in this project, there was a concern that the educational "drivers" for the work were not as well described as the technical aspects. It would be useful for us to have some clear examples of "usage scenarios" involving students and/or faculty.

A growing body of evidence indicates the educational importance of access to real-time environmental data. An NSF-sponsored study done by the American Geophysical Union produced "Shaping the Future of Undergraduate Earth System Education" (http://www.agu.org/sci_soc/spheres/toc.htm), which states: "We can improve the teaching of Earth science at all levels by incorporating new teaching methods using collaborative work, active learning strategies, computers, and large Earth and space science data sets," and recommends that faculty should be able to "provide students with computer exercises involving both modeling and the analysis of large spatial and/or temporal data sets." At the middle school level, the formal independent evaluation of Project Skymath (http://www.unidata.ucar.edu/support/mailinglist/mjd/Skymath.html) states: "The study of real-time weather data and the use of technology motivates students to learn mathematics."

DLESE provides a set of educational data access scenarios at: http://www.dlese.org/usecases/forum.html. These scenarios were distilled from a set of user "stories" at: http://www.dlese.org/people/workgroups/GUI/scenarios/user_scenarios.html. Many of these relate to the DISCOVERY and USAGE of datasets, which are important THREDDS objectives, but the THREDDS infrastructure will also enable a third important class of use cases, namely, users PUBLISHING catalogs and inventories of datasets for others' use. A brief publishing scenario would involve a learner (teacher, student, or researcher for that matter) who has researched a particular topic area (convection, for example) and found a collection of datasets to illustrate the concept. The datasets might reside on servers at different data provider sites. One site might have data illustrating convection in the ocean, another atmospheric convection, and yet another might show convection in the Earth's core or in stellar atmospheres. Having uncovered this wealth of data, the learner will be able to publish an article on convection which includes a THREDDS PICat (Publishable Inventory Catalog) pointing to the data on the different servers and to a set of THREDDS-instrumented visualization tools that will enable the reader to interact with the data on the various THREDDS servers. This publication can then become part of a collection of related educational materials on the learner's web site. It does not have to reside on the servers at the data provider sites.
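As a rough illustration of the publishing scenario, the Python sketch below reads a small catalog that points to datasets on several provider servers and to a visualization tool. The PICat format has not yet been specified, so the element names, attribute names, and URLs here are hypothetical placeholders (the hosts use reserved example domains), not part of any THREDDS specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical PICat document: every element name, attribute name, and URL
# below is an illustrative assumption, not a defined THREDDS format.
PICAT_XML = """\
<picat title="Convection">
  <dataset name="Ocean convection" url="http://ocean.example.edu/data/ocean.nc"/>
  <dataset name="Atmospheric convection" url="http://atmos.example.edu/data/storms.nc"/>
  <dataset name="Core convection" url="http://geo.example.org/data/core.nc"/>
  <tool name="Contour viewer" url="http://tools.example.edu/viewer"/>
</picat>
"""


def read_picat(xml_text):
    """Return the catalog title plus (name, url) pairs for datasets and tools."""
    root = ET.fromstring(xml_text)
    datasets = [(d.get("name"), d.get("url")) for d in root.findall("dataset")]
    tools = [(t.get("name"), t.get("url")) for t in root.findall("tool")]
    return root.get("title"), datasets, tools


def provider_hosts(datasets):
    """List the distinct servers that hold the referenced datasets."""
    return sorted({urlparse(url).hostname for _, url in datasets})


if __name__ == "__main__":
    title, datasets, tools = read_picat(PICAT_XML)
    print(title, "draws on data from:", ", ".join(provider_hosts(datasets)))
```

The catalog itself holds only references, which is why the published article can live on the learner's own web site while the data stay with the providers.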

With this approach, the data discovery, usage, and publication scenarios will be simpler to implement and use because the same discovery system used for educational materials will be used for finding relevant datasets. With proper coordination, the automated tools for creating THREDDS PICats may be augmented to generate entries in the DLESE metadata cataloging system. This coordination will make it possible to configure a DLESE metadata harvesting system to harvest the PICats embedded in modules at educational web sites. Then, in effect, the educational materials associated with the datasets become a rich source of education-oriented metadata for the THREDDS data collections.

In each case, it is crucial to have analysis and display tools that allow users to work directly with the data once they find datasets of interest. This is a recurring theme in the DLESE data access scenarios and is the main reason the THREDDS project places a strong emphasis on augmenting a set of visualization tools with components that allow them to access both the metadata via the discovery system and the datasets themselves via the client/server interfaces. Excellent examples of interactive data manipulation and visualization applications built into educational material can be found at the site of one of the THREDDS partners at the University of Wisconsin:
http://profhorn.meteor.wisc.edu/wxwise. One set of lessons employs the plotting of contours of real-time weather data:
http://profhorn.meteor.wisc.edu/wxwise/contour/index.html

 

2) Along these same lines please also describe how the project intends to ensure that the educational uses of real-time environmental data will be enhanced via the project's synergistic activities with other organizations/groups, for example DLESE. How will these "stories of usage" be captured, evaluated, and promoted?

As the results of the NSDL RFP become known, we will determine which of the funded projects have overlapping or relevant complementary objectives, and then contact them to develop plans for eventually integrating the common elements. As stated in the proposal, we intend to work with DLESE to ensure that the THREDDS discovery system for datasets and applications is integrated into the discovery system and stories of usage developed by DLESE. DLESE also has a very active and involved user community whose data access needs serve as a guide for THREDDS. One commitment we intend to make in response to the reviews and to resource limitations is to work with DLESE and perhaps others to develop plans and a proposal for combining the THREDDS stakeholders meeting with a DLESE users' community meeting. This will ensure that THREDDS will satisfy the expressed needs of the DLESE community (as seen in their use case scenarios) insofar as possible. Similarly, we plan a future proposal to fund a combined meeting of the THREDDS Technical Task Force (T3F) with the DLESE Data Access Working Group (DAWG), to ensure that our technologies are compatible where they overlap. Stated differently, we view the DLESE community as our user community insofar as data access is concerned. Depending on the results of our analysis of common objectives with the other NSDL groups, we will try to include them in these meetings where appropriate.

3) Based on individual panel member reviews and the panel discussion there is some concern about whether or not this project will produce "only" a prototype product or a fully operational system of distributed data servers with associated open protocols that will enable other data generators/providers to join in the larger collection effort. In the interests of the larger NSDL program I believe it is critical that the project aim at the latter goal rather than the former. From my own read of the proposal it does seem that more than a prototype is planned, but it would be important for us to get clarification on this point.

At the end of two years, THREDDS will deliver a working system of metadata interfaces, component software for implementing those interfaces on clients and servers, a set of servers run by data providers which implement the server components, and a set of interactive client data access and display applications that can access the metadata, discovery systems, and the datasets themselves. In the process of working with our partners to develop these elements of THREDDS, we will collaborate with DLESE and Unidata community representatives to build a testbed collection of PICats and educational materials, based on DLESE's use case scenarios. While this result will be a working system at the end of the period of performance, we view these THREDDS elements as prototypes and testbeds for the other data providers, client applications, and the much larger collection of datasets, educational materials, and scientific publications that will be built on the THREDDS model using the THREDDS interfaces and software components. Briefly, in two years, THREDDS will be a usable, working system that will in turn serve as the prototype for the ever-growing collection that will be built on that foundation over the subsequent years.

From the point of view of the Unidata community, THREDDS will be a logical evolution of our current data provision services, which means that we have to build it as a working, supportable product. Both the Unidata Users and Policy Committees have endorsed these developments, so they are part of our long-term strategy. In fact, many Unidata sites are already experimenting with client/server data access systems, using one of our currently supported applications (McIDAS) and the protocol (ADDE) that it supports. Moreover, as noted in the response to an earlier question, we intend to work with DLESE to ensure that the resulting THREDDS system is general enough to satisfy most of the needs of the broader DLESE community as expressed in their data access use case scenarios.

From a technological point of view, it is impossible to foresee exactly what features will be completed in the two-year time frame; this depends on the direction that the governing committee chooses, as well as on priorities, personnel, and the technical advances and difficulties encountered. In this sense what will be delivered may be considered a prototype of what the system will eventually become. However, what will be delivered will be engineered for operational use and will be a solid platform for continued development and enhancements. We are confident that in two years we can develop the protocols and an implementation framework with sufficient capability and ease of use that additional data providers and tool builders will be enticed to participate.

4) The historical expertise and experience that the Unidata effort and related long-term activity bring to THREDDS was certainly recognized by panelists. Along with this comes the tradition of community governance alluded to a number of times within the proposal. At least one reviewer noted the need for this tradition to extend to the development of a shared XML framework and vocabulary to support the overall THREDDS project. Is this a legitimate concern and if so, how will that shared framework and vocabulary be developed and maintained?

Community agreement on a shared XML vocabulary is critical and is one of the primary goals of this effort. During the funding period, we expect to concentrate on a small subset of data types (possibly gridded model data, satellite images, and certain kinds of point data collections) that we feel are "ripe" for specification and inclusion into clients. This will allow us to prototype a metadata specification framework within which other data types will subsequently be added. The metadata will take the form of XML documents accessible via HTTP, and we and our collaborators will maintain distributed servers to allow universal access to these files, as well as additions and changes by authorized users. Once the prototype specification has been implemented to a useful state, the specification will be vetted with appropriate committees and workgroups in NSDL, DLESE, and Unidata for refinement and potential adoption as an NSDL standard. The development and long-term maintenance of the framework and associated software will build upon Unidata’s prior experience creating and maintaining the netCDF software library [www.unidata.ucar.edu/packages/netcdf], which has become a de facto, worldwide standard for many types of scientific data access.
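One way to picture a framework that starts with a few data types and admits others later is a small registry of per-type metadata requirements. The Python sketch below is only an illustration under that assumption: the type labels follow the three candidate data types named above, but the field names and the functions are hypothetical, since the actual vocabulary is to be agreed on by the community.

```python
# Hypothetical required-field lists per data type. The field names are
# placeholders chosen for illustration, not a proposed vocabulary.
REQUIRED_FIELDS = {
    "grid": {"variables", "time_range", "bounding_box"},
    "image": {"sensor", "time_range", "bounding_box"},
    "point": {"variables", "time_range", "stations"},
}


def register_data_type(name, required):
    """Add a new data type to the framework without touching existing ones."""
    if name in REQUIRED_FIELDS:
        raise ValueError(f"data type already registered: {name!r}")
    REQUIRED_FIELDS[name] = set(required)


def missing_fields(record):
    """Return the required fields that a metadata record does not supply."""
    data_type = record.get("data_type")
    if data_type not in REQUIRED_FIELDS:
        raise KeyError(f"unsupported data type: {data_type!r}")
    return sorted(REQUIRED_FIELDS[data_type] - set(record))


if __name__ == "__main__":
    record = {"data_type": "grid", "variables": ["temperature"], "time_range": "2001-07"}
    print(missing_fields(record))
```

Clients written against the registry would not need to change when a later data type is registered; they would simply begin to accept records of that type.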

An important aspect of the prototype specification will be compatibility or interoperability with the vocabulary developed at DLESE, enabling a browsing tool to work with both systems. Specifically, the browsing tool envisioned for use with THREDDS will find relevant resources via the PICats that have been cataloged in DLESE. This would require the browsing tool to work either directly with the DLESE discovery system or through an "interoperability" layer such as GDLIP (which may be easier, as this is envisioned to return a DLESE metadata record in a native XML form).

5) In a similar vein there was a general concern raised during the panel discussion about whether end-user client or browsing tools would be available in actual data analysis applications that would use THREDDS. How would this be addressed? Are there plans for generic plug-ins? Or would data providers/contributors have to offer dataset-specific visualization or analysis tools? Again, clarification on this issue would be very helpful to us.

End-user client analysis and display applications--instrumented to access THREDDS metadata on servers and in central discovery systems--are deliverables of the THREDDS proposal. The Unidata MetApps, Virtual Exploratorium (now called the VGEE, visual geophysical exploration environment), New Media Studio, and WXWISE tools are examples of such visualization applications. As stated in their letters of support, the developers of these tools are committed to working with THREDDS to incorporate the metadata and data access components into their applications. One member of the Unidata user community has stated that the THREDDS-augmented MetApps will have a substantial impact “just as LDM/IDD and the Unidata-supported packages have revolutionized the way Atmospheric Science is taught.”

Some of these visualization clients will apply to large classes of data on multiple servers, while others will be specific to particular types of data. Some of the THREDDS-aware applications to be delivered will be browser-based thin clients, and these are likely to be useful only with certain servers (e.g. INGRID, GDS, and LAS) that offer data processing functionality.

Possibly the most important goal of the THREDDS metadata specification is to enable decoupling of data providers from data consumers. We envision general-purpose dataset browsing tools as well as specialized visualization and analysis applications that can be used seamlessly with any dataset that is described through the metadata framework and that is accessible through one of the supported data access protocols. While data providers will likely continue to produce specialized tools for their communities, they will be largely freed from trying to anticipate all of the needs of others. We expect third parties to produce "value-added" dataset catalogs and descriptions that focus on specific education and research themes which may, in some cases, depart radically from the uses envisioned by the data providers.
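The decoupling described above can be sketched as a client that decides whether it can use a dataset purely from the dataset's metadata, without any knowledge of the provider. The Python below is a minimal sketch under stated assumptions: the class names, the field names, and the way data types and protocols are labeled are hypothetical, and the URLs use reserved example domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescription:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    data_type: str  # e.g. "grid", "image", "point"
    protocol: str   # e.g. "adde", "http"
    url: str


@dataclass(frozen=True)
class Client:
    """A tool that declares which data types and protocols it understands."""
    name: str
    data_types: frozenset
    protocols: frozenset

    def can_open(self, dataset):
        # The decision uses metadata alone, never the provider's identity.
        return dataset.data_type in self.data_types and dataset.protocol in self.protocols


def usable_datasets(client, catalog):
    """Names of the catalog entries that the client is able to open."""
    return [ds.name for ds in catalog if client.can_open(ds)]


CATALOG = [
    DatasetDescription("Model forecast", "grid", "http", "http://a.example.edu/forecast"),
    DatasetDescription("Satellite scan", "image", "adde", "adde://b.example.edu/scan"),
    DatasetDescription("Surface reports", "point", "http", "http://c.example.org/reports"),
]

if __name__ == "__main__":
    browser = Client("general browser", frozenset({"grid", "image", "point"}), frozenset({"http"}))
    print(usable_datasets(browser, CATALOG))
```

In this picture a provider only has to describe a dataset accurately; which tools end up using it, and for what purpose, is left to the consumers and to third-party catalog authors.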