Summary of reviewer questions (indented) relating to NSDL THREDDS proposal #0121623
With answers composed by Ben Domenico in consultation with collaborating partners
July 2, 2001
A. COMMENTS/QUESTIONS
As you have seen
from the reviews, the panelists saw your proposal in a favorable light and
this agrees with our general assessment. What I'd like to ask you and your team
to do is to elaborate on the following comments/questions which are derived
from the individual reviews and our analysis. In some cases these can be viewed
as "points of clarification" and in other cases they are somewhat
more substantive. Either way, your addressing them here will help us in our
documentation. In no particular order...
1) While panelists acknowledged the importance of the technical infrastructure
that would be developed in this project, there was a concern that the
educational "drivers" for the work were not as well described as the
technical aspects. It would be useful for us to have some clear examples of
"usage scenarios" involving students and/or faculty.
There is a growing body of evidence that indicates
the educational importance of access to real-time environmental data. An NSF-sponsored
study by the American Geophysical Union, "Shaping the Future
of Undergraduate Earth System Education" (http://www.agu.org/sci_soc/spheres/toc.htm),
states: "We can improve the teaching of Earth science at all levels
by incorporating new teaching methods using collaborative work, active learning
strategies, computers, and large Earth and space science data sets,"
and recommends that faculty be able to "provide students with
computer exercises involving both modeling and the analysis of large spatial
and/or temporal data sets." At the middle school level, the formal independent
evaluation of Project Skymath (http://www.unidata.ucar.edu/support/mailinglist/mjd/Skymath.html)
states: "The study of real-time weather data and the use of technology
motivates students to learn mathematics."
DLESE provides a set of educational data
access scenarios at: http://www.dlese.org/usecases/forum.html.
These scenarios were distilled from a set of user "stories" at: http://www.dlese.org/people/workgroups/GUI/scenarios/user_scenarios.html.
Many of these relate to the DISCOVERY and USAGE of datasets, both
important THREDDS objectives, but the THREDDS infrastructure will also
enable a third important class of use cases, namely, users PUBLISHING
catalogs and inventories of datasets for others' use. A brief publishing
scenario would involve a learner (teacher, student, or researcher, for that
matter) who has researched a particular topic area (convection, for example) and
found a collection of datasets to illustrate the concept. The datasets might
reside on servers at different data provider sites. One site might have data
illustrating convection in the ocean, another atmospheric convection, and yet
another might show convection in the Earth's core or in stellar atmospheres.
Having uncovered this wealth of data, the learner will be able to publish an
article on convection that includes a THREDDS PICat (Publishable Inventory
Catalog) pointing to the data on the different servers and to a set of
THREDDS-instrumented visualization tools with which the reader can interact
with the data on the various THREDDS servers.
This publication can then become part of a collection of related educational
materials on the learner's web site. It does not have to reside on the servers
at the data provider sites.
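At its simplest, the PICat in this scenario is an XML document that lists datasets residing on several servers, plus the tools for viewing them. The sketch below is only an illustration under stated assumptions: the element names (picat, datasets, dataset, tools, tool), the functions build_picat and servers, and the example.org URLs are all hypothetical and are not part of any THREDDS specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def build_picat(topic, datasets, tools):
    """Build a hypothetical PICat-style XML document for an article on `topic`.

    datasets and tools are lists of (name, url) pairs; the datasets may sit
    on servers at several different data provider sites.
    """
    root = ET.Element("picat", {"topic": topic})
    ds_group = ET.SubElement(root, "datasets")
    for name, url in datasets:
        ET.SubElement(ds_group, "dataset", {"name": name, "url": url})
    tool_group = ET.SubElement(root, "tools")
    for name, url in tools:
        ET.SubElement(tool_group, "tool", {"name": name, "url": url})
    return ET.tostring(root, encoding="unicode")


def servers(picat_xml):
    """Return the set of hosts on which the catalog's datasets reside."""
    root = ET.fromstring(picat_xml)
    return {urlparse(d.get("url")).netloc for d in root.iter("dataset")}


# Usage: one article-level catalog pointing at three providers' servers.
doc = build_picat(
    "convection",
    [("ocean convection", "http://ocean.example.org/conv.nc"),
     ("atmospheric convection", "http://atmos.example.org/conv.nc"),
     ("stellar-atmosphere convection", "http://stellar.example.org/conv.nc")],
    [("contour viewer", "http://tools.example.org/contour")],
)
print(sorted(servers(doc)))
# ['atmos.example.org', 'ocean.example.org', 'stellar.example.org']
```

Because the catalog holds only references, it can live on the learner's own web site while the data themselves stay on the providers' servers.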
With this approach, the data discovery,
usage, and publication scenarios will be simpler to implement and use because
the same discovery system used for educational materials will be used for
finding relevant datasets. With proper coordination, the automated tools for
creating THREDDS PICats may be augmented to generate entries in the DLESE
metadata cataloging system. This coordination will make it possible to
configure a DLESE metadata harvesting system to collect the PICats
embedded in modules at educational web sites. Then, in effect, the
educational materials associated with the datasets become a rich source of
education-oriented metadata for the THREDDS data collections.
In each case, it is crucial to have analysis
and display tools that allow users to work directly with the data once they
find datasets of interest. This is a recurring theme in the DLESE data access
scenarios and is the main reason the THREDDS project places a strong emphasis
on augmenting a set of visualization tools with components that allow them to
access both the metadata via the discovery system and the datasets themselves
via the client/server interfaces. Excellent examples of interactive data
manipulation and visualization applications built into educational material can
be found at one of the THREDDS partner sites, the University of
Wisconsin:
http://profhorn.meteor.wisc.edu/wxwise
One set of lessons uses contour plots of real-time weather data:
http://profhorn.meteor.wisc.edu/wxwise/contour/index.html
2) Along these same lines, please also describe how the project intends to ensure
that the educational uses of real-time environmental data will be enhanced via
the project's synergistic activities with other organizations/groups, for
example DLESE. How will these "stories of usage" be captured,
evaluated, and promoted?
As the results of the NSDL RFP become known,
we will determine which of the funded projects have overlapping or
complementary objectives, and then contact them to develop plans for eventually
integrating the common elements. As stated in the proposal, we intend to work
with DLESE to ensure that the THREDDS discovery system for datasets and
applications is integrated into the discovery system and stories of usage
developed by DLESE. DLESE also has a very active and involved user community
whose data access needs serve as a guide for THREDDS. One commitment we intend
to make in response to the reviews and to resource limitations is to work with
DLESE and perhaps others to develop plans and a proposal for combining the
THREDDS stakeholders meeting with a DLESE users' community meeting. This will
help ensure that THREDDS satisfies the expressed needs of the DLESE community (as
seen in their use case scenarios) insofar as possible. Similarly, we plan a
future proposal to fund a combined meeting of the THREDDS Technical Task Force
(T3F) with the DLESE Data Access Working Group (DAWG), to ensure that our
technologies are compatible where they overlap. Stated differently, we view the
DLESE community as our user community insofar as data access is concerned.
Depending on the results of our analysis of common objectives with the other
NSDL groups, we will try to include them in these meetings where appropriate.
3) Based on individual panel member reviews and the panel discussion, there is some
concern about whether or not this project will produce "only" a
prototype product or a fully operational system of distributed data servers
with associated open protocols that will enable other data generators/providers
to join in the larger collection effort. In the interests of the larger NSDL
program I believe it is critical that the project aim at the latter goal rather
than the former. From my own read of the proposal it does seem that more than a
prototype is planned, but it would be important for us to get clarification on
this point.
At the end of two years, THREDDS will deliver
a working system of metadata interfaces, component software for implementing
those interfaces on clients and servers, a set of servers run by data providers
which implement the server components, and a set of interactive client data access
and display applications that can access the metadata, discovery systems, and
the datasets themselves. In the process of working with our partners to develop
these elements of THREDDS, we will collaborate with DLESE and Unidata community
representatives to build a testbed collection of PICats and educational
materials, based on DLESE's use case scenarios. While this result will be a
working system at the end of the period of performance, we view these THREDDS
elements as prototypes and testbeds for the other data providers, client
applications, and the much
larger collection of datasets, educational materials, and scientific
publications that will be built on the THREDDS model using the THREDDS
interfaces and software components. Briefly, in two years, THREDDS will be a
usable, working system that will in turn serve as the prototype for the
ever-growing collection that will be built on that foundation over the
subsequent years.
From the point of view of the Unidata
community, THREDDS will be a logical evolution of our current data provision
services, which means that we have to build it as a working, supportable
product. Both the Unidata Users and Policy Committees have endorsed these
developments, so they are part of our long-term strategy. In fact, many Unidata
sites are already experimenting with client/server data access systems, using
one of our currently supported applications (McIDAS) and the ADDE protocol that
it supports. Moreover, as noted in the response to an earlier question, we
intend to work with DLESE to ensure that the resulting THREDDS system is
general enough to satisfy most of the needs of the broader DLESE community as
expressed in their data access use case scenarios.
From a technological point of view, it is
impossible to foresee exactly what features will be completed in the two-year
time frame; this depends on the direction and priorities that the governing committee
chooses, on personnel, and on technical advances and difficulties. In this
sense what will be delivered may be considered a prototype of what the system
will eventually become. However, what will be delivered will be engineered for
operational use and will be a solid platform for continued development and
enhancements. We are confident that in two years we can develop the protocols
and an implementation framework with sufficient capability and ease of use that
additional data providers and tool builders will be enticed to participate.
4) The historical expertise and experience that the Unidata effort and related
long-term activity bring to THREDDS was certainly recognized by panelists.
Along with this comes the tradition of community governance alluded to a number
of times within the proposal. At least one reviewer noted the need for this
tradition to extend to the development of a shared XML framework and vocabulary
to support the overall THREDDS project. Is this a legitimate concern and if so,
how will that shared framework and vocabulary be developed and maintained?
Community agreement on a shared XML
vocabulary is critical and is one of the primary goals of this effort. During
the funding period, we expect to concentrate on a small subset of data types
(possibly gridded model data, satellite images, and certain kinds of point data
collections) that we feel are "ripe" for specification and inclusion
into clients. This will allow us to prototype a metadata specification
framework within which other data types will subsequently be added. The specifications
will be XML documents accessible via HTTP, and we and our collaborators will maintain distributed
servers that provide universal access to these files and allow authorized users
to make additions and changes. Once the prototype specification has been
implemented to a useful state, the specification will be vetted with
appropriate committees and workgroups in NSDL, DLESE, and Unidata for
refinement and potential adoption as an NSDL standard. The development and long-term
maintenance of the framework and associated software will build upon Unidata’s
prior experience creating and maintaining the netCDF software library
[www.unidata.ucar.edu/packages/netcdf], which has become a de facto, worldwide
standard for many types of scientific data access.
An important aspect of the prototype
specification will be compatibility or interoperability with the vocabulary
developed at DLESE, enabling a browsing tool to work with both systems. Specifically, the browsing tool
envisioned for use with THREDDS will find relevant resources via the PICats
that have been cataloged in DLESE. This would require the browsing tool to work
either directly with the DLESE discovery system or through an
"interoperability" layer such as GDLIP (which may be easier, as GDLIP
is envisioned to return a DLESE metadata record in native XML form).
5) In a similar vein, there was a general concern raised during the panel
discussion about whether end-user client or browsing tools would be available
in actual data analysis applications that would use THREDDS. How would this be
addressed? Are there plans for generic plug-ins? Or would data providers/contributors
have to offer dataset-specific visualization or analysis tools? Again,
clarification on this issue would be very helpful to us.
End-user client analysis and display
applications, instrumented to access THREDDS metadata on servers and in central
discovery systems, are deliverables of the THREDDS proposal. The Unidata
MetApps, Virtual Exploratorium (now called the VGEE, visual geophysical
exploration environment), New Media Studio, and
WXWISE tools are examples of such visualization applications. As stated in
their letters of support, the developers of these tools are committed to
working with THREDDS to incorporate the metadata and data access components
into their applications. One member of the Unidata user community has stated
that the THREDDS-augmented MetApps will have a substantial impact “just as
LDM/IDD and the Unidata-supported packages have revolutionized the way
Atmospheric Science is taught.”
Some of these visualization clients will
apply to large classes of data on multiple servers, while others will be
specific to particular types of data. Some of the THREDDS-aware applications to
be delivered will be browser-based thin clients, and these are likely to be
useful only with servers that offer data processing functionality (e.g., INGRID,
GDS, and LAS).
Possibly the most important goal of the
THREDDS metadata specification is to enable decoupling of data providers from
data consumers. We envision general-purpose dataset browsing tools as well as
specialized visualization and analysis applications that can be used seamlessly
with any dataset that is described through the metadata framework and that is
accessible through one of the supported data access protocols. While data
providers will likely continue to produce specialized tools for their
communities, they will largely be freed from having to anticipate all of the needs
of others. We expect third parties to produce "value-added" dataset
catalogs and descriptions that focus on specific education and research themes
which may, in some cases, depart radically from the uses envisioned by the data
providers.
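The decoupling of providers from consumers can be illustrated with a generic client that knows about access protocols rather than about particular data providers. In this minimal sketch, DatasetDescription, GenericClient, and the handler functions are hypothetical names, and the protocol labels are placeholders (ADDE is borrowed from the McIDAS discussion above). A real client would perform actual data access where these handlers merely return strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DatasetDescription:
    """Hypothetical stand-in for a THREDDS-style metadata record."""
    name: str
    # Advertised access methods as (protocol, location) pairs, in the
    # provider's order of preference.
    access: List[Tuple[str, str]]


class GenericClient:
    """Hypothetical client that depends on protocols, not on providers."""

    def __init__(self) -> None:
        # Maps a protocol name to a function that opens a location.
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def register(self, protocol: str, handler: Callable[[str], str]) -> None:
        self._handlers[protocol] = handler

    def open(self, ds: DatasetDescription) -> Optional[str]:
        # Use the first advertised access method this client supports.
        for protocol, location in ds.access:
            handler = self._handlers.get(protocol)
            if handler is not None:
                return handler(location)
        return None  # no shared protocol: this client cannot use the dataset


# Usage: the client supports only HTTP, so it skips the ADDE entry.
client = GenericClient()
client.register("HTTP", lambda loc: f"fetched {loc}")
radar = DatasetDescription(
    "radar mosaic",
    [("ADDE", "adde://a.example.org/RADAR"),
     ("HTTP", "http://b.example.org/radar.nc")],
)
print(client.open(radar))  # fetched http://b.example.org/radar.nc
```

The client never needs to know which provider published a dataset; adding support for another protocol is a single register call, which is the property that lets third-party catalogs and tools be combined freely.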