After the DODS meetings last week and a few brief conversations at the AMS
meetings this week, I thought it would be useful to summarize the issues
that came up at the DODS meetings that I feel are important from my own
(admittedly limited) THREDDS perspective.
When I get a chance, I'll try to capture this on a web page with all the
relevant links, etc., but I wanted to get it out for discussion (especially
for corrections by others who were at the DODS meetings) before I let it
fall through the cracks.
Have a nice MLK weekend.
Under this heading, I include the discussions regarding what comprises a
dataset, what's an aggregation, what's a catalog, a collection, etc. and
how these relate to files, data objects within files, inventories, lists,
directories, etc. I came away from the meetings with the sense that there
are clear definitions for only a few of these. Within THREDDS, we need to
come up with some working definitions that allow us to work with the data
hierarchy in a systematic fashion. This is somewhat complicated by the
fact that the Digital Library community uses some of the terms, e.g.,
"collection", in its own fashion.
There is a related THREDDS issue that was not discussed much at the DODS
meetings, namely, that we envision third-party metadata contributions in
the form of "catalogs" that reference files on multiple data servers. This
means that a given dataset or file can be a member of many hierarchies.
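
To make the "member of many hierarchies" point concrete, here is a minimal
sketch in Python. It is only an illustration, not a proposed THREDDS design:
the class names, the example.org URLs, and the catalog names are all made up
for the example. A dataset is referenced by URL, a catalog is just a named
node holding references and sub-catalogs, and a small function records every
hierarchy path under which each dataset appears.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRef:
    """Reference to a dataset on some data server (URLs here are hypothetical)."""
    url: str


@dataclass
class Catalog:
    """A named node in a hierarchy: dataset references plus sub-catalogs."""
    name: str
    datasets: list = field(default_factory=list)
    children: list = field(default_factory=list)


def index_memberships(catalog, index, path=()):
    """Record, for every dataset URL, each catalog path that references it."""
    here = path + (catalog.name,)
    for ref in catalog.datasets:
        index[ref.url].append("/".join(here))
    for child in catalog.children:
        index_memberships(child, index, here)
    return index


# Two datasets living on two different (hypothetical) servers.
radar = DatasetRef("http://server-a.example.org/radar/case1")
model = DatasetRef("http://server-b.example.org/model/case1")

# The data provider's own hierarchy, and a third-party catalog that
# groups datasets from both servers by scientific theme.
provider = Catalog("server-a", children=[Catalog("radar", datasets=[radar])])
third_party = Catalog(
    "case-studies",
    children=[Catalog("major-storms", datasets=[radar, model])],
)

memberships = defaultdict(list)
for top in (provider, third_party):
    index_memberships(top, memberships)

for url, paths in memberships.items():
    print(url, "->", paths)
```

The radar dataset ends up listed under both the provider's hierarchy and the
third-party one, so any working definition of "dataset" probably cannot
assume a single parent.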
The DODS DDS (Dataset Descriptor Structure) and DAS (Dataset Attribute Structure)
will not be sufficient for THREDDS. We have to determine how THREDDS fits
in with externally defined "standards" such as those of ISO, FGDC, OpenGIS,
GCMD, Dublin Core, ESML, etc. Recently we learned of another in the area
of software metadata -- BIDM (Basic Interoperability Data Model). Our data
provider sites are required to conform to some of these standards and the
DL community is adopting Dublin Core with some extensions.
Metadata Creation Tools:
These are needed in the form of crawlers, scanners, and tools to aid
human input. This includes hybrid tools where some of the metadata common
to many datasets is input by hand one time and is then combined
automatically with metadata specific to individual datasets or files. It
is important that such tools be able to traverse data holdings where the
metadata (and perhaps the datasets themselves) are held in databases and
generated on the fly as needed. Some of this work is going on in DODS,
some in the DL community, and some at Unidata, so this is an area where
coordination of efforts is needed.
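
As a rough illustration of the hybrid approach, the sketch below (Python)
combines one hand-entered block of common metadata with per-file metadata of
the kind a scanner might produce. The field names, file names, and values are
invented for the example and are not taken from any of the standards
mentioned above; the only rule assumed is that file-specific values override
the common ones.

```python
def merge_metadata(common, specific):
    """Return a new record: common fields first, file-specific fields on top."""
    merged = dict(common)      # copy, so the hand-entered block is not modified
    merged.update(specific)    # file-specific values take precedence
    return merged


# Entered by hand once for the whole collection (hypothetical fields).
common = {
    "creator": "Example Data Center",
    "keywords": "radar, precipitation",
}

# Produced automatically, one entry per file (hypothetical scanner output).
scanned = {
    "file_a.nc": {"time_coverage": "2002-01-14"},
    "file_b.nc": {"time_coverage": "2002-01-15", "creator": "Field Project X"},
}

records = {name: merge_metadata(common, meta) for name, meta in scanned.items()}

for name, record in records.items():
    print(name, record)
```

Here file_b.nc keeps its own creator while file_a.nc inherits the common one.
A real tool would also need a policy for conflicts and for fields that should
be combined rather than overridden.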
Metadata Presentation Tools:
Several approaches to making metadata available were discussed at the
meeting: DBMS systems, LDAP, simple directory/file systems, and full-text
indexing facilities. As noted above, it's important for metadata
"harvesting" tools to be able to "traverse" all the metadata at a site --
even though it is made available in different ways.
Third-party Metadata Catalog Servers and the DODS Auxiliary Information
I believe these two concepts can be closely related. Whereas the AIS is
currently viewed as a way of adding a "delta" of metadata to the main
metadata source at the data provider's site, the concept could be extended
to include sites which serve catalogs of metadata organized in a completely
different fashion. For example, some of the catalogs might point to
collections of datasets on different servers that illustrate different
scientific concepts or collections of datasets on different servers that
relate to certain events: hurricanes, major storms, floods, etc.