
Re: NSDL Metadata for THREDDS dataset

Hi Ted,

One thing that would really be useful at this point is some examples of your fatter metadata specifications.
From your note, it sounds like you already have FGDC specifications for
your datasets. Are those specifications available somewhere? Can I add references to your data types and the metadata specifications to the matrix I am constructing?


Of course, examples of the metadata files that actually get served up out of the database would be helpful as well. If you eventually go to ISO 19115, many of the rest of us would benefit from concrete examples of real data collections specified in that form.

Clearly the "crosswalks" you mention are crucial to THREDDS, so it would be great to have an example of one that goes from the fatter standard to the thinner ones of NSDL and DLESE.

Let me know. I'd like to build up this collection of real world metadata specs and tools as a basis for face to face discussions in the Spring.

-- Ben

Happy New Year.   Go Buffs!!

--On Wednesday, December 26, 2001 3:09 PM -0700 Ted Habermann <address@hidden> wrote:

Hello all,

As a data provider I must admit that I am somewhat alarmed by the
potential for having to provide multiple metadata representations for
thousands (millions) of datasets. I noted in John Weatherley's seminar on
OAI metadata harvesting that the original source materials were DCXML
files (see http://dublincore.org/documents/2000/07/14/dcmes-xml/ for a
discussion and DTD).  I could easily imagine a situation where I had to
create and maintain these files and a parallel set for FGDC
representations. This is, of course, relatively straightforward in a
world of static metadata. DC seems much more static than FGDC, so maybe
this is not a huge problem. In a dynamic metadata situation where data
providers, data managers, or data processing systems are interacting with
the metadata on essentially random time schedules, this seems like it could
turn into a massive file management headache. BTW, John Caron's seminar
suggested that I was going to need a bunch of other XML files hanging
around to define collections. This only adds to the problem.

My approach to avoiding this problem is to try to produce multiple
metadata representations from a single source (in my case a relational
database). The content of that database is essentially FGDC, although I
expect that it will soon migrate to ISO 19115. What's important about
this is that it is a "fatter" standard (it has more stuff). The desire to
have more stuff is what led me to agree with Jeff's earlier e-mail
suggesting that it might be difficult to recover from starting small. In
that case, the problem Jeff and Stefano have discussed becomes one of
revealing different subsets of information from the database in response
to different requests.
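
As a rough illustration of the single-source idea above, here is a minimal
Python sketch. Everything in it is an assumption made for the example: the
table layout, the column names, the sample row, and the two view names
("full" and "thin") are hypothetical and do not describe the actual database.

```python
import sqlite3

# Hypothetical views: each request type sees a different subset of columns.
VIEWS = {
    "full": ["title", "originator", "abstract", "theme_keyword", "theme_thesaurus"],
    "thin": ["title", "originator", "abstract"],
}


def open_demo_db():
    """Create an in-memory database holding one made-up dataset record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset ("
        "id INTEGER PRIMARY KEY, title TEXT, originator TEXT, "
        "abstract TEXT, theme_keyword TEXT, theme_thesaurus TEXT)"
    )
    conn.execute(
        "INSERT INTO dataset VALUES (1, ?, ?, ?, ?, ?)",
        (
            "Example Gridded Temperature Dataset",
            "Example Data Center",
            "A made-up record used only for illustration.",
            "temperature",
            "Example Thesaurus",
        ),
    )
    return conn


def describe(conn, dataset_id, view):
    """Return one record, restricted to the columns of the requested view."""
    columns = VIEWS[view]  # column names come only from the fixed table above
    query = "SELECT " + ", ".join(columns) + " FROM dataset WHERE id = ?"
    row = conn.execute(query, (dataset_id,)).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return dict(zip(columns, row))
```

Calling describe(conn, 1, "thin") and describe(conn, 1, "full") on the same
row gives the thinner and the fatter representation without a second set of
files to maintain.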

In any case, I was driven to explore the DC-FGDC crosswalk in the hope
that I could easily create DC from FGDC (what the heck, it's the day
after Christmas and I'm at work!). I was interested to see that this
crosswalk was not referenced in the big list of crosswalks
(http://www.ukoln.ac.uk/metadata/interoperability/). Is there an obvious
reason for that? My initial efforts are in the attached file. It looks to
me like this crosswalk is rather straightforward. The most serious
omission is the identifier field. As far as I know, FGDC does not include
this concept, unfortunately. Could be added as an extension. I also think
OGC is working on an interesting approach to unique identifiers.
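
To make the direction of the crosswalk concrete, here is a minimal Python
sketch of going from an FGDC record to Dublin Core. It is not the attached
crosswalk: the five element pairs are an illustrative selection written with
FGDC short tag names, and the sample record is invented. Note that nothing
maps to the DC identifier element, which matches the omission described
above.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping only: Dublin Core element -> path inside an FGDC record.
# The selection is an assumption for this sketch, not the attached crosswalk.
DC_FROM_FGDC = {
    "title": "idinfo/citation/citeinfo/title",
    "creator": "idinfo/citation/citeinfo/origin",
    "date": "idinfo/citation/citeinfo/pubdate",
    "description": "idinfo/descript/abstract",
    "subject": "idinfo/keywords/theme/themekey",
}

# Invented sample record, trimmed to the elements used by the mapping.
SAMPLE_FGDC = """
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example Data Center</origin>
        <pubdate>2001</pubdate>
        <title>Example Gridded Temperature Dataset</title>
      </citeinfo>
    </citation>
    <descript>
      <abstract>A made-up record used only for illustration.</abstract>
    </descript>
    <keywords>
      <theme>
        <themekt>Example Thesaurus</themekt>
        <themekey>temperature</themekey>
        <themekey>ocean</themekey>
      </theme>
    </keywords>
  </idinfo>
</metadata>
"""


def fgdc_to_dc(fgdc_xml):
    """Collect Dublin Core values from an FGDC record using the mapping above."""
    root = ET.fromstring(fgdc_xml)
    dc = {}
    for dc_name, path in DC_FROM_FGDC.items():
        values = [
            el.text.strip()
            for el in root.findall(path)
            if el.text and el.text.strip()
        ]
        if values:  # leave out DC elements that have no FGDC source
            dc[dc_name] = values
    return dc
```

Running fgdc_to_dc(SAMPLE_FGDC) yields lists of values per DC element, since
elements such as the subject can repeat.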

This crosswalk may raise some interesting questions about the list of
metadata elements Ben presented
(BTW, the definition of the identifier element is broken in that list).
When one follows the crosswalk to FGDC land, one often lands in the
middle of a section that has a bunch of required elements that are not
included in DC. This, of course, makes going from DC to FGDC impossible,
but it raises the question of whether NSDL might want to beef up this
list. What good are keywords from a controlled vocabulary if you don't
know what controlled vocabulary it is? Or identifiers from a specific
context if you don't know what context it is?
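
The controlled-vocabulary point can be shown with a small Python sketch. It
assumes the FGDC short tag names for theme keywords (themekt for the
thesaurus, themekey for the keyword); the thesaurus names and the sample
record are invented, and flatten_to_subjects is a hypothetical helper that
stands in for a crosswalk that keeps only the keyword.

```python
import xml.etree.ElementTree as ET

# Invented record: the same keyword appears under two different thesauri.
SAMPLE_KEYWORDS = """
<metadata>
  <idinfo>
    <keywords>
      <theme>
        <themekt>Example Thesaurus A</themekt>
        <themekey>temperature</themekey>
      </theme>
      <theme>
        <themekt>Example Thesaurus B</themekt>
        <themekey>temperature</themekey>
      </theme>
    </keywords>
  </idinfo>
</metadata>
"""


def theme_keywords(fgdc_xml):
    """Return (thesaurus, keyword) pairs, one per keyword in each theme block."""
    root = ET.fromstring(fgdc_xml)
    pairs = []
    for theme in root.findall("idinfo/keywords/theme"):
        thesaurus = (theme.findtext("themekt") or "").strip() or None
        for key in theme.findall("themekey"):
            if key.text and key.text.strip():
                pairs.append((thesaurus, key.text.strip()))
    return pairs


def flatten_to_subjects(pairs):
    """Keep only the keyword, as a thin subject list would (hypothetical helper)."""
    return [keyword for _thesaurus, keyword in pairs]
```

After flattening, the two entries are indistinguishable, which is the loss of
context the questions above are about.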

I am a real neophyte in this business, so I could be making some simple
errors. In any case, it is also a rough draft!

Happy New Year to all!
Ted Habermann

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.