Re: NSDL Metadata for THREDDS dataset



Hello all,

As a data provider I must admit that I am somewhat alarmed by the potential
for having to provide multiple metadata representations for thousands
(millions) of datasets. I noted in John Weatherley's seminar on OAI metadata
harvesting that the original source materials were DCXML files (see
http://dublincore.org/documents/2000/07/14/dcmes-xml/ for a discussion and
DTD). I could easily imagine a situation where I had to create and maintain
these files and a parallel set for FGDC representations. This is, of course,
relatively straightforward in a world of static metadata. DC seems much more
static than FGDC, so maybe this is not a huge problem. In a dynamic metadata
situation, where data providers, data managers, or data processing systems
are interacting with the metadata on essentially random time schedules, it
seems like it could turn into a massive file management headache. BTW, John
Caron's seminar suggested that I was going to need a bunch of other XML files
hanging around to define collections. This only adds to the problem.

My approach to avoiding this problem is to try to produce multiple metadata
representations from a single source (in my case a relational database). The
content of that database is essentially FGDC, although I expect that it will
soon migrate to ISO 19115. What's important about this is that it is a
"fatter" standard (it has more stuff). The desire to have more stuff is what
led me to agree with Jeff's earlier e-mail suggesting that it might be
difficult to recover from starting small. In that case, the problem Jeff and
Stefano have discussed becomes one of revealing different subsets of
information from the database in response to different requests.
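
To make the "different subsets from one source" idea concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the field names, the view names ("discovery", "full") and the
reveal() helper are hypothetical, and a plain dictionary stands in for a row
pulled from the relational database.

```python
# Hypothetical single-source record; in practice this would come from the
# relational database. Field names are illustrative, not a real schema.
RECORD = {
    "title": "Example gridded data set",
    "originator": "Example Data Center",
    "abstract": "A placeholder abstract.",
    "purpose": "A placeholder purpose.",
    "theme_keywords": ["example"],
    "access_constraints": "None",
}

# Each (hypothetical) view lists the fields it is allowed to reveal.
VIEWS = {
    "discovery": ("title", "originator", "theme_keywords"),
    "full": tuple(RECORD),
}


def reveal(record, view):
    """Return only the fields of `record` that the requested view exposes."""
    if view not in VIEWS:
        raise KeyError(f"unknown view: {view}")
    return {name: record[name] for name in VIEWS[view] if name in record}


if __name__ == "__main__":
    print(reveal(RECORD, "discovery"))
```

The point of the sketch is only that the "thin" and "fat" representations
are both derived on request, so there is one thing to maintain.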

In any case, I was driven to explore the DC-FGDC crosswalk in the hope that I
could easily create DC from FGDC (what the heck, it's the day after Christmas
and I'm at work!). I was interested to see that this crosswalk was not
referenced in the big list of crosswalks
(http://www.ukoln.ac.uk/metadata/interoperability/). Is there an obvious
reason for that? My initial efforts are in the attached file. It looks to me
like this crosswalk is rather straightforward. The most serious omission is
the identifier field. As far as I know, FGDC does not include this concept,
unfortunately. It could be added as an extension. I also think OGC is working
on an interesting approach to unique identifiers.

This crosswalk may raise some interesting questions about the list of
metadata elements Ben presented
(http://www.smete.org/nsdl/workgroups/standards/current_element_set.html).
BTW, the definition of the identifier element is broken in that list. When
one follows the crosswalk to FGDC land, one often lands in the middle of a
section that has a bunch of required elements that are not included in DC.
This, of course, makes going from DC to FGDC impossible, but it raises the
question of whether NSDL might want to beef up this list. What good are
keywords from a controlled vocabulary if you don't know what controlled
vocabulary it is? Or identifiers from a specific context if you don't know
what context it is?

I am a real neophyte in this business, so I could be making some simple
errors. In any case, it is also a rough draft!

Happy New Year to all!
Ted Habermann

DC-FGDC Crosswalk

DC Element
    FGDC ???

Title: A name given to the resource.
    1.1.8.4 Title -- the name by which the data set is known.

Creator: An entity primarily responsible for making the content of the resource.
    1.1.8.1 Originator -- the name of an organization or individual that developed the data set. If the name of editors or compilers are provided, the name must be followed by "(ed.)" or "(comp.)" respectively.

Subject: Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource.
    1.6.1.2 Theme Keywords
    1.6.3.2 Stratum Keywords

Description: An account of the content of the resource.
    1.2.1 Abstract -- a brief narrative summary of the data set.
    1.2.2 Purpose -- a summary of the intentions with which the data set was developed.
    1.2.3 Supplemental Information -- other descriptive information about the data set.

Contributor: An entity responsible for making contributions to the content of the resource.
    2.5.1 Source Information -- list of sources and a short discussion of the information contributed by each.

Publisher: An entity responsible for making the resource available.
    1.1.8.8.2 Publisher -- the name of the individual or organization that published the data set.
    6.1 Distributor -- the party from whom the data set may be obtained.

Date: A date associated with an event in the life cycle of the resource.
    1.1.8.2 Publication Date -- the date when the data set is published or otherwise made available for release.

Type: The nature or genre of the content of the resource.
    1.1.8.6 Geospatial Data Presentation Form -- the mode in which the geospatial data are represented. Potential Controlled Vocabulary Problems.

Format: The physical or digital manifestation of the resource.
    6.4.2.1.1 Format Name -- the name of the data transfer format.

Identifier: An unambiguous reference to the resource within a given context.

Source: A Reference to a resource from which the present resource is derived.
    1.1.8.11 Larger Work Citation -- the information identifying a larger work in which the data set is included.

Language: A language of the intellectual content of the resource.
    1.2.3 Supplemental Information -- other descriptive information about the data set.

Relation: A reference to a related resource. These relations are expressed by qualifiers: IsVersionOf, HasVersion, IsReplacedBy, Replaces, Requires, IsPartOf, HasPart, IsReferencedBy, IsFormatOf, HasFormat.
    Many of these relations could end up being expressed as parts of a lineage chain in section 2.5. Others would be expressed as part of the Larger Work Citation (1.1.8.11).

Coverage: The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity).
    1.6.2.2 Place Keywords
    1.6.4.2 Temporal Keywords

Rights:
    1.7 Access Constraints -- restrictions and legal prerequisites for accessing the data set. These include any access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the data set.
    1.8 Use Constraints -- restrictions and legal prerequisites for using the data set after access is granted. These include any use constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on using the data set.
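
As a rough illustration of how the FGDC-to-DC direction of this draft
crosswalk might be applied mechanically, here is a minimal Python sketch. It
rests on several assumptions that are not part of the message: the FGDC
content is presented as a dictionary keyed by element number with a list of
values per element, the wrapper element name "metadata" is arbitrary, and only
the rows with a direct element-to-element mapping are included (Type,
Contributor, Source, Language, Relation and Identifier are left out). The
function name fgdc_to_dc is hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# DC element -> FGDC element numbers, copied from the draft crosswalk.
# Only the direct mappings are listed; this is a partial, illustrative subset.
DC_FROM_FGDC = {
    "title": ["1.1.8.4"],
    "creator": ["1.1.8.1"],
    "subject": ["1.6.1.2", "1.6.3.2"],
    "description": ["1.2.1", "1.2.2", "1.2.3"],
    "publisher": ["1.1.8.8.2", "6.1"],
    "date": ["1.1.8.2"],
    "format": ["6.4.2.1.1"],
    "coverage": ["1.6.2.2", "1.6.4.2"],
    "rights": ["1.7", "1.8"],
}


def fgdc_to_dc(fgdc_record):
    """Build a flat DC XML tree from a dict keyed by FGDC element number.

    `fgdc_record` maps an element number (e.g. "1.1.8.4") to a list of
    string values. FGDC elements with no entry in the crosswalk are ignored.
    """
    root = ET.Element("metadata")  # arbitrary wrapper name (assumption)
    for dc_name, fgdc_numbers in DC_FROM_FGDC.items():
        for number in fgdc_numbers:
            for value in fgdc_record.get(number, []):
                child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
                child.text = value
    return root


if __name__ == "__main__":
    sample = {
        "1.1.8.4": ["Example data set"],
        "1.6.1.2": ["oceans"],
        "1.7": ["None"],
    }
    print(ET.tostring(fgdc_to_dc(sample), encoding="unicode"))
```

Note that the sketch only runs one way. Going from DC back to FGDC would hit
the required-element problem described in the message.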