Metadata and RDF Survey
Overview
- Metadata and metadata issues
- History of the Dublin Core
- The Resource Description Framework (RDF)
- Applications for Unidata community?
Purpose:
- technology update: distill essence out of lots of documents, web
pages
- learn new buzzwords: Dublin Core, Warwick Framework, Canberra
Qualifiers, RDF, minimalists vs. structuralists, reification, ...
- discussion of relevance for Unidata community
Metadata
- Assertions about data
- Example: headers in an email message are metadata referring to
message content
- Useful for
- describing and documenting (resource description)
- finding (resource discovery)
- extracting, combining, ...
- evaluating, rating
- understanding
- "Machine understandable information about web resources or other
things" -- Tim Berners-Lee in Metadata Architecture
- metadata is data (so metadata can describe metadata)
- metadata can occur
- within the data
- separately from the data
- accompanying the data
- wrapping the data
- metadata is a set of independent assertions about a resource
- Becoming a meaningless buzzword?
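The email example above can be made concrete with a minimal sketch using Python's standard email module. The message text and addresses are invented for illustration; the point is only that the headers are (name, value) assertions about the message body.

```python
from email import message_from_string

# Hypothetical message; the addresses and wording are made up.
raw = """\
From: forecaster@example.org
To: users@example.org
Subject: RUC model output available
Date: Fri, 17 Apr 1998 03:00:00 +0000

The 03:00 UTC run is now on the server.
"""

msg = message_from_string(raw)

# Headers are the metadata: independent assertions about the message.
metadata = dict(msg.items())

# The payload is the data those assertions describe.
content = msg.get_payload()

for name, value in metadata.items():
    print(f"{name}: {value}")
```

Here the metadata accompanies the data in one file, which is one of the placements listed above.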
What Data is Metadata About?
There can be confusion between
- description of an object
- description of digital surrogates for the object
For example, consider metadata associated with a web page about the
visualization of a model of a meteorological event. Is the metadata
(e.g. Date) about
- an event?
- modeling of an event?
- data representing an event model?
- visualization of data representing an event model?
- the web page containing a visualization of data representing
an event model?
- digital signature asserting authenticity of the web page ...?
- ...
The Dublin Core Initiative
Attempt to define a core set of metadata elements for resource discovery.
Process:
- 5 workshops
- consensus building
Result:
- 15 metadata elements for content, intellectual property, and
instantiation.
- A framework for extensions, collections of metadata.
- Some interesting debates, e.g. minimalists vs. structuralists
DC-1: Dublin
Goals:
- Simplicity (so creators can provide it)
- Semantic interoperability (useful across disciplines)
- International consensus (world-wide scope of resource discovery)
- Flexibility (power of expression, adaptability)
Emphases:
- resource discovery
- document-like objects (DLOs)
Ignored intellectual property, archival status, syntax, ...
DC-2: Warwick
- began exploring syntax issues
- developed an extensible framework, with realization that
metadata is far too diverse to fit into one useful taxonomy.
Each community needs to create, develop, maintain its
own metadata framework, but it needs to be interoperable.
Warwick Framework:
- one size won't fit all needs
- container-package architecture for aggregating metadata packages
- modular, to allow for differently typed metadata objects
- extensible, to allow for new metadata types
- distributed, to allow external metadata objects to be referenced
- recursive, to support metadata for metadata objects
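One possible reading of the container-package architecture is sketched below. The class names (Package, ExternalReference, Container) and the kinds() helper are hypothetical, chosen only to mirror the four properties listed above; the Warwick Framework does not prescribe this code.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Package:
    """Modular: a metadata package with its own type."""
    kind: str
    content: dict


@dataclass
class ExternalReference:
    """Distributed: a pointer to a metadata object stored elsewhere."""
    kind: str
    uri: str


@dataclass
class Container:
    """Aggregates packages; a container can itself be a package (recursive)."""
    packages: List[Union[Package, ExternalReference, "Container"]] = field(
        default_factory=list
    )

    def kinds(self):
        """List package types, descending into nested containers."""
        found = []
        for p in self.packages:
            if isinstance(p, Container):
                found.extend(p.kinds())
            else:
                found.append(p.kind)
        return found


# Extensible: a new metadata type is just a new "kind" string.
record = Container([
    Package("dublin-core", {"Title": "RUC Model Output"}),
    ExternalReference("terms-and-conditions", "http://example.org/terms"),
    Container([Package("provenance", {"Creator": "gribtonc program, version 2.3"})]),
])
print(record.kinds())
```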
DC-3: Dublin
- considered images, image collections
- added "Description" and "Rights" to make 15 elements
- changed names to be more generic
- discovered significant commonality between metadata for
document-like objects and images
DC-4: Canberra
- minimalist/structuralist rift
- addition of "Canberra Qualifiers"
Minimalists versus Structuralists
Spectrum of metadata provided for resource discovery
- none: (full text indexing) cheap, easy to create and maintain,
low utility, imprecise
- unfielded surrogates: (keywords)
- minimally-fielded surrogates: (tags with values)
- qualified surrogates: (hierarchical tags with values)
- richly-structured surrogates: (arbitrarily complex metadata
objects) expensive, hard to create and maintain, high utility,
precise
Minimalists: simplicity and interoperability most important.
Structuralists: adaptability and precision most important.
The Canberra Qualifiers
Suggested by structuralists for refining some of the 15 elements of the
Dublin Core.
- SCHEME, e.g. DeweyDecimalSystem or LibraryOfCongress for Subject value
- LANG, language for attribute values
- SUBELEMENT, e.g. Date.created, Date.acquired, Date.valid, ...
Especially necessary for the Dublin Core Coverage element.
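As a rough sketch of how the three qualifiers refine an element, the record type below attaches SCHEME, LANG, and SUBELEMENT to a plain element/value pair. The class name QualifiedElement, its methods, and the scheme label "ISO8601" are assumptions made for illustration, not part of the Dublin Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedElement:
    element: str                      # one of the 15 Dublin Core elements
    value: str
    subelement: Optional[str] = None  # SUBELEMENT qualifier, e.g. "created"
    scheme: Optional[str] = None      # SCHEME qualifier: how to read the value
    lang: Optional[str] = None        # LANG qualifier: language of the value

    def name(self) -> str:
        """Dotted name such as 'Date.created'."""
        if self.subelement:
            return f"{self.element}.{self.subelement}"
        return self.element

    def unqualified(self) -> "QualifiedElement":
        """Drop all qualifiers, leaving only the bare element and value."""
        return QualifiedElement(self.element, self.value)


created = QualifiedElement(
    "Date", "1998-04-17", subelement="created", scheme="ISO8601"
)
print(created.name())
print(created.unqualified().name())
```

The unqualified() method reflects the minimalist position: a tool that ignores qualifiers should still get a usable element and value.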
DC-5: Helsinki
- Finnish Finish: refined definitions of 15 Dublin Core Elements
- subelements working group established
- began work with a formal data model: W3C's RDF
- began serious standardization efforts
Current Status
- pre-RDF:
Proposed Convention for Embedding Metadata in HTML (2.0 and
later)
- HTML 4.0 adds SCHEME and LANG attributes to the META element
- 5 drafts submitted to IETF
- NISO standardization discussions begun
- Working toward issuing a draft standard for Dublin Core without
qualifiers this year
A Dublin Core Example
Metadata for some decoded data (ruc.dc):
Title ="RUC Model Output"
Subject ="model output; mesoscale forecast model; geopotential height;
temperature; wind; pressure; relative humidity; precipitation;
mean sea level; surface; tropopause; maximum wind level;
boundary layer"
Description="3-hourly output of meteorological parameters from
numerical forecast model on a Lambert conformal grid"
Source ="NCEP RUC Model"
Relation ="isDerivedFrom ftp://nic.fb4.noaa.gov/pub/ruc/"
Coverage ="Continental US"
Creator ="gribtonc program, version 2.3"
Publisher ="Unidata Program Center"
Contributor="RUC model, NOAA Forecast Systems Laboratory"
Rights ="freely available"
Date ="1998-04-17 03:00 UTC"
Type ="data"
Format ="netCDF"
Identifier ="http://dods.unidata.ucar.edu/model/98041703_ruc.nc"
Language ="en"
The above uses no concrete syntax. The same Dublin Core metadata can
also be represented in a concrete syntax.
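One concrete syntax is HTML META elements. The sketch below renders a few elements of the ruc.dc record that way. The "DC." name prefix and the NAME/CONTENT layout are assumptions based on the convention for embedding metadata in HTML mentioned under Current Status, and to_meta_tags is a hypothetical helper.

```python
from html import escape

# A few elements copied from the ruc.dc example.
ruc = {
    "Title": "RUC Model Output",
    "Source": "NCEP RUC Model",
    "Coverage": "Continental US",
    "Format": "netCDF",
    "Language": "en",
}


def to_meta_tags(elements, prefix="DC"):
    """Render each element as one META tag; values are HTML-escaped."""
    return "\n".join(
        f'<META NAME="{prefix}.{name}" CONTENT="{escape(value, quote=True)}">'
        for name, value in elements.items()
    )


print(to_meta_tags(ruc))
```

Escaping matters because values such as the Subject list may contain characters that are special inside an HTML attribute.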
What is RDF?
Resource Description Framework
- infrastructure for Web metadata
- uniform and interoperable means to exchange
metadata between programs and across the Web
- means for publishing human-readable and machine-understandable
metadata property sets
- a domain-neutral knowledge representation mechanism
- W3C draft under development
- generalization of PICS
- an abstract data model and a concrete syntax implementation
- uses technology submissions of Microsoft (Web Collections),
Netscape (MCF), and others
Claimed Uses for RDF
- resource discovery
- cataloging resources
- intelligent software agents
- content rating
- creating and describing collections
- for "intellectual property" rights
- with digital signatures to create a "web of trust"
- with XML and DOM to create an object Web
The RDF Data Model
- models (attribute,value) pairs attached to resources
- labeled directed graphs
- [node] --- PropertyType --> value
- [node] --- PropertyType --> [node]
- can represent
- attributes of resources
- relationships between resources
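The graph model above can be sketched as a list of (node, property type, value) triples. The two URLs are taken from the Dublin Core example; the helper names properties_of and relationships are hypothetical.

```python
# Each arc of the labeled directed graph is one triple.
DATASET = "http://dods.unidata.ucar.edu/model/98041703_ruc.nc"
SOURCE = "ftp://nic.fb4.noaa.gov/pub/ruc/"

# Resources that appear as nodes in the graph.
resources = {DATASET, SOURCE}

triples = [
    (DATASET, "Title", "RUC Model Output"),  # [node] --- PropertyType --> value
    (DATASET, "Format", "netCDF"),           # [node] --- PropertyType --> value
    (DATASET, "isDerivedFrom", SOURCE),      # [node] --- PropertyType --> [node]
]


def properties_of(node):
    """All (property type, value) pairs attached to one resource."""
    return {p: o for s, p, o in triples if s == node}


def relationships():
    """Arcs whose value is itself a node: relationships between resources."""
    return [(s, p, o) for s, p, o in triples if o in resources]


print(properties_of(DATASET))
print(relationships())
```

Attributes and relationships share one representation; the only difference is whether the arc ends at a literal value or at another node.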
RDF Syntax
- XML may be used as a graph serialization syntax
- Two XML syntaxes proposed, complete and abbreviated
- RDF editors will support creation, editing of RDF
- See Reggie, a metadata editor
for Dublin Core implemented as a Java applet
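To illustrate XML as a graph serialization syntax, the sketch below writes one resource and its properties with Python's xml.etree.ElementTree. It is an assumption-laden illustration: the namespace URIs, the rdf:about attribute, and the lowercase element names follow later W3C and Dublin Core publications rather than the draft syntaxes discussed here, and serialize is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URIs from later RDF and Dublin Core publications.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def serialize(resource, properties):
    """Write one node and its outgoing arcs as an XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource}
    )
    for name, value in properties.items():
        # Each property type becomes a child element holding its value.
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = serialize(
    "http://dods.unidata.ucar.edu/model/98041703_ruc.nc",
    {"title": "RUC Model Output", "format": "netCDF"},
)
print(xml_text)
```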
RDF Examples
Some Issues for Discussion
- Is RDF (still only a draft) a reasonable choice for Unidata's
metadata needs? Other standards include ISO 11179, MCIS, FGDC,
ISO 15046-15, GILS, and hundreds of others ...
- Who should organize the creation of a metadata framework for
atmospheric or geosciences resources, similar to IMS Metadata
Specification for educational resources?
- Can netCDF conventions for georeferencing benefit from the RDF data
model? Are they useful in the Dublin Core Coverage element expressed
in RDF?
- Should we ignore metadata issues until standards crystallize?
This document is maintained by Russ Rew <russ@unidata.ucar.edu>.