I'm in the odd position of agreeing in principle with several writers
(keep metadata with data, support non-networked computing, the values
are more than the numbers), and then disagreeing with many details. A
few examples are below.
On reading Steve Hankin's post, though, I must ask: What exactly is
being proposed? A binary data format for files? A set of such binary
data formats? Or a protocol for exchanging information? Is this
simply a recapture of 'everything netCDF and CF' so that OGC can put a
stamp of approval on it?
Ben wrote "This approach will result in a binary encoding which can be
used with different access protocols, e.g., WFS or SOS as well as
WCS." I don't really know what it means to 'use a binary encoding
with SOS'; can we be more precise about that?
In short, having read through the referenced 'core standard' proposal
[1], I can't tell what we're trying to do yet.
Other comments on this thread, for those needing distraction:
On Aug 20, 2009, at 10:00 AM, Ron Lake wrote:
I would argue that we should stop this idea that data are just
numbers and strings and everything else is "metadata". <snip> Let's
start by defining the objects of interest and THEN we can have
metadata about them.
After watching thoughtful communities try to carefully describe 'the
object of interest', I am sure the proposed 'start' will be a long
slow one. I'd rather stick with "one person's data is another person's
metadata", and try to avoid getting too excited about the precise
distinction between data and metadata, except when it is very narrowly
defined on a specific project (not the case in this thread, IMHO).
On Aug 20, 2009, at 9:54 AM, Tom Whittaker wrote:
One of the single biggest mistakes that the meteorological community
made in defining a
distribution format for realtime, streaming data was BUFR -- because
the "tables" needed
to interpret the contents of the files are somewhere else....and
sometimes, end users cannot find them!
Perhaps this is a problem with the way the tables are made available,
and not simply the fact they are separate from the data stream? After
all, many image files (for example) are not described internally at
all, but no one seems to have trouble working with those images....
(I know that's oversimplifying the difference, but it's instructive
nonetheless.)
NetCDF and ncML maintain the essential metadata within the files:
types, units, coordinates -- and I strongly urge you (or whomever) not
to make the "BUFR mistake" again -- put the metadata into the files!
Maybe you think all the essential metadata is within the netCDF file,
but in my opinion it isn't. I often find the essential metadata,
particularly of the semantic variety, to be absent. And I know of
communities that have had significant difficulty with the provenance
(for example) within CF/netCDF files.
The general point of this observation is that different
people require different metadata, sometimes arbitrarily complex or
peripheral metadata. And I don't think you want ALL that metadata in
the same file as the data -- especially when the data may be coming
not in a file, but in a stream of records.
Do not require the end user to have to have an internet connection
to simply "read" the data....
many people download the files and then "take them along" when
traveling, for example.
Ah, in the era of linked data, or LinkedData [2] -- which will be our
era 5 years from now, if not already -- this problem will be
solved, because all will insist on having the internet connection when
they are traveling. Witness the trajectory of internet availability at
scientific conferences.
If I simply downloaded the file at
<http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml>
I would not be able to read it. In fact, it looks like even if I
also got the "metadata" file at:
<http://schemas.opengis.net/om/1.0.0/examples/weatherRecord1.xml>
I would still not be able to read it, since it also refers to other
servers in the universe to obtain essential metadata.
Uh... I think you may be a bit wrong about what you saw in the
examples. The first file is crudely readable if not comprehensively
described (to say the least), but by the designer's choice this file
references more detailed metadata in a second file. (The file creator
didn't have to do that per the spec, but in some observing systems I
would say it makes sense.) Nothing in the second file appears to
refer to 'essential metadata' in other files... depending on what you
think of as essential of course. (The .xsd for example is more of a
format specification, not a bit of central metadata. By analogy, I
can't find the reference in a netCDF file to any specification of its
format, so I guess it wouldn't qualify as containing all the essential
metadata in that sense either.)
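For anyone who wants to check what a given XML file actually points
to, rather than argue from a quick read, here is a rough sketch in
Python. It lists the xlink:href values that lead outside the document.
The SAMPLE text, its element names, and the example.org URLs are
made-up stand-ins, not the actual OGC example files, and treating any
href that does not start with '#' as 'external' is my own simplifying
assumption.

```python
import xml.etree.ElementTree as ET

# Attribute name for xlink:href in ElementTree's {namespace}local form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical stand-in for an observation document (not the OGC example).
SAMPLE = """<Observation xmlns:xlink="http://www.w3.org/1999/xlink">
  <procedure xlink:href="http://example.org/sensors/anemometer-1"/>
  <observedProperty xlink:href="urn:example:def:phenomenon:windSpeed"/>
  <resultDefinition xlink:href="#record1"/>
  <result>12.5</result>
</Observation>
"""


def external_references(xml_text):
    """Return xlink:href values that point outside the document itself."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        href = elem.get(XLINK_HREF)
        # A leading '#' is a same-document fragment, so it is not external.
        if href and not href.startswith("#"):
            refs.append(href)
    return refs


if __name__ == "__main__":
    for ref in external_references(SAMPLE):
        print(ref)
```

Whether any of the references it turns up count as 'essential
metadata' is still a judgment call; the script only shows what is
referenced, not what a reader needs.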
John
[1] Core standard OGC draft:
http://sites.google.com/site/galeonteam/Home/cf-netcdf-candidate-standard
[2] Linked Data: linkeddata.org
--------------
NOTE NEW EMAIL ADDRESS
--------------
John Graybeal <mailto:jbgraybeal@xxxxxxxxxxxxxx>
Marine Metadata Interoperability Project: http://marinemetadata.org