Re: [galeon] [WCS-2.0.swg] CF-netCDF standards initiatives


Howdy, John,

John Graybeal wrote:
I'm in the odd position of agreeing in principal with several writers (keep metadata with data, support non-networked computing, the values are more than the numbers), and then disagreeing with many details. A few examples are below.

Yeah, in a lot of ways, so am I.

On reading Steve Hankin's post, though, I must ask: What exactly is being proposed? A binary data format for files? A set of such binary data formats? Or a protocol for exchanging information? Is this simply a recapture of 'everything netCDF and CF' so that OGC can put a stamp of approval on it?

In a way, yes, this is what is proposed. THEN there's a formal way to add, extend, and improve CF-netCDF within a known framework.

Ben wrote "This approach will result in a binary encoding which can be used with different access protocols, e.g., WFS or SOS as well as WCS." I don't really know what it means to 'use a binary encoding with SOS', can we be more precise about that?

SOS can reference a binary ("netCDF") file and send it along. At that point, the XML metadata could perhaps be reduced, thanks to the self-documenting nature of well-constructed netCDF files. I'm less sure it's a good fit for WFS, but I might be convinced.
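
To make that concrete, here's a minimal sketch (Python, using the netCDF4 library; the file name is hypothetical) of what "self-documenting" buys you -- the descriptive metadata an SOS response would otherwise have to duplicate in XML travels inside the file itself:

    # Sketch: dump the metadata a well-constructed netCDF file carries itself.
    # "sst_observations.nc" is a made-up file name for illustration.
    from netCDF4 import Dataset

    ds = Dataset("sst_observations.nc", "r")

    # Global attributes: title, institution, history, Conventions, ...
    for name in ds.ncattrs():
        print(name, "=", getattr(ds, name))

    # Per-variable metadata: types, units, coordinates come along for free.
    for varname, var in ds.variables.items():
        print(varname, var.dtype, getattr(var, "units", "<no units>"))

    ds.close()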

In short, having read through the referenced 'core standard' proposal [1], I can't tell what we're trying to do yet.

Other comments on this thread, for those needing distraction:

On Aug 20, 2009, at 10:00 AM, Ron Lake wrote:

I would argue that we should stop this idea that data are just numbers and strings and everything else is "metadata". <snip> Let's start by defining the objects of interest and THEN we can have metadata about them.

After watching thoughtful communities try to carefully describe 'the object of interest', I am sure the proposed 'start' will be a long slow one. I'd rather stick with "one person's data is another person's metadata", and try to avoid getting too excited about the precise distinction between data and metadata, except when it is very narrowly defined on a specific project (not the case in this thread, IMHO).

This is a key point. A lot of otherwise really sharp folks, myself included, tend to define everyone's data and metadata by their own prejudices. After all, MY data's easy to identify and define, and I can see how YOUR data should be identified and defined, too. What? You don't agree with me? How dare you?

On Aug 20, 2009, at 9:54 AM, Tom Whittaker wrote:

One of the single biggest mistakes that the meteorological community made in defining a distribution format for realtime, streaming data was BUFR -- because the "tables" needed to interpret the contents of the files are somewhere else....and sometimes, end users cannot find them!

Perhaps this is a problem with the way the tables are made available, and not simply the fact they are separate from the data stream? After all, many image files (for example) are not described internally at all, but no one seems to have trouble working with those images.... (I know that's oversimplifying the difference, but it's instructive nonetheless.)

Ah, but it's not quite the same, and that does oversimplify the difference. With the current, well-known image formats, there usually IS metadata (or something describing the image) in the header. That's just not the case with BUFR. You have some expectation of finding the GIF header in a file you think is a GIF; it tells you how the thing's compressed, what the core color table is, and the image dimensions. From there the data are relatively easy to pick out. With BUFR, you're required to have prior, external knowledge of the file to interpret it.
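
To illustrate the difference, here's a rough sketch (Python, standard library only; the file name is made up) of pulling the basics out of a GIF header -- everything a reader needs sits at a known offset in the file itself:

    # Sketch: read a GIF's self-describing header with no external tables.
    import struct

    with open("example.gif", "rb") as f:
        signature = f.read(6)  # b"GIF87a" or b"GIF89a"
        # Logical screen descriptor: width, height, packed flags,
        # background color index, pixel aspect ratio.
        width, height, packed, bg, aspect = struct.unpack("<HHBBB", f.read(7))

    has_gct = bool(packed & 0x80)          # is a global color table present?
    gct_size = 2 ** ((packed & 0x07) + 1)  # entries in that color table

    print(signature, width, "x", height, "color table:", has_gct, gct_size)

A BUFR reader, by contrast, can't even begin to interpret the data section without the externally maintained descriptor tables.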

NetCDF and NcML maintain the essential metadata within the files: types, units, coordinates -- and I strongly urge you (or whomever) not to make the "BUFR mistake" again -- put the metadata into the files!

Maybe you think all the essential metadata is within the netCDF file, but in my opinion it isn't. I often find the essential metadata, particularly of the semantic variety, to be absent. And I know of communities that have had significant difficulty with the provenance (for example) within CF/netCDF files.

Yeah, but... the mechanisms are there to put the semantic content into the netCDF file, and to record at least originator history. There's no guarantee someone won't change the internal metadata, but I don't think that's what you're asking about.
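
As a hedged sketch of those mechanisms (Python, netCDF4 library; the file and variable names are hypothetical), CF gives you standard_name for semantic content and a global history attribute for originator provenance:

    # Sketch: attach semantic content and provenance to an existing file.
    from netCDF4 import Dataset
    from datetime import datetime, timezone

    ds = Dataset("sst_observations.nc", "a")  # hypothetical file, opened for append

    # Semantics: bind the variable to a term from the CF standard name table.
    sst = ds.variables["sst"]
    sst.standard_name = "sea_surface_temperature"
    sst.units = "K"

    # Provenance: CF convention is to prepend a timestamped line to "history".
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ds.history = stamp + " quality-controlled by obs_qc.py\n" + getattr(ds, "history", "")

    ds.close()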

The generalization (point) of this observation is that different people require different metadata, sometimes arbitrarily complex or peripheral metadata. And I don't think you want ALL that metadata in the same file as the data -- especially when the data may be coming not in a file, but in a stream of records.

Another good point. I often think along the lines of inheritable and file-unique metadata, and of how to obtain the inheritable stuff. There's little reason to include it when it could be obtained via a URI reference, but most disciplines can identify their own file-unique (or observation-unique, experiment-unique, and so on) metadata, and those *should* be included.
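
One way that split might look in practice (a sketch only -- the attribute names and URL below are illustrative assumptions, not from any spec):

    # Sketch: file-unique metadata inline, inheritable metadata by URI reference.
    from netCDF4 import Dataset

    ds = Dataset("cruise_leg3.nc", "a")  # hypothetical file

    # File-unique: belongs inside the file, travels with the data.
    ds.time_coverage_start = "2009-08-20T00:00:00Z"
    ds.deployment_id = "leg3-cast-017"

    # Inheritable: fetch it by reference when a connection is available.
    ds.metadata_link = "http://example.org/metadata/cruise-2009.xml"

    ds.close()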

Do not require the end user to have an internet connection to simply "read" the data.... many people download the files and then take them along when traveling, for example.

Ah, in the era of linked data, or Linked Data [2] -- which will be our era five years from now, if it isn't already -- this problem will be solved, because everyone will insist on having an internet connection while traveling. Witness the trajectory of internet availability at scientific conferences.

If I simply downloaded the file at
<http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml>
I would not be able to read it. In fact, it looks like even if I also got the "metadata" file at:
<http://schemas.opengis.net/om/1.0.0/examples/weatherRecord1.xml>
I would still not be able to read it, since it also refers to other servers in the universe to obtain essential metadata.

Uh... I think you may be a bit wrong about what you saw in the examples. The first file is crudely readable, if not comprehensively described (to say the least), but by the designer's choice it references more detailed metadata in a second file. (The file's creator didn't have to do that per the spec, but in some observing systems I'd say it makes sense.) Nothing in the second file appears to refer to 'essential metadata' in other files... depending, of course, on what you consider essential. (The .xsd, for example, is more of a format specification than a piece of central metadata. By analogy, I can't find a reference in a netCDF file to any specification of its format, so I guess netCDF wouldn't qualify as containing all the essential metadata in that sense either.)

Ah, but isn't that some of what we're trying to achieve here: some standard of the minimum metadata required to describe a dataset? I honestly believe that won't be a single, all-inclusive definition, and will more likely end up as a discipline-by-discipline effort, but there's real potential for creating a starting point here.

gerry
[1] Core standard OGC draft: http://sites.google.com/site/galeonteam/Home/cf-netcdf-candidate-standard
[2] Linked Data: http://linkeddata.org




--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


