Re: Proposed new specification for THREDDS Catalogs

John,

John Caron wrote:

Roland Schweitzer wrote:

John Caron wrote:

Roland Schweitzer wrote:

John,

I have a question about the THREDDS Dataset Inventory Catalog XML. I don't intend this as a criticism, but rather I'm curious about the choices and trade-offs. All of us that are messing around with XML are wrestling with similar issues.

In general, it seems that relationships between elements in the XML are done via attributes. For example, a <service> element is referred to in the document via the serviceName attribute in the <dataset> element. And a <dataset> element can be repeated by referencing the name of another <dataset> element via the alias attribute.

It seems to me that using this technique then requires that client code must be written to follow these connections. By contrast, it seems that the XML community has attempted to create languages (like XPointer) that would "standardize" these sorts of references. Admittedly, even though the XPointer recommendation is a year old, I have not found (m)any implementations in general purpose XML software.
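To make concrete what "client code must follow these connections" means, here is a minimal Python sketch (standard library only) of a client resolving serviceName by hand. It assumes a <service> element identified by a name attribute; that element shape, the sample catalog and the function names are illustrative assumptions, not taken from the THREDDS schema or any THREDDS client library.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog. The thread only names the serviceName attribute on
# <dataset>; the <service name="..."> shape is an assumption for illustration.
CATALOG = """<catalog>
  <service name="dods"/>
  <dataset name="billy" ID="b1" serviceName="dods"/>
  <dataset name="no service"/>
</catalog>"""


def local_name(tag):
    """Drop any '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def service_for_each_dataset(catalog_xml):
    """Map each dataset name to its <service> element (or None if it names none).

    Plain XML parsing does not connect serviceName to a <service> element,
    so the client builds its own name -> element index and follows it.
    """
    root = ET.fromstring(catalog_xml)
    services = {e.get("name"): e for e in root.iter() if local_name(e.tag) == "service"}
    result = {}
    for ds in root.iter():
        if local_name(ds.tag) != "dataset":
            continue
        ref = ds.get("serviceName")
        if ref is None:
            result[ds.get("name")] = None
        elif ref in services:
            result[ds.get("name")] = services[ref]
        else:
            # A dangling reference is only caught here, by the client.
            raise ValueError(f"dataset {ds.get('name')!r} names unknown service {ref!r}")
    return result


if __name__ == "__main__":
    for name, service in service_for_each_dataset(CATALOG).items():
        print(name, "->", None if service is None else service.get("name"))
```

Every client that reads the catalog has to repeat this bookkeeping, which is the cost Roland is pointing at.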

Can you please comment on these choices and trade-offs for defining the internal connections between bits of XML that went into developing the Inventory Catalog?

Thanks,
Roland

Hi Roland:

<excuse> Sorry it's taken me so long to answer this </excuse>

Anyway, it's not clear that the XPointer spec will become an official standard. XPath seems usable though, and I am open to it. Both the serviceName reference and the alias-to-dataset-ID reference are more or less the simple case of XPath using IDs. I think using IDs for datasets is so useful that they should probably be required, and I would require them if we could still allow minimal datasets like the DODS File Server. This ID reference is so simple that even DTDs have it.
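As a rough illustration of "the simple case of XPath using IDs", the sketch below dereferences an alias with Python's xml.etree.ElementTree, which implements only a small XPath subset (including [@attr='value'] predicates), not full XPath. The catalog and the function names are hypothetical; a namespaced catalog would also need a prefix mapping passed to find().

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free catalog for illustration.
CATALOG = """<catalog>
  <dataset name="billy" ID="b1"/>
  <dataset name="billy again" alias="b1"/>
</catalog>"""


def dataset_by_id(root, dataset_id):
    """Look up a dataset by its ID with an XPath-style attribute predicate.

    Returns None when no dataset has that ID. Building the path by string
    formatting assumes IDs contain no single-quote characters.
    """
    return root.find(f".//dataset[@ID='{dataset_id}']")


def resolve_alias(root, dataset):
    """Return the dataset an alias points to, or the dataset itself if it has no alias."""
    alias = dataset.get("alias")
    return dataset if alias is None else dataset_by_id(root, alias)


root = ET.fromstring(CATALOG)
aliased = root.find(".//dataset[@alias]")
print(resolve_alias(root, aliased).get("name"))  # billy
```

Even with an XPath-style lookup, nothing stops the path from matching nothing; the lookup simply returns None.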

So I'd say full XPath is a bit of overkill right now, but I am open to using it in the future. Do you foresee any new features that might need it?




No excuses needed and no worries.

I don't have any particular features in mind that require full XPath; my question was aimed at getting the most bang for the buck out of document validation. In the new catalog schema, every attribute (except name) is optional on the dataset element. This means simple catalogs are possible. But I think it also means that simply validating the XML cannot guarantee that the datasets referenced by alias attributes are actually present in the document. This is a valid document (according to the schema and XML Spy):

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="blah blah blah">
   <dataset name="billy" ID="b1"/>
   <dataset name="pointer to nothing" alias="sam"/>
</catalog>

even though the dataset named "pointer to nothing" does just that: its alias "sam" matches no dataset ID.




I'll be the first to admit I'm not even sure whether what I'm thinking about is possible, but I think it would somehow be "better" if there were a way to use the "standard" constructs of XML to enforce the relationship between dataset elements with alias attributes and the dataset elements they refer to. I assume that when you "validate" a document with your client library you enforce this relationship, but it seems it might be "better" if off-the-shelf validation software (like XML Spy) could enforce it. As I said, I don't know if it is possible, and I'm trying to figure this out for XML I'm designing, so I'm hoping to benefit from our discussion and your experience designing these catalogs.

Thanks,
Roland


I agree with you on all this; we keep trying to use standard validation as much as possible.

On this particular example, we can now actually validate this with the latest version of the schema (put out about a week ago and cleverly not announced to anyone yet ;^), at

 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd

It works by pairing a "unique" constraint on dataset IDs with a "keyref" constraint on aliases:

<!--
  Enforce dataset ID references:
    1) Each dataset ID must be unique in the document.
    2) Each dataset alias must reference a dataset ID in the document.
-->
<xsd:unique name="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@ID" />
</xsd:unique>

<xsd:keyref name="datasetAlias" refer="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@alias" />
</xsd:keyref>
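For comparison, here is a minimal Python sketch of what the unique/keyref pair above asks a validator to check, applied to Roland's example. It approximates the XSD identity-constraint semantics (datasets lacking the ID or alias attribute are skipped, as XSD skips elements whose field is absent); it is not a schema validator. The namespace URI is a placeholder because the real one is elided in the thread, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Roland's example; the namespace URI is a placeholder for the elided one.
ROLAND_EXAMPLE = """<catalog xmlns="http://example.invalid/catalog">
   <dataset name="billy" ID="b1"/>
   <dataset name="pointer to nothing" alias="sam"/>
</catalog>"""


def local_name(tag):
    """Drop any '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_dataset_refs(catalog_xml):
    """Return a list of error messages; an empty list means both constraints hold."""
    root = ET.fromstring(catalog_xml)
    datasets = [e for e in root.iter() if local_name(e.tag) == "dataset"]
    errors = []

    # 1) like xsd:unique: IDs that are present must be distinct.
    ids = set()
    for ds in datasets:
        ds_id = ds.get("ID")
        if ds_id is None:
            continue  # no ID: not part of the unique constraint
        if ds_id in ids:
            errors.append(f"duplicate dataset ID {ds_id!r}")
        ids.add(ds_id)

    # 2) like xsd:keyref: every alias that is present must match some ID.
    for ds in datasets:
        alias = ds.get("alias")
        if alias is not None and alias not in ids:
            errors.append(f"dataset {ds.get('name')!r}: alias {alias!r} matches no dataset ID")
    return errors


if __name__ == "__main__":
    for message in check_dataset_refs(ROLAND_EXAMPLE):
        print(message)
```

The point of the keyref constraint is that a schema-aware validator performs this check itself, so client code would not have to repeat it.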

Interestingly enough, it appears that Xerces is not yet handling this constraint, but XMLSpy seems to. I haven't yet tracked this down or found out whether I need a more current version of Xerces. (I didn't get a chance to try this on your example; let me know if you do...)

I tried XML Spy on my little example, and it was indeed reported as invalid under the new schema. Cool!

IMO, schemas are still bleeding-edge; I'm hoping they mature soon. There's a lot of sentiment against W3C Schema; I toyed with Relax NG as an alternative. We just have to keep trying different stuff for now....

I understand. I too have been considering Relax NG because it's "easier" to specify constraints such as "an element must have either this set of attributes or that other set of attributes, but not both." However, the choice isn't obvious.

Thanks,
Roland


