
Re: vocabularies



Hi Dan,

I was not sure whether there was a real distinction between an attribute of an element and a subelement of an element in xml, which is why I asked the question.

But on a deeper level .....,

THREDDS v1.0 has the wonderful innovation that any element can have a vocabulary attribute, which specifies the controlled vocabulary for the values of that element. I think this is the greatest thing. They also have this variable example, where the variables element specifies the controlled vocabulary for each variable within it -- this makes perfect sense, but the grammar is less than ideal.

Going back to the example,

<variables
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
<variable name="wv" vocabulary_name="Wind Speed" units="m/s"/>
<variable name="wdir" vocabulary_name="Wind Direction" units="degrees"/>
<variable name="o3c" vocabulary_name="Ozone Concentration" units="g/g"/>
...
</variables>


Conceptually, I would like to write this as


<variables
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
<variable name="wv">
 <vocabulary_name vocabulary="CF-1.0">Wind Speed</vocabulary_name>
 <units vocabulary="udunits">m/s</units></variable>
<variable name="wdir">
 <vocabulary_name vocabulary="CF-1.0">Wind Direction</vocabulary_name>
 <units vocabulary="udunits">degrees</units></variable>
...
</variables>


So each element can specify its own controlled vocabulary, instead of being stuck formulating a convention that covers all the attributes in some vague way. Given a standard for transmitting vocabularies, I can now write software to use that information: validating elements, displaying additional information (e.g. a controlled vocabulary as an indexed (keyed) table), and allowing conversions (a units attribute can be used for units conversions, a projection attribute can be used for projection conversions). Conventions would then get constructed out of sets of "controlled vocabularies", the advantage being that the software can understand what a controlled vocabulary is and, written once, can then understand many conventions. Of course, "controlled vocabulary" needs to be broadened, particularly as sets of attributes can interact and are more complicated than simple lists.
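
A minimal Python sketch of the validation case, under stated assumptions: each controlled vocabulary is taken to be a plain set of allowed strings, which is a simplification (a real udunits check would parse the unit string). The `VOCABULARIES` table, the `validate` function, and the nonconforming value `Degrees_True` are all hypothetical, since the format for transmitting vocabularies is not yet defined.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical stand-ins for real controlled vocabularies; a real udunits
# check would parse the unit string rather than look it up in a set.
VOCABULARIES = {
    "CF-1.0": {"Wind Speed", "Wind Direction"},
    "udunits": {"m/s", "degrees"},
}

# The nested-element form from above, with one made-up nonconforming unit.
DOC = f"""
<variables xmlns="{NS}">
  <variable name="wv">
    <vocabulary_name vocabulary="CF-1.0">Wind Speed</vocabulary_name>
    <units vocabulary="udunits">m/s</units>
  </variable>
  <variable name="wdir">
    <vocabulary_name vocabulary="CF-1.0">Wind Direction</vocabulary_name>
    <units vocabulary="udunits">Degrees_True</units>
  </variable>
</variables>
"""

def validate(xml_text):
    """Return (variable, element, value, vocabulary) for each failing value."""
    problems = []
    root = ET.fromstring(xml_text)
    for variable in root.iter(f"{{{NS}}}variable"):
        for child in variable:
            vocab = child.get("vocabulary")
            if vocab is None:
                continue  # element declares no controlled vocabulary
            value = (child.text or "").strip()
            allowed = VOCABULARIES.get(vocab)
            # An unknown vocabulary is treated as a failure in this sketch.
            if allowed is None or value not in allowed:
                tag = child.tag.split("}")[-1]  # drop the namespace part
                problems.append((variable.get("name"), tag, value, vocab))
    return problems

problems = validate(DOC)
for name, tag, value, vocab in problems:
    print(f"{name}: <{tag}> value {value!r} is not in vocabulary {vocab!r}")
```

The point of the sketch is that `validate` knows nothing about CF or udunits in particular; it only follows the vocabulary attribute on each element, so the same code would serve any convention built from such vocabularies.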


Part of this I suppose is personal perspective: I think we would get a lot farther if we set up conventions a few attributes at a time. But there is a practical side too: it gives us a way of marking a dataset as obeying a convention with some exceptions. For example, CCM model output, once it is in a netCDF file, comes marked as following the CF conventions. The CF conventions start by saying thou shalt use udunits-compatible units. However, the CCM output I have encountered has very few units that udunits parses as it currently stands (mostly but not entirely a case problem). At least this way I could mark the units as not following the convention.

A more positive example is a variable that happens to contain ISO standard country codes. So the dataset can be marked up according to CF, plus I can specify that this particular variable's values have the given controlled vocabulary, making it a whole lot more useful.

Now obviously, in this example it would be better to specify the controlled vocabulary for both units and vocabulary_name at the variables level. I just wish we could use a grammar that was as general as allowing a vocabulary attribute for each element. Some sort of element-specific inheritance, I suppose.

<vocabularies inherit="true">
<controlby vocabulary="udunits"><attribute>units</attribute></controlby>
<controlby vocabulary="scalethenadd">
<attribute>scale_factor</attribute>
<attribute>add_offset</attribute>
</controlby>
<controlby vocabulary="applyscale">
<attribute>value_min</attribute>
<attribute>value_max</attribute>
<attribute>scale_min</attribute>
<attribute>scale_max</attribute>
<attribute>missing_value</attribute>
</controlby>
</vocabularies>
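
A minimal Python sketch of how software might read such a declaration, assuming that "inherit" means an enclosing level's declarations apply unless a more local one overrides them. That reading, the function names, and the `my-local-units` override are hypothetical and not part of THREDDS.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the declaration above.
DECLARATION = """
<vocabularies inherit="true">
  <controlby vocabulary="udunits"><attribute>units</attribute></controlby>
  <controlby vocabulary="scalethenadd">
    <attribute>scale_factor</attribute>
    <attribute>add_offset</attribute>
  </controlby>
</vocabularies>
"""

def controlled_by(xml_text):
    """Map each attribute name to the vocabulary that controls it."""
    mapping = {}
    for controlby in ET.fromstring(xml_text).iter("controlby"):
        for attribute in controlby.iter("attribute"):
            mapping[attribute.text.strip()] = controlby.get("vocabulary")
    return mapping

def effective_vocabulary(attribute, local, inherited):
    """A local declaration wins; otherwise fall back to the enclosing level."""
    return local.get(attribute, inherited.get(attribute))

inherited = controlled_by(DECLARATION)
local = {"units": "my-local-units"}  # hypothetical per-variable override

for name in ("units", "add_offset", "long_name"):
    print(name, "->", effective_vocabulary(name, local, inherited))
```

Grouping several attributes under one controlby is what lets a vocabulary such as scalethenadd describe attributes that interact, rather than treating each as an independent list of values.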



On the other hand, this might be going too far. Having attributes that splice together two conventions might be done in the specification of the vocabulary in THREDDS v1.0, since the convention for transmitting vocabularies is still to be determined. I could then make up my own convention, "almost CF" or "CF plus iso99999", and inherit all of CF plus whatever changes are needed. Then we are back to exactly what you said -- all we need to do is specify the proper convention for the whole set of attributes. Of course we might end up with "per-dataset" conventions, but since we can describe them in a standard way, perhaps it is not too bad.
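
A minimal Python sketch of such derived conventions, assuming a convention can be modelled as a mapping from attribute name to controlling vocabulary. The mapping contents and the `country` attribute name are hypothetical; "iso99999" is only the placeholder name used above.

```python
from collections import ChainMap

# Hypothetical model: a convention maps attribute names to vocabularies.
CF = {"units": "udunits", "vocabulary_name": "CF-1.0"}

# "CF plus iso99999": inherit all of CF and add one more vocabulary.
cf_plus_iso = ChainMap({"country": "iso99999"}, CF)

# "almost CF": inherit all of CF, but mark units as not following udunits.
almost_cf = ChainMap({"units": None}, CF)

for label, convention in (("CF plus iso99999", cf_plus_iso),
                          ("almost CF", almost_cf)):
    print(label, dict(convention))
```

Lookups fall through to CF for anything the derived convention does not mention, and CF itself is left unchanged, so a per-dataset convention is just a small set of overrides on top of a shared base.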

The kicker is (and I am so glad that you asked), the kicker is that what I really want is to specify vocabularies for OpenDAP attributes (THREDDS giving the variable list is sneaking down into the OpenDAP level of specificity). So is that part of the next generation of OpenDAP?

Benno

The original conversation:

Benno Blumenthal wrote:



If I understood xml better, I guess I would know the answer to this
question, but here goes.

Suppose I had a variable list, e.g. (taken from the documentation page)

<variables
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
<variable name="wv" vocabulary_name="Wind Speed" units="m/s"/>
<variable name="wdir" vocabulary_name="Wind Direction" units="degrees"/>
<variable name="o3c" vocabulary_name="Ozone Concentration" units="g/g"/>
...
</variables>



Suppose I want to say that the units are udunits compliant.  Can I write

<variables
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
<variable name="wv" vocabulary_name="Wind Speed">
 <units vocabulary="udunits">m/s</units></variable>
<variable name="wdir" vocabulary_name="Wind Direction">
 <units vocabulary="udunits">degrees</units></variable>
<variable name="o3c" vocabulary_name="Ozone Concentration">
 <units vocabulary="udunits">g/g</units></variable>
...
</variables>

I certainly would like to be able to do so.




Currently 'units' are an attribute of <variable> not a separate element.  But doesn't your example imply that you want to identify the 'authority', or 'controlled vocabulary' that both the 'units' as well as variable 'name' are relative to?  The schema allows for the catalog to identify the source of the controlled vocabulary in use, I assume that could be extended to include the authority for the 'units' that are used as attributes of a <variable> element.  That might negate the necessity of adding specific <units> elements to the schema.  Just a thought.  I too am not an expert on XML Schemas.

    Dan



Benno




--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303)497-8643                                                  P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata WWW Service              http://my.unidata.ucar.edu/content/support
----------------------------------------------------------------------------
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.

------- End of Forwarded Message





--
Dr. M. Benno Blumenthal          address@hidden
International Research Institute for climate prediction
The Earth Institute at Columbia University
Lamont Campus, Palisades NY 10964-8000   (845) 680-4450







