To make sure I understand you, I am going to add some annotations. Let me
know if any of them are wrong.
Also, I am going to use the following definitions for now:
Dataset: something the user can select and get a URL for.
Collection: group of Datasets.
I will capitalize them to distinguish them from more general usage.
----- Original Message -----
Cc: <thredds@xxxxxxxxxxxxxxxx>
Sent: Wednesday, June 05, 2002 2:56 PM
> I think the word dataset is causing trouble. There are at least three
> potential meanings for this word in the context of THREDDS:
>
> 1) an entity that is considered as a unit by human beings
Part of a human mental model/ontology.
>
> 2) an entity that can be operated on as a unit by the THREDDS API
An XML InvCatalog element, and compositions of such.
>
> 3) an entity that can be operated on as a unit by a data access protocol
A software object accessed/returned/manipulated from the protocol-dependent
handler.
>
> Right now, only the entities described by "access" tags meet all of 1,
> 2, and 3.
>
> The tags "dataset" and "collection" both describe entities that only
> meet 1 and 2.
I wonder if my annotations are incorrect since I may not understand this. If
I do have them correct, then I would say:
Currently a Dataset XML element is supposed to meet 1, 2, and, with help from
an access element, 3. A Collection XML element meets 1 and 2, and the
question is whether we should find a way to let it also map to 3 when
appropriate. In the case where it is appropriate, i.e. a Collection has a URL,
it's easy to take it one step further and just erase the distinction between a
Collection and a Dataset. However, there are two concerns with this approach:
1) When a Collection doesn't have a URL, it cannot meet definition 3. So now
you don't have a word for something that always meets 1, 2, and 3.
2) What is the relationship between the contents of a Collection element and
the contents of the Collection's URL? If the relationship is not
particularly well defined or meaningful, you might as well just encode the
Collection's URL as a Dataset. If there's a clear and useful relationship,
then it could be a good idea to give the Collection an access element, which
makes it clear that the URL has the defined relationship with the rest of
the contents.
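
To make the 1/2/3 distinction concrete, here is a rough Python sketch that
walks a catalog and reports which meanings each Collection or Dataset meets.
It is only an illustration under my own assumptions: the sample catalog, the
example.org URLs, and the helper names meanings() and classify() are all
hypothetical, and I am assuming that "has a direct child access element with
a URL" is what it takes to meet meaning 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment for illustration only; the names and
# example.org URLs are made up and this is not the real InvCatalog schema.
CATALOG = """
<catalog>
  <collection name="model-output">
    <access serviceType="X" url="http://example.org/model-output"/>
    <dataset name="run-1">
      <access serviceType="X" url="http://example.org/run-1"/>
    </dataset>
  </collection>
  <collection name="misc">
    <dataset name="notes">
      <access serviceType="Y" url="http://example.org/notes"/>
    </dataset>
  </collection>
</catalog>
"""


def meanings(element):
    """Return the set of meanings (1, 2, 3) that a catalog element meets."""
    # Assumption: anything named in the catalog is a unit both to a human
    # reader (meaning 1) and to the catalog API (meaning 2).
    met = {1, 2}
    # Assumption: meaning 3 requires a direct child access element with a URL.
    # findall("access") only looks at direct children, not nested ones.
    if any(a.get("url") for a in element.findall("access")):
        met.add(3)
    return met


def classify(xml_text):
    """Map each collection/dataset name to the meanings it meets."""
    root = ET.fromstring(xml_text)
    return {
        el.get("name"): meanings(el)
        for el in root.iter()
        if el.tag in ("collection", "dataset")
    }


if __name__ == "__main__":
    for name, met in classify(CATALOG).items():
        print(name, sorted(met))
```

Here the "misc" Collection has no access element of its own, so it only meets
1 and 2, while "model-output" meets all three. The sketch says nothing about
concern 2), i.e. how the Collection's URL relates to its nested contents.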
> Thus I agree with benno that there is not a very
> meaningful distinction between them (and reconsider my listing of them
> as orthogonal concepts in my previous message).
>
> I wonder if it would be a good idea to merge these concepts and use a
> less loaded word, say "entry", to refer to an entity that has meaning to
> THREDDS and to end users, but not to a data access protocol, i.e.
>
> <catalog>
> <service name="X"/>
> <service name="Y"/>
> ...
>
> <entry name="my_dataset">
>
> <metadata name="global-metadata" url="..."/>
> <access name="global-X-access"/>
>
> <entry name="monthly-data">
> <metadata name="monthly-metadata" url="..."/>
> <access name="X-with-COARDS" serviceType="X" url="..."/>
> <access name="X-with-no-COARDS" serviceType="X" url="..."/>
> <access name="X-flattened-to-2D" serviceType="X" url="http://..."/>
> <access name="Y" serviceType="Y" url="..."/>
> ....
> </entry>
>
>
> </entry>
OK, so an "entry" meets meaning 1, while an "access" meets meaning 3 (we
don't need to worry about meaning 2 here).
Some questions:
1) Should we understand that all the access elements within an entry are
different versions of the same dataset? Should we disallow:
<entry name="monthly-data">
  <metadata name="monthly-metadata" url="..."/>
  <access name="monthly-data from MARS" serviceType="X" url="..."/>
  <access name="monthly-data from VENUS" serviceType="X" url="..."/>
</entry>
2) Is there any relationship between peer elements, e.g. in your example:
<access name="global-X-access"/>
<entry name="monthly-data">