
[Fwd: Re: Proposed new specification for THREDDSS Catalogs]





-------- Original Message --------
Subject: Re: Proposed new specification for THREDDSS Catalogs
Date: Mon, 05 Apr 2004 10:36:30 -0600
From: John Caron <address@hidden>
Organization: UCAR/Unidata
To: Jeff McWhirter <address@hidden>
CC: address@hidden, Ethan Davis <address@hidden>
References: <address@hidden> <address@hidden>



Jeff McWhirter wrote:

John Caron wrote:

A proposed new version of the THREDDS Dataset Inventory Catalog is ready for your comments. Please send them to address@hidden, or to me.




John,
Here are some comments about the catalog specification.

First of all it would be great if there was a full blown example catalog that
shows all of the different pieces of the specification in one place.
(Or am I just missing it?) I'd really like to see some examples of how
the metadata, coherent tags, variables, vocabulary, etc., all fit together.

yes, i'll get a decent example out this week.




Under the changes document you have:
access
 remove serviceType (no anonymous service)

What does "no anonymous service" mean?

it used to be that you could define a service by adding a serviceType attribute to an access element.
we are withdrawing that feature to make things simpler.



I'm a bit confused about how to use an alias. An example would help.

i'll add an example




You say:
"For more complicated situations, use nested access elements."
What is the difference between the nested access element and
simply having the serviceName, etc., right in the data set? When
and why would I choose one or the other approach?
Can you have multiple contained access elements?

use explicit access elements when there is more than one way to access the dataset. i've tried to rewrite that section to be clearer:

"The serviceName and urlPath attributes on the dataset element are used for the common case that a dataset has a single access. The serviceName refers to the unique name of a service element. The urlPath is appended to the service's base to get the dataset URL (see constructing URLs). Logically, the use of these two attributes creates an access element for this dataset. When you have more than one way to access a dataset, define them explicitly using multiple nested access elements."

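A minimal Python sketch of the rule quoted above, assuming a simplified catalog with no XML namespaces. The service names, base URLs, and urlPath values are made up for illustration, and the nested access elements are assumed to carry the same serviceName and urlPath attributes as the dataset shorthand.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment; names and URLs are illustrative only.
CATALOG = """
<catalog>
  <service name="dods" base="http://example.org/dods/"/>
  <service name="ftp" base="ftp://example.org/pub/"/>
  <dataset name="single access" serviceName="dods" urlPath="model/run1.nc"/>
  <dataset name="two accesses">
    <access serviceName="dods" urlPath="model/run2.nc"/>
    <access serviceName="ftp" urlPath="model/run2.nc"/>
  </dataset>
</catalog>
"""


def access_urls(catalog_xml):
    """Map each dataset name to the list of URLs it can be accessed by."""
    root = ET.fromstring(catalog_xml)
    # Service names are unique, so they can key a lookup of base URLs.
    bases = {s.get("name"): s.get("base") for s in root.iter("service")}
    result = {}
    for ds in root.iter("dataset"):
        pairs = []
        # Shorthand: serviceName + urlPath on the dataset acts as one access.
        if ds.get("serviceName") and ds.get("urlPath"):
            pairs.append((ds.get("serviceName"), ds.get("urlPath")))
        # Explicit form: one nested access element per way to reach the data.
        for acc in ds.findall("access"):
            pairs.append((acc.get("serviceName"), acc.get("urlPath")))
        # The urlPath is appended to the service's base to get the URL.
        result[ds.get("name")] = [bases[name] + path for name, path in pairs]
    return result
```

Calling access_urls(CATALOG) gives one URL for the first dataset and two for the second, which is the case where explicit access elements are needed.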



Maybe I missed it but I assume the serviceName of "this" implies that
it is relative to the url where we got the catalog from?

formally there are no semantics attached to naming a service "this". in the case that a catalog is written to describe the datasets from a particular data server, we use the idiom of naming that service "this". For the aggServer, we have told people to make it a relative URL, for various reasons to do with the aggServer implementation.
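
One way a client might handle a relative service base, sketched in Python: resolve it against the URL the catalog was fetched from. This is an assumption about client behavior rather than something stated here, and the function name and URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_service_base(catalog_url, service_base):
    """Resolve a possibly relative service base against the catalog's URL.

    An absolute base is returned unchanged; a relative one is interpreted
    relative to where the catalog itself came from.
    """
    return urljoin(catalog_url, service_base)
```

For example, with a catalog fetched from http://example.org/thredds/catalog.xml, a base of "dodsC/" resolves to http://example.org/thredds/dodsC/, while a base that is already absolute is left alone.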





metadata: Your example shows a metadata tag pointing to an ncml file. You also have ncml as a data format type. Why would you use the ncml as metadata?

thanks for catching that. the data portal people are using NcML in a way that i wouldn't, although it's legal from a catalog POV. i will change the example to avoid confusion.

as an aside, a recent conversation with the ESML group reveals that we probably would point to ESML as metadata, so that the data URL can point to the actual data file.




Can you give an example of how a client would use the variables tag?

The main purpose of <variables> is for digital libraries; in particular we need it for GCMD, which requires a list of available "parameters" from its controlled vocabulary. A client might want to show those "alternative names" to the users. Perhaps we should automatically add them to the netcdf data model so it can be done in a standard way?



Would you have a variables tag in a composite data set?

I assume you mean collection dataset?

Yes, it would make the most sense if the collection was a group of datasets with the same variables (eg a time series), and so you'd put an inherit=true tag on it to convey that info.

even if that wasn't the case, it may still make sense as a high-level description of a dataset for a digital library.



Can you give an example of how a client would use the vocabulary?

hmmm, again its main point is for digital libraries, but in some cases it might be helpful for the user to know what vocabulary was being used. If you had more than one vocabulary (which i think will happen) the user might want to select which s/he prefers.



What are the semantics behind the data types? e.g., what does Grid mean?
Or Station? Would a shapefile be classified as a Trajectory?

Yeah baby! Now those are the good questions! ;^)

Currently i'm thinking of letting people use the vocabularies they are used to (eg Grid, Swath, Point for HDF-EOS), then clarify their mappings into a "common data model" and VisAD. I have some vague notions what that means, so i don't think we've made the task impossible, but there's a lot of work to be done. i'm hoping this is one of the main foci for THREDDS/IDV collaboration.

Shapefiles are probably a degenerate Trajectory (because a shapefile has no time dimension); i'd probably use "Feature" or something.





I like the "coherent" data set attribute. As we talked about earlier, perhaps there can be a further elaboration on this that describes whether the sub-datasets of a coherent data set can be viewed/accessed individually or whether a UI should just show the parent.

i'm thinking that the UI would allow a user to select:
   1) a direct dataset
   2) a coherent dataset parent
   3) a sub-collection of a coherent dataset (which would also be a coherent dataset).

The collectionType attribute says that this dataset is coherent, and adds enough semantics (time series, station collection, what else?) that the client knows how to deal with the collection.

So this design puts the decision in the hands of the user whether they want to view the entire collection or individual elements of it. Seems like both would be reasonable, not sure of a use case where it wouldn't.
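
A rough Python sketch of those three choices, under stated assumptions: the Dataset class, its collection_type field (standing in for the collectionType attribute), and both helper functions are hypothetical, and a dataset with no children is simply treated as direct.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    # Hypothetical stand-in for collectionType; None means "not coherent".
    collection_type: Optional[str] = None
    children: List["Dataset"] = field(default_factory=list)


def selectable(ds: Dataset) -> List[str]:
    """List what a UI could offer: coherent parents and direct datasets."""
    choices = []
    if ds.collection_type is not None:
        choices.append(f"coherent:{ds.name}")
    elif not ds.children:
        choices.append(f"direct:{ds.name}")
    for child in ds.children:
        choices.extend(selectable(child))
    return choices


def sub_collection(parent: Dataset, names: List[str]) -> Dataset:
    """Pick some members of a coherent dataset; the result stays coherent."""
    if parent.collection_type is None:
        raise ValueError("sub-collections only make sense for coherent datasets")
    members = [c for c in parent.children if c.name in names]
    return Dataset(f"{parent.name} (subset)", parent.collection_type, members)
```

Here the user, not the catalog, decides between the whole collection, one member, or a subset; the subset keeps the parent's collection type so a client can treat it the same way.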



The coherent flag addresses some of the issues the IDV has had about when to treat a collection of dataset urls as a whole.

yes, they are pretty much a direct response to your and Don's ideas (just because we're slow it doesn't mean we aren't listening ;^)

thanks for your input! do you mind if i forward this to the thredds email group?