
catalog URLs



----- Original Message -----
From: "Benno Blumenthal" <address@hidden>
To: "Joe Wielgosz" <address@hidden>
Cc: "John Caron" <address@hidden>; <address@hidden>
Sent: Thursday, May 16, 2002 10:11 AM
Subject: Re: New Catalog XML Draft


> Hi Joe,
>
> Your point is well taken:  there is a simplicity and explicitness in only
> allowing full urls.   Downside is a bit mixed, I guess.
>
> 1)  The servicelist concept becomes useless: one has to explicitly list all the
> services for each dataset, which could either make thredds documents much larger
> or discourage servers from advertising a large range of services.
>
> 2) clients already know how to construct urls (a DODS client constructs a
> DODS url, a THREDDS client constructs a query for a catalog server):  the level
> of complexity added by giving some ability to construct the rest of the url may
> not be too bad.
>
> 3) While different providers have different ways of mapping datasets, I think
> the combination of baseurls, subpaths, and suffixes covers a lot of ground
> (cgi-parameters fall into the suffix category, for example).

I think I agree with you, that adding a "suffix" attribute seems useful in a
way more general than just DODS. (but I need to think it through some more)


>
> Benno
>
>
> Joe Wielgosz wrote:
>
> > John, Benno,
> >
> > A high-level question related to the base URL/suffix discussion:
> >
> > How much effort should THREDDS put into supporting particular data
> > provider site layouts?
> >
> > I initially thought base URLs and relative paths were a nice feature
> > (they would certainly be convenient for GDS catalogs) but upon
> > reflection, I wonder if they aren't more trouble than they are worth.
> >
> > If you look at different data providers' websites, they all have their
> > own way of mapping datasets, collections and access methods to URLs, and
> > some are quite complex. They might use any combination of different base
> > URLs, subpaths or suffixes, CGI parameters, or who knows what else.
> >
> > You could try to support some, or even all, of these mappings in the
> > catalog DTD---but it increases the complexity of the DTD and the THREDDS
> > client API, and I am not sure what benefit THREDDS users get in return,
> > other than the catalog files being a little more concise.
> >
> > On the other hand, I see a definite client benefit to leaving out
> > relative URLs and other mapping schemes - if URLs are always absolute, a
> > <dataset> tag and its <access> sub-tags can be interpreted
> > unambiguously, without having to poke through the rest of the catalog
> > file. IMHO, this would make the catalog format much more robust and make
> > client development easier.

Factoring out the base part of the URL into the server element requires more
XML processing, and is perhaps not worth the effort. But there's nothing
ambiguous about it. There has to be a standard way for each service to
construct its URLs from the info in the catalog. It's our job to agree on
that standard way.
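
To make the idea of a "standard way" concrete, here is a minimal sketch in
Python. It assumes a hypothetical rule in which the server element supplies a
base URL and an optional suffix, and each dataset supplies a path. The
function name, its parameters, and the rule itself are invented for
illustration and are not taken from any catalog draft.

```python
def resolve_access_url(base, path, suffix=""):
    """Build a full URL from catalog pieces (hypothetical rule: base + path + suffix)."""
    # A path that already carries a scheme is treated as absolute,
    # so catalogs that list full URLs keep working unchanged.
    if "://" in path:
        return path + suffix
    # Otherwise join base and path with exactly one slash between them.
    return base.rstrip("/") + "/" + path.lstrip("/") + suffix


print(resolve_access_url("http://thredds.unidata.ucar.edu:8080/dodsC/",
                         "air.1948.nc", ".dds"))
```

Under a rule like this a client resolves every dataset the same way, at the
cost of looking up the server element once per catalog.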

I admit I have mostly thought about making life easy for catalog writers and
for clients that use the Java library. In the Java library I can do the
"poking around" once, so others won't have to. I realize that if you have to
do it yourself, simplicity is good.

BTW, as sort of background: "the URL is not really a valid URL, it is
actually a string from which a valid URL can be constructed. The details of
that construction are specific to the server type (DODS, ADDE, NetCDF)". See
the note appended below.


> >
> > And I see a couple ways that the information represented in base URLs
> > could be catalogued more usefully:
> > - If the provider wants to give a URL for information about a collection
> > or service, it should be in a <documentation> tag.
> > - If they want to provide direct data access to an entire collection as
> > a single data object, it should be in an <access> tag.

Benno and I and others were kicking this idea around, how to let a
collection also be a dataset. So your idea is that if a collection has an
access element, then it is a dataset? Hmmm, might work, what do you think,
Benno?
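
One possible reading of this rule, sketched in Python: a collection counts as
a dataset exactly when it has an <access> element as a direct child. The
<collection> element name, the "name" and "url" attributes, and the
example.org URLs are assumptions made for this example, not the draft DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment: only the first collection has its own <access>.
SAMPLE = """<catalog>
  <collection name="reanalysis">
    <access url="http://example.org/dods/reanalysis"/>
    <dataset name="air.1948">
      <access url="http://example.org/dods/air.1948.nc"/>
    </dataset>
  </collection>
  <collection name="radar">
    <dataset name="level2">
      <access url="http://example.org/dods/level2"/>
    </dataset>
  </collection>
</catalog>"""


def is_dataset(element):
    """True if the element has a direct <access> child (nested ones do not count)."""
    return element.find("access") is not None


root = ET.fromstring(SAMPLE)
for coll in root.findall("collection"):
    print(coll.get("name"), is_dataset(coll))
```

Only direct children are checked, so an <access> belonging to a nested
dataset does not turn the enclosing collection into a dataset.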

----------------------------------------------------------------------------
Here are some notes from the Catalog doc
(http://www.unidata.ucar.edu/projects/THREDDS/tech/InvCatalog5.html):

"Now to be more precise: The URL is not really a valid URL, it is actually a
string from which a valid URL can be constructed. The details of that
construction are specific to the server type (DODS, ADDE, NetCDF). Examples:

  a. DODS: A DODS dataset with URL
"http://thredds.unidata.ucar.edu:8080/dodsC/air.1948.nc" won't get you
anything useful. The client has to add a DODS-specific suffix like "dds",
"das" or "dods", for example
"http://thredds.unidata.ucar.edu:8080/dodsC/air.1948.nc.dds".
  b. ADDE: An ADDE dataset with a server base "adde://adde.ucar.edu/" and
path "group=GINIEAST&descr=GE4KIR" also won't get you anything useful. A
client has to insert an ADDE-specific directive, for example:
"adde://adde.ucar.edu/imagedir?group=GINIEAST&descr=GE4KIR" will give a list
of data available for the named group and description. Datasets in ADDE
catalogs typically refer to a Dataset Description XML document which has
additional information that can be used to get specific datasets at specific
times. For example, using information in the Dataset Descriptor, the
following valid URL can be constructed, which refers to a specific satellite
image on an ADDE server:
"adde://adde.ucar.edu/imagedata?group=GINIEAST&descr=GE4KIR&size=1280
1280&day=2002-03-07&time=13:31:00&band=4". Note that the ADDE command in
this case is "imagedata".
  c. NetCDF: A dataset with server type NetCDF implies that the dataset can
be accessed through the Java Netcdf (version2) API. For example, a locally
accessible netcdf file might have a URL of "file:///C:/mydata/ocean.nc", and
a netcdf file accessible through a web server might have a URL of
"http://unidata.ucar.edu/projects/THREDDS/examples/mydata/ocean.nc". These
URLs can be passed directly to the ucar.nc2.NetcdfFile(java.net.URL url)
constructor, without modification. Note that a dataset with a local file URL
will not be accessible remotely.
  d. Catalog: A dataset with server type Catalog implies that the client
application will get back another Inventory Dataset XML document. For
example, a dataset URL might be "http://thredds.unidata.ucar.edu/radar.xml",
and additional information in the Dataset Description XML document allows
the client to construct the request
"http://thredds.unidata.ucar.edu/radar.xml?stn=PRX,fld=VORT,time=last3hours".
This URL is sent to the server, which returns a catalog containing the
actual URLs of the requested data."
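
The per-server-type constructions described in these notes can be sketched
roughly as follows in Python. The function names and parameters are
hypothetical; only the URL shapes come from the examples above. No URL
escaping is applied, to match the example strings as written.

```python
def dods_url(dataset_url, suffix):
    """Append a DODS-specific suffix ("dds", "das" or "dods") to the catalog URL."""
    if suffix not in ("dds", "das", "dods"):
        raise ValueError("unknown DODS suffix: " + suffix)
    return dataset_url + "." + suffix


def adde_url(server_base, path, directive, extra=None):
    """Insert an ADDE directive between the server base and the catalog path."""
    url = server_base.rstrip("/") + "/" + directive + "?" + path
    # Extra parameters (e.g. from a Dataset Descriptor) are appended with "&".
    if extra:
        url += "&" + "&".join(f"{k}={v}" for k, v in extra.items())
    return url


def catalog_query_url(dataset_url, params):
    """Build a catalog-server query; the example separates parameters with commas."""
    return dataset_url + "?" + ",".join(f"{k}={v}" for k, v in params.items())


print(dods_url("http://thredds.unidata.ucar.edu:8080/dodsC/air.1948.nc", "dds"))
print(adde_url("adde://adde.ucar.edu/", "group=GINIEAST&descr=GE4KIR", "imagedir"))
print(catalog_query_url("http://thredds.unidata.ucar.edu/radar.xml",
                        {"stn": "PRX", "fld": "VORT", "time": "last3hours"}))
```

NetCDF URLs need no such helper, since the notes say they are passed to the
constructor without modification.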




