THREDDS API Question

Nathan Potter ndp at opendap.org
Mon Jun 11 17:47:20 MDT 2007



Ethan,


I've been thinking about it too. It is probably the case that this is  
an intractable issue. The THREDDS architecture benefits from the  
ability to write catalogs independent of the data source/archive.  
THREDDS is designed so that I can write a catalog and serve it on my  
system, while the data sources referenced in the catalog may exist on  
different systems - often far away and outside my sphere of influence.

The idea that the Access Link (the link that THREDDS generates that  
allows data access) can be used to back track into the THREDDS  
catalog is at cross purposes with the THREDDS design. It would be possible
for a particular server implementation to have very carefully
constructed THREDDS catalogs that would allow it to do this, but in the
general case it is simply not possible.

At least as far as I can see it.

Nathan



On Jun 11, 2007, at 4:09 PM, Ethan Davis wrote:

> Hi Nathan,
>
> Sorry this is taking a while. I'm trying to figure out some of the
> trade-offs and such involved in a variety of ways of handling this.
> I should have a more detailed response tomorrow.
>
> Ethan
>
>
> Nathan Potter wrote:
>>
>> Ethan et al.,
>>
>> After talking with Ethan on the phone today I think I can state  
>> the issue more clearly:
>>
>> The current THREDDS Servlet Framework (TSF) does not allow the  
>> collection/dataset information to be retrieved via the request URL.
>>
>> The API method DataRootHandler.getCatalog(java.lang.String path,  
>> java.net.URI baseURI) expects the "path" parameter to be the path  
>> in the THREDDS catalog to the catalog file. There is no  
>> restriction on the file name of the catalog file. The path in the  
>> THREDDS catalog to the file may differ from the access URL.
>>
>> What this means is that when a servlet receives an access request,  
>> even one that comes from a valid access link in a THREDDS catalog 
>> (.html), the servlet only knows about the request URL, nothing  
>> more. If the servlet needs to get the THREDDS dataset/collection  
>> information (and associated metadata if any) then it has no  
>> recourse but to attempt to search the catalog from the highest  
>> level looking for a dataset with a matching "urlPath" attribute.  
>> This activity may fail if:
>>
>> - The THREDDS catalog employs <catalogRef> elements.
>>
>> - The "urlPath" is not unique within the catalog.
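The top-down search described above, and the two ways it can fail, can be sketched as follows. This is a minimal sketch under stated assumptions: CatalogNode, SearchResult and UrlPathSearch are hypothetical stand-ins for a parsed catalog tree, not classes from the thredds.catalog package, and a <catalogRef> whose target was never loaded is modeled simply as a flag on a node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified catalog tree node; NOT the thredds.catalog API.
class CatalogNode {
    final String name;
    final String urlPath;        // null when the node has no direct access URL
    final boolean unresolvedRef; // true for a <catalogRef> whose target was never loaded
    final List<CatalogNode> children = new ArrayList<>();

    CatalogNode(String name, String urlPath, boolean unresolvedRef) {
        this.name = name;
        this.urlPath = urlPath;
        this.unresolvedRef = unresolvedRef;
    }

    // Appends a child and returns this node so trees can be built fluently.
    CatalogNode add(CatalogNode child) {
        children.add(child);
        return this;
    }
}

// Outcome of one top-down search for a urlPath.
class SearchResult {
    final List<CatalogNode> matches = new ArrayList<>();
    int unresolvedRefs = 0;

    // Trustworthy only with exactly one match and no subtree left unsearched.
    boolean isConclusive() {
        return matches.size() == 1 && unresolvedRefs == 0;
    }
}

class UrlPathSearch {
    static SearchResult search(CatalogNode root, String urlPath) {
        SearchResult result = new SearchResult();
        walk(root, urlPath, result);
        return result;
    }

    private static void walk(CatalogNode node, String urlPath, SearchResult result) {
        if (node.unresolvedRef) {
            // The dataset (or a duplicate of it) may live behind this reference.
            result.unresolvedRefs++;
            return;
        }
        if (urlPath.equals(node.urlPath)) {
            result.matches.add(node);
        }
        for (CatalogNode child : node.children) {
            walk(child, urlPath, result);
        }
    }
}
```

Both failure conditions listed above make isConclusive() false: an unloaded <catalogRef> leaves part of the tree unsearched, and a non-unique urlPath yields more than one match.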
>>
>>
>> I think that the TSF API should be augmented with accessor methods
>> that allow an InvDataset or an InvCatalog to be retrieved from the
>> DataRootHandler based on information that a servlet has access to
>> at run time, i.e. data that can be retrieved from the
>> HttpServletRequest object.
>>
>>
>>
>> Nathan
>>
>>
>>
>>
>>
>> On Jun 4, 2007, at 5:00 PM, Nathan Potter wrote:
>>
>>>
>>> On Jun 4, 2007, at 1:05 PM, Ethan Davis wrote:
>>>
>>>> Hi Nathan,
>>>>
>>>> Can you explain the context for these questions? Is this on the
>>>> server side (in Hyrax)?
>>>
>>>
>>> Yes, server side.
>>>
>>>
>>>>
>>>> Nathan Potter wrote:
>>>>> Greetings,
>>>>>
>>>>> So I am using the THREDDS API in an attempt to get the  
>>>>> <property> elements for a dataset. I've run into a couple of  
>>>>> (possibly related) problems.
>>>>
>>>> Just to clarify our terminology. When you say "THREDDS API" you  
>>>> mean both the thredds.catalog and thredds.servlet packages? I  
>>>> generally split those apart and call the thredds.catalog package  
>>>> the "THREDDS Catalog API" and call the thredds.servlet package  
>>>> the "THREDDS Servlet Framework" (TSF).
>>>>
>>>> [Note: the TSF is probably only useful for those writing servers.]
>>>
>>>
>>> I wasn't distinguishing. But since DataRootHandler is in the TSF,
>>> that is where I am suggesting an API change.
>>>
>>>
>>>
>>>
>>>>
>>>>> ** 1) I can't get the dataset information without searching.
>>>>>
>>>>> In the HttpServletRequest I have the URL for the dataset, say:
>>>>>
>>>>> http://localhost:8080/opendap/wcs/MODIS/Grid/test.hdf.html
>>>>
>>>> Is this URL for an OPeNDAP HTML response?
>>>
>>>
>>> Right, but the requested response isn't really meaningful in this  
>>> discussion since all I am really after is the THREDDS dataset  
>>> information for the atom/leaf/dataset test.hdf.
>>>
>>>
>>>>
>>>> Are you trying to get the property from the THREDDS catalog so  
>>>> you can use it in the OPeNDAP response?
>>>
>>> Well... In truth it's much more complex than that, but since I  
>>> will have to do that too we can roll with that vision for the  
>>> moment.
>>>
>>>
>>>
>>>>
>>>>> In order for me to get THREDDS to divulge the <property>  
>>>>> elements for the dataset I have to:
>>>>>
>>>>> - Take the dataset name "wcs/MODIS/Grid/test.hdf.html" and back
>>>>>   track to the collection name, "wcs/MODIS/Grid/".
>>>>> - Ask the DataRootHandler for the InvCatalog for "wcs/MODIS/Grid/".
>>>>> - Ask the InvCatalog for the InvDataset for "wcs/MODIS/Grid/".
>>>>> - Search the child datasets of the "wcs/MODIS/Grid/" InvDataset for
>>>>>   the one whose name (lexically) matches "wcs/MODIS/Grid/test.hdf".
>>>>> - Read the properties of that InvDataset.
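The first step, backing track from the request URL to the dataset and collection paths, is plain string handling. A minimal sketch, assuming the servlet context path is "/opendap/" and that the request ends in one of a few common OPeNDAP response suffixes; the suffix list and the RequestPaths class are illustrative assumptions, not part of the TSF.

```java
// Hypothetical helper; the context path and the suffix list are assumptions for this example.
final class RequestPaths {
    // A few common DAP2 response suffixes (assumed set; extend as the server requires).
    private static final String[] OPENDAP_SUFFIXES = {".html", ".dds", ".das", ".dods", ".info", ".ascii"};

    private RequestPaths() {}

    // "/opendap/wcs/MODIS/Grid/test.hdf.html" -> "wcs/MODIS/Grid/test.hdf"
    static String datasetPath(String requestPath, String contextPath) {
        String p = requestPath.startsWith(contextPath)
                ? requestPath.substring(contextPath.length())
                : requestPath;
        for (String suffix : OPENDAP_SUFFIXES) {
            if (p.endsWith(suffix)) {
                return p.substring(0, p.length() - suffix.length());
            }
        }
        return p;
    }

    // "wcs/MODIS/Grid/test.hdf" -> "wcs/MODIS/Grid/" (empty string for a top-level dataset)
    static String collectionPath(String datasetPath) {
        int slash = datasetPath.lastIndexOf('/');
        return slash < 0 ? "" : datasetPath.substring(0, slash + 1);
    }
}
```

The suffix stripping is deliberately naive: a dataset whose own file name happens to end in one of these suffixes would be truncated, which is one more reason a lookup keyed on what the servlet can see is fragile.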
>>>>>
>>>>> That seems awfully complex. (Of course there may be a more
>>>>> straightforward way that I am not aware of.)
>>>>
>>>> That is about as simple as it gets. Though I would suggest you  
>>>> make sure the THREDDS configuration (TSF) knows about this  
>>>> dataset first by getting the CrawlableDataset that matches the  
>>>> dataset URL:
>>>>       DataRootHandler.getCrawlableDataset("wcs/MODIS/Grid/test.hdf")
>>>>       // I dropped off the trailing ".html", assuming it was the
>>>>       // OPeNDAP dataset URL extension
>>>
>>>
>>> When I tried this, I could only get CrawlableDataset objects for
>>> catalogs that were part of a <datasetScan>.
>>>
>>>
>>>
>>>>
>>>> Are you using InvDataset.findDatasetByName( String name) to find  
>>>> the child dataset?
>>>
>>> No.
>>>
>>>>
>>>> Also, depending on how you set up your dataset IDs, you could ask
>>>> the catalog to find the dataset by ID, like
>>>>
>>>>       cat.findDatasetByID( "wcs/MODIS/Grid/test.hdf")
>>>
>>> Ahhh... I just tried that and it works. So, that greatly  
>>> simplifies that step, thanks!
>>>
>>>
>>>
>>>>
>>>>
>>>>> ** 2) When I ask for a catalog I have to know the name of the  
>>>>> XML file in which it resides.
>>>>>
>>>>> In the above example, when I ask the DataRootHandler for the
>>>>> InvCatalog I ask for "wcs/MODIS/Grid/catalog.xml", which is
>>>>> all well and good if all of the catalogs are stored in files
>>>>> called catalog.xml. Essentially this means that anyone
>>>>> configuring a THREDDS catalog has to create a hierarchy of
>>>>> directories that mimics the organization of the collections,
>>>>> and all of the THREDDS information must be stored in files
>>>>> called "catalog.xml".
>>>>
>>>> Why do you need to create this hierarchy of directories  
>>>> mimicking the data collection hierarchy? The TSF should keep  
>>>> track of your config catalogs and the automatically generated  
>>>> catalogs.
>>>
>>> Right, but if all of the THREDDS catalog files have the name  
>>> "catalog.xml" they can't all be in the same directory, so they  
>>> have to live in some kind of directory hierarchy - I just figured  
>>> it made sense to mimic the collection organization, but that's  
>>> not necessary.
>>>
>>>
>>>
>>>>
>>>>> THREDDS does not actually require this - I can make a complex  
>>>>> hierarchy of collections by using either a single (complex) top  
>>>>> level catalog.xml file, or a collection of XML files in a  
>>>>> single directory that employ <catalogRef> elements to create  
>>>>> their organizations.
>>>>> However, the API breaks down in both cases.
>>>>>
>>>>> If the catalog is composed of a collection of XML files in a  
>>>>> single directory that employ <catalogRef> elements to create  
>>>>> their organizations, then in order to retrieve catalog  
>>>>> information I would have to KNOW how the information was
>>>>> organized (file names, directory hierarchy, etc.). But I don't
>>>>> know, since the catalog may be created by a user after compile
>>>>> time (although THREDDS does know this, since it parsed all of
>>>>> the catalog information at start up), and I shouldn't have to
>>>>> know. For me to know would require that I parse the top level
>>>>> catalog.xml file and build the XML doc tree myself, at which
>>>>> point I could get the elusive <property> elements from the XML
>>>>> doc in memory.
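What THREDDS already does at start up, following <catalogRef> links outward from the top-level catalog, is what makes the catalog file names irrelevant. A minimal sketch of that walk, assuming the catalogs have already been parsed into a hypothetical map from each catalog path to the xlink:href values of its <catalogRef> elements; CatalogRefWalker is an illustrative name, and only java.net.URI.resolve does real work here, resolving each relative href against the catalog that contains it.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CatalogRefWalker {
    private CatalogRefWalker() {}

    // refsByCatalog is hypothetical pre-parsed input: catalog path -> xlink:href values
    // of that catalog's <catalogRef> elements. No XML is parsed here.
    static Set<String> reachableCatalogs(String topCatalog, Map<String, List<String>> refsByCatalog) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<URI> pending = new ArrayDeque<>();
        pending.push(URI.create(topCatalog));
        while (!pending.isEmpty()) {
            URI current = pending.pop();
            if (!visited.add(current.toString())) {
                continue; // already seen; this also stops reference cycles
            }
            List<String> hrefs = refsByCatalog.getOrDefault(current.toString(), Collections.emptyList());
            for (String href : hrefs) {
                // A relative href is resolved against the catalog that contains it,
                // so the referenced file can have any name and live in any directory.
                pending.push(current.resolve(href));
            }
        }
        return visited;
    }
}
```

Because every reachable catalog ends up recorded under its resolved path, a server holding this set could answer lookups without the caller knowing whether the files are all named catalog.xml, nested in directories, or sitting side by side in one directory.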
>>>>>
>>>>> If the catalog is composed of a single (complex) top level  
>>>>> catalog.xml file then I would have to know that and just ask  
>>>>> for the top level catalog.
>>>>>
>>>>> (Searching the entire catalog from the top down for my dataset  
>>>>> doesn't seem to work either...)
>>>>
>>>> I'm sorry, I'm having a hard time following here. What are you  
>>>> trying to do and why?
>>>
>>> For any request that is looking for one of the OPeNDAP data  
>>> responses I need to search the THREDDS catalog for the dataset,  
>>> and if found, I need to extract any metadata that may be in the
>>> catalog for that dataset.
>>>
>>>
>>>>
>>>> Is the problem that you may not know if the dataset is contained  
>>>> in a catalog generated because of a datasetScan element or  
>>>> contained directly in one of the THREDDS config catalogs?
>>>
>>> I think that's a separate issue.
>>>
>>>
>>>>
>>>>> All of these methods of writing and organizing catalogs are  
>>>>> legitimate in THREDDS, and users writing THREDDS catalogs would  
>>>>> likely employ one or more of these methods when writing their  
>>>>> catalogs.
>>>>>
>>>>>
>>>>> I propose that the THREDDS API be extended so that one can  
>>>>> simply ask the DataRootHandler for an InvDataset or an  
>>>>> InvCatalog. Like:
>>>>>
>>>>>     InvDataset ds = drh.getDataSet("wcs/MODIS/foo.nc");
>>>>>     InvCatalog cat = drh.getCatalog("wcs/MODIS/");
>>>>>
>>>>> or possibly the InvDataset that represents a collection:
>>>>>
>>>>>     InvDataset coll = drh.getDataSet("wcs/MODIS/");
>>>>>
>>>>>
>>>>> If the DataRootHandler doesn't have it, return null.
>>>>>
>>>>>
>>>>> Is that unreasonable?
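The proposed null-returning accessor can be sketched as an index filled while the configuration catalogs are parsed. DatasetIndex and DatasetInfo are hypothetical names standing in for the DataRootHandler and InvDataset; they do not reflect the real TSF or thredds.catalog APIs. Rejecting duplicate paths at registration time is one assumed way to address the non-unique urlPath problem raised elsewhere in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an InvDataset: a name plus its <property> name/value pairs.
final class DatasetInfo {
    final String name;
    final Map<String, String> properties;

    DatasetInfo(String name, Map<String, String> properties) {
        this.name = name;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }
}

// Hypothetical index a DataRootHandler-like class could fill while parsing catalogs at start up.
final class DatasetIndex {
    private final Map<String, DatasetInfo> byPath = new HashMap<>();

    // Rejects a second dataset with the same path so that every lookup is unambiguous.
    void register(String path, DatasetInfo dataset) {
        DatasetInfo previous = byPath.putIfAbsent(path, dataset);
        if (previous != null) {
            throw new IllegalStateException("duplicate dataset path: " + path);
        }
    }

    // Same contract as the proposed drh.getDataSet(path): null when the path is unknown.
    DatasetInfo getDataSet(String path) {
        return byPath.get(path);
    }
}
```

Collections could be registered the same way under paths such as "wcs/MODIS/", which would cover the collection-level lookup in the proposal as well.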
>>>>
>>>> I'll have to take a closer look at this.
>>>>
>>>> Ethan
>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>> -- 
>>>>> Nathan Potter                        ndp at opendap.org
>>>>> OPeNDAP, Inc.                        541.752.1852
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ==============================================================================
>>>>> To unsubscribe thredds, visit:
>>>>> http://www.unidata.ucar.edu/mailing-list-delete-form.html
>>>>> ==============================================================================
>>>>
>>>> -- 
>>>> Ethan R. Davis                                Telephone: (303) 497-8155
>>>> Software Engineer                             Fax:       (303) 497-8690
>>>> UCAR Unidata Program Center                   E-mail:    edavis at ucar.edu
>>>> P.O. Box 3000
>>>> Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
>>>> ---------------------------------------------------------------------------
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
