
Re: CrawlableDatasetDods II



Hi Bas,

I'm going to respond to some questions from several of your earlier emails.

1. How do I decide if a URL is a "Collection" or "Atomic"? It seems I cannot count on the trailing slash, as it is always removed by thredds.crawlabledataset.CrawlableDatasetFactory.normalizePath(). I have a nasty solution for now, which involves checking the URL for known file extensions (like .html, .hdf, .nc, .bz2, etc.). If the extension is not in my list, the URL is a "Collection" and can be crawled further.
Well, that is kind of a problem. I was trying to keep the paths nice and clean, but there isn't really a good way to tell if an OPeNDAP URL is a collection. Generally, if they end in "/" they are collections, but my cleaning of the paths screws that up. One problem is that the OPeNDAP spec doesn't define the dods_dir response as well as it could, which leads to another problem: different OPeNDAP server implementations deal with dods_dir a bit differently. But if I recall correctly, the servers you are looking at are both OPeNDAP C++ servers.

I would stick with the extension test and maybe add a test to see if the URL is a real OPeNDAP dataset: append the ".dds" extension and check whether the value of the HTTP "Content-Description" response header is "dods_dds" or "dods_error". If it is "dods_dds", you don't have a collection.
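Something along these lines should do it. This is just a rough sketch using plain java.net, not code from the THREDDS tree; the class and method names are made up, and if a server doesn't accept HEAD requests, a plain GET that only looks at the headers works just as well.

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class OpendapUrlChecker
    {
      /**
       * Returns true if path + ".dds" answers like a real OPeNDAP dataset,
       * i.e., the Content-Description header comes back as "dods_dds".
       */
      public static boolean isAtomicDataset( String path )
      {
        HttpURLConnection conn = null;
        try
        {
          URL ddsUrl = new URL( path + ".dds" );
          conn = (HttpURLConnection) ddsUrl.openConnection();
          conn.setRequestMethod( "HEAD" );  // only the headers are needed
          conn.connect();

          String contentDesc = conn.getHeaderField( "Content-Description" );
          // "dods_dds" means an atomic dataset; "dods_error" (or anything
          // else, including null) means it is not.
          return "dods_dds".equals( contentDesc );
        }
        catch ( IOException e )
        {
          return false;  // couldn't reach the server; treat as not a dataset
        }
        finally
        {
          if ( conn != null )
            conn.disconnect();
        }
      }
    }

You would probably only call something like this after the extension test fails to classify a URL, so you aren't making an extra HTTP request for every dataset.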

Sorry this isn't a prettier solution. I'll have to rethink the normalize stuff, but that may be a while.

One thing I'm sorry I didn't mention earlier, and which may matter depending on your time frame: the OPeNDAP folks are working on the Server 4 architecture, which includes automatic generation of THREDDS catalogs. I'm not sure what their time frame is, or what the time frame would be for the various server sites to upgrade, but I thought I should mention it.

2. I am using thredds.cataloggen.config.DodsURLExtractor (like the original code in thredds.cataloggen.config.DodsDirDataSource, parts of which I have reused). You had mentioned that you do not like this very much. However, it works well. Can I keep using this? Or did you have something else in mind?
I don't have anything else in mind. Please feel free to continue using it.

3. In thredds.examples.MockOpendapDSP, there is an assumption that only CrawlableDatasetFile exists (in other cases an SC_INTERNAL_SERVER_ERROR is generated). Judging by the Java package, this seems to be just example code, so the problem is not critical. Do you think there are other places where a CrawlableDataset other than CrawlableDatasetFile is unexpected?
That is correct, the MockOpendapDSP is just example code.

In terms of generating catalogs, I believe there shouldn't be any other locations where CrawlableDatasetFile is assumed.

In terms of serving datasets from the TDS, we do currently assume CrawlableDatasetFile in some places (but do plan on getting rid of that assumption). I had been assuming that you would be setting things up so that the generated catalogs would point to the remote server for OPeNDAP access. Do I have that wrong?

I have looked a bit further for possible mistakes in my Java code and in my catalog.xml. I have seen that I need to put something useful in the <serviceName> element of <datasetScan>. However, I have not yet discovered how this field is used, and thus whether I should choose a Compound, OpenDAP, or HTTPServer service, something else, or whether it does not matter as long as it is != null.
This kind of relates to my comment just above about how you want your catalog to point to the datasets. The <serviceName> element needs to reference a <service> element at the top of your config catalog: its content must match the name attribute of an existing <service> element, e.g., <service name="remoteOPeNDAP" ... /> ... <serviceName>remoteOPeNDAP</serviceName>. The referenced service gives information about how to access datasets; its base URL is combined with each dataset's urlPath. Here's a reference on how access URLs are constructed from THREDDS catalogs: http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#constructingURLs
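For example, a config catalog that points the generated datasets at the remote server for OPeNDAP access might look roughly like the sketch below. All of the names, paths, and URLs are made-up placeholders (and I've left out whatever configuration your CrawlableDatasetDods crawling needs), so don't copy it literally; the point is just the <service> / <serviceName> pairing.

    <?xml version="1.0" encoding="UTF-8"?>
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             name="Example config catalog">

      <!-- Describes how datasets are accessed; here the base points at the
           remote OPeNDAP server rather than at your own TDS. -->
      <service name="remoteOPeNDAP" serviceType="OPENDAP"
               base="http://remote.server.example/cgi-bin/nph-dods/" />

      <datasetScan name="Remote OPeNDAP data" path="remote"
                   location="http://remote.server.example/cgi-bin/nph-dods/data/">
        <!-- Must match the name attribute of the <service> element above. -->
        <serviceName>remoteOPeNDAP</serviceName>
      </datasetScan>

    </catalog>

The access URL for a dataset is then the service base plus the dataset's urlPath, as described in the InvCatalogSpec link above.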


Unfortunately, so far I have made no progress. I get the same problem as last Friday, which is that I cannot get my new class CrawlableDatasetDods to be called.
If you have remote management set up on your TDS (http://motherlode.ucar.edu:8080/thredds/docs/RemoteManagement.html), you can go to http://your.server:port/thredds/debug and set the log4j logging levels for selected packages. You might bump all the thredds.cataloggen stuff up to "ALL" and see if you get any hints.
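(If you would rather set the level in the log4j configuration than through the debug page, the equivalent in a log4j properties file is a single line like the one below. I don't remember offhand which log4j config file your TDS install reads, so the debug page is the more direct route.)

    log4j.logger.thredds.cataloggen=ALL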

I'm out of the office again (sorry, kind of a hectic summer vacation schedule :-) until next Wednesday. Feel free to send more questions. Just wanted to let you know my response might be a bit delayed.

Enjoy your weekend.

Ethan

--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    address@hidden
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------