
20050908: Newbie problems with catalog generator



Sounds like a deal. Glad your script is getting the job done.

Ethan

Tennessee Leeuwenburg wrote:

That's fine. Perhaps after the next release, I can offer my services in helping to update, create and identify gaps in the available documentation. In return, you guys can document the bits I don't understand, and we all win! :)

In the meantime, my python script does the job fine.

Cheers,
-T

Hi Tennessee,

Sorry for the confusion. Part of the problem is that we are just getting our release engineering and versioning figured out. So there are several different servers, versions, and config file formats being used. We finally have that pretty well figured out in the upcoming release of the THREDDS Data Server (TDS). But that doesn't help current installations.

Yes, the idea for the catalog generator is to crawl whatever source of data you have (local or remote) and generate a catalog. Sorry that wasn't as easy as it should have been.

Do you have a working config file now from your python script? Let me know if I can be of any help.

Sorry again for the confusion.

Ethan

Tennessee Leeuwenburg wrote:

I think there may be a fundamental misunderstanding here. The server is not set up to serve this data. I want to produce the config file that will enable the server to serve the data (i.e. catalogConfig.xml).

I have actually now written a python script to do this for me, as it seemed easier.

I thought the idea behind the catalog generator was to produce catalogs not just for data sourced from other DODS servers, but also for data on local disk, etc.

Cheers,
-T

Hi Tennessee,

Is the OPeNDAP server already set up to serve this data? What do you get with the URL http://localhost:8010/thredds/dodsC/catalog.xml? The current THREDDS server, when set up to serve data via OPeNDAP, automatically generates a catalog of the data being served (so you may not need to use CatGen). I think the version from two months ago should also do that. Could you look at the manifest.mf file in the thredds.war file you are using and let me know what the "Built-By", "Built-On", "Implementation-Title", and "Implementation-Version" values are?
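
For anyone who wants to pull those values out without unpacking the war by hand, here is a minimal Python sketch. It assumes the manifest sits at the usual META-INF/MANIFEST.MF location inside the archive and only reads the main section; the function names read_manifest and build_info are made up for this example.

```python
import zipfile

def read_manifest(war_path):
    """Return the main-section attributes of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(war_path) as war:
        # Match the entry name case-insensitively (manifest.mf vs MANIFEST.MF).
        names = [n for n in war.namelist() if n.upper() == "META-INF/MANIFEST.MF"]
        if not names:
            raise KeyError("no META-INF/MANIFEST.MF in " + war_path)
        text = war.read(names[0]).decode("utf-8")
    attrs = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation of the previous value
        elif ":" in line:
            key, value = line.split(":", 1)
            attrs[key] = value.strip()
    return attrs

def build_info(war_path):
    """Pick out the four values asked about above (None if absent)."""
    wanted = ("Built-By", "Built-On", "Implementation-Title", "Implementation-Version")
    manifest = read_manifest(war_path)
    return {k: manifest.get(k) for k in wanted}
```

Calling build_info("thredds.war") returns a dict with those four keys; any value missing from the manifest comes back as None.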

As for the CatGen stuff, try changing the address@hidden value to "/data/pymars" (matching the value in accessPoint) and the address@hidden value to ".*\.nc$", and for now remove the datasetNamer element. Oh yeah, you'll probably need to change the address@hidden value to "http://localhost:8010/thredds/dodsC/".

Hope that helps,

Ethan

Tennessee Leeuwenburg wrote:

Hi Ethan,

I'm not sure I fully grokked what you said to me, so I've just included my catalog generator file without further modification.

I have data living on disk in /data/pymars/2004/netcdf_anal, and /data/pymars/2004/netcdf_fore. I would like to set up the catalog generator to crawl the /data/pymars directory and publish what it finds there -- no requirement for very intelligent structuring at this stage.

The dods server is running on localhost:8010.

I'm not entirely certain what version is running, but it is whatever is current on the web page as of about 2 months ago. I look forward to the new version, and the simpler configuration!

I wasn't sure what I had to do with all that pattern matching stuff, so I decided to just leave it unchanged from the example, and just see what happened. I imagine I have to replace the datasetFilter to accept *.nc, or some other pattern of my choosing. I couldn't work out if the dataset namer was mandatory or not. I'd really just like to capture everything, and am happy with the title being the filename at this stage.
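
To make the "capture everything, title is the filename" idea concrete, here is a rough Python sketch of a crawl with a regular-expression filter. It is not the catalog generator's own logic and not the script mentioned elsewhere in this thread. The function names and the output layout (nested dataset elements with name and urlPath attributes) are assumptions for illustration and have not been checked against any THREDDS schema.

```python
import os
import re
import xml.etree.ElementTree as ET

def crawl(root_dir, pattern=r".*\.nc$"):
    """Yield (name, relative_path) for each file under root_dir whose path matches pattern."""
    matcher = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            if matcher.search(full_path):
                rel = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
                yield filename, rel

def build_catalog(root_dir, catalog_name="Local data"):
    """Return an XML string listing every matching file, named after the file itself."""
    catalog = ET.Element("catalog", name=catalog_name)
    top = ET.SubElement(catalog, "dataset", name=catalog_name)
    for name, rel in crawl(root_dir):
        # Hypothetical layout: one flat dataset entry per file.
        ET.SubElement(top, "dataset", name=name, urlPath=rel)
    return ET.tostring(catalog, encoding="unicode")
```

With the layout described above, build_catalog("/data/pymars") would list the files under 2004/netcdf_anal and 2004/netcdf_fore by their paths relative to /data/pymars.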

Cheers,
-Tennessee

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: catGenConf.exampleLocal.xml,v 1.2 2004/06/03 20:38:07 edavis Exp $ -->
<!-- - Simple example CatalogGenConfig file.
-->
<!DOCTYPE catalog SYSTEM "http://www.unidata.ucar.edu/projects/THREDDS/xml/CatalogGenConfig.0.5.dtd">


<catalog name="THREDDS CatalogGen test config file" version="0.6">
<dataset name="THREDDS CatalogGen test config file">
<dataset name="NCEP Eta 80km CONUS model data">
<metadata metadataType="CatalogGenConfig">
<catalogGenConfig type="Catalog">
<datasetSource name="Local Disk Data Sets" type="Local"
structure="DirTree"
accessPoint="/data/pymars">
<resultService name="linuxdev" serviceType="DODS"
base="http://localhost:8010/thredds/cataloggen/"
accessPointHeader="/home/tjl/jakarta-5.0.28/content/thredds/cataloggen/"/>


<datasetFilter name="Accept netCDF files only" type="RegExp"
matchPattern="/[0-9][^/]*_eta_211\.nc$"/>
<datasetNamer name="NCEP Eta 80km CONUS model data"
type="RegExp" addLevel="false"
matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_eta_211.nc$"
substitutePattern="NCEP Eta 80km CONUS $1-$2-$3 $4:00:00 GMT"/>
</datasetSource>
</catalogGenConfig>
</metadata>
</dataset>
<dataset name="NCEP GFS 80km CONUS model data">
<metadata metadataType="CatalogGenConfig">
<catalogGenConfig type="Catalog">
<datasetSource name="model data source" type="Local"
structure="Flat"
accessPoint="./content/thredds/cataloggen/testData/model">
<resultService name="mlode" serviceType="DODS"
base="http://localhost:8080/thredds/cataloggen/"
accessPointHeader="./content/thredds/cataloggen/"/>
<datasetFilter name="Accept netCDF files only" type="RegExp"
matchPattern="/[0-9][^/]*_gfs_211\.nc$"/>
<datasetNamer name="NCEP GFS 80km CONUS model data"
type="RegExp" addLevel="false"
matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_gfs_211.nc$"
substitutePattern="NCEP GFS 80km CONUS $1-$2-$3 $4:00:00 GMT"/>
</datasetSource>
</catalogGenConfig>
</metadata>
</dataset>
</dataset>
</catalog>
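
As a way of reading the datasetNamer entries above, the following Python sketch applies the Eta matchPattern to a file name and fills the captured groups into the substitutePattern. It is only an approximation: it assumes $1 to $4 refer to the capture groups in order and that the dataset name is the expanded substitutePattern alone. The name_dataset function and the sample file name are invented for the example.

```python
import re

# Copied from the Eta datasetNamer in the config above.
MATCH_PATTERN = r"([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_eta_211.nc$"
SUBSTITUTE_PATTERN = "NCEP Eta 80km CONUS $1-$2-$3 $4:00:00 GMT"

def name_dataset(path, match_pattern=MATCH_PATTERN, substitute_pattern=SUBSTITUTE_PATTERN):
    """Return a display name for path, or None if the pattern does not match."""
    m = re.search(match_pattern, path)
    if m is None:
        return None
    # Translate $N group references into Python's \g<N> template syntax.
    template = re.sub(r"\$(\d)", r"\\g<\1>", substitute_pattern)
    return m.expand(template)

print(name_dataset("/data/2005090812_eta_211.nc"))
# prints: NCEP Eta 80km CONUS 2005-09-08 12:00:00 GMT
```

A file that does not end in _eta_211.nc simply returns None here, which is one plausible reading of why a namer left over from the example would not help with differently named files.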





Ethan Davis wrote:

Tennessee Leeuwenburg wrote:

Ethan Davis wrote:



Hi Tennessee,

Did you edit the config.xml file (which sets up the tasks) as well as
the cat gen config file? I guess you must have if it is showing up in
the interface. Make sure the period value is not set to zero; if it
is, the task won't be run. Are you getting any messages in the log
files? What version of the server are you running? Is this a publicly
available server? If so, send me the URL and I'll take a look at the
config files.


Sorry these config file formats are so ugly. We're working on
simplifying and cleaning up the configuration throughout the server.
But for now ...






Well, as long as you're willing to help me, ugly is fine :)


More than willing to help. But I want simpler because it would make it easier for me to remember what is going on :)

After making that change, the server started to process the various
files. The example DODS catalog was generated fine, but the example
filesystem catalog and my own filesystem catalog both failed with
similar messages. I've appended the results.

I think I'm failing to understand what exactly the serviceName, base and
accessPointHeader are actually used for.


As with regular catalogs, I assume one is used for reconstructing the
URL to the file to be resourced, and the other is used for constructing
the URL to be used in an OPeNDAP request, but it's not clear to me
exactly what is happening. I read the documentation, but it was a bit
hand-wavy about the specifics.


The accessPoint is the directory that is to be scanned for data files. The accessPointHeader is a parent directory of the accessPoint directory and is used to remove the part of the data file path that is not to appear in the resulting dataset access URL. The base value is the URL for the OPeNDAP server that is serving your data. For instance, if you want to crawl the /my/data/radar/level3/FTG directory and a resulting dataset access URL is something like http://.../nph-dods/radar/level3/FTG/file.nc, you would want something like

<datasetSource name="model data source" type="Local" structure="Flat"
accessPoint="/my/data/radar/level3/FTG">
<resultService name="mlode" serviceType="DODS" base="http://.../nph-dods/"
accessPointHeader="/my/data/"/>
<datasetFilter ... />
<datasetNamer ... />
</datasetSource>
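
The relationship between the three values can be sketched in a few lines of Python. This is an illustration of the description above, not the CatGen implementation; the access_url function is hypothetical, and the host example.invalid stands in for the elided part of the base URL.

```python
def access_url(file_path, access_point_header, base):
    """Build a dataset access URL by swapping the accessPointHeader prefix for base."""
    if not file_path.startswith(access_point_header):
        raise ValueError("%s is not under %s" % (file_path, access_point_header))
    # Drop the part of the path that should not appear in the URL.
    relative = file_path[len(access_point_header):].lstrip("/")
    return base.rstrip("/") + "/" + relative

url = access_url(
    "/my/data/radar/level3/FTG/file.nc",      # a file found under accessPoint
    "/my/data/",                              # accessPointHeader
    "http://example.invalid/nph-dods/",       # base (placeholder host)
)
print(url)
# prints: http://example.invalid/nph-dods/radar/level3/FTG/file.nc
```

In other words, accessPoint decides which files are found, accessPointHeader decides how much of each file's path is cut off, and base is what gets put in front of the remainder.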


Does that clear things up at all? If not, feel free to send me your config file to look at.

Sorry about the documentation. It isn't all that clear and I haven't put much effort into it since we decided to move to a simpler config file format. Not sure what's up below with the example file system dataset. I must have broken something at some point.

What version of the cat gen servlet (or THREDDS server) are you running?

Ethan

PS In the new TDS, catalogs for the data it is serving are automatically generated and the config files are much simpler than these.


Thanks for your help,
-T

<catalog name="THREDDS CatalogGen test config file" version="0.6">
<dataset name="THREDDS CatalogGen test config file">
<service name="linuxdev" serviceType="DODS"
base="http://localhost:8010/thredds/cataloggen/"/>
<service name="mlode" serviceType="DODS"
base="http://localhost:8080/thredds/cataloggen/"/>
<dataset name="NCEP Eta 80km CONUS model data">
<dataset name="The DatasetSource "Local Disk Data Sets" could not be
expanded. The accessPointHeader
(/home/tjl/jakarta-5.0.28/content/thredds/cataloggen/) is not a
directory." serviceName="linuxdev"/>
</dataset>
<dataset name="NCEP GFS 80km CONUS model data">
<dataset name="The DatasetSource "model data source" could not be
expanded. The accessPointHeader (./content/thredds/cataloggen/) is not a
directory." serviceName="mlode"/>
</dataset>
</dataset>
</catalog>
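
Both failures above report that the accessPointHeader is not a directory. A quick way to check a config file for that before running the generator is sketched below in Python. It is an assumption-laden helper, not part of THREDDS: check_config is a made-up name, it only looks at the accessPoint and accessPointHeader attributes, and relative paths such as ./content/thredds/cataloggen/ are resolved against the working directory of whatever process runs the check, which may differ from the servlet container's.

```python
import os
import xml.etree.ElementTree as ET

def _as_dir_prefix(path):
    """Absolute path with a trailing separator, for safe prefix comparison."""
    return os.path.join(os.path.abspath(path), "")

def check_config(xml_text):
    """Return a list of human-readable problems found in a CatGen-style config."""
    problems = []
    root = ET.fromstring(xml_text)
    for source in root.iter("datasetSource"):
        name = source.get("name", "(unnamed)")
        access_point = source.get("accessPoint", "")
        if not os.path.isdir(access_point):
            problems.append("%s: accessPoint (%s) is not a directory" % (name, access_point))
        for service in source.iter("resultService"):
            header = service.get("accessPointHeader", "")
            if not os.path.isdir(header):
                problems.append("%s: accessPointHeader (%s) is not a directory" % (name, header))
            elif not _as_dir_prefix(access_point).startswith(_as_dir_prefix(header)):
                problems.append("%s: accessPointHeader (%s) is not a parent of accessPoint (%s)"
                                % (name, header, access_point))
    return problems
```

An empty list means both directories exist and the header is a parent of the access point; anything else is worth fixing before looking further.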







--
Ethan R. Davis                       Telephone: (303) 497-8155
Software Engineer                    Fax:       (303) 497-8690
UCAR Unidata Program Center          E-mail:    address@hidden
P.O. Box 3000
Boulder, CO  80307-3000              http://www.unidata.ucar.edu/
---------------------------------------------------------------------------



NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.