
THREDDS CatalogGen Configuration Primer




A Simple Example

The setup for an example OPeNDAP server:

Here is a simple CatalogGen config document for this server:

1 <?xml version="1.0" encoding="UTF-8"?>
1 <catalog name="My Data"
1 xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
1 xmlns:catGen="http://www.unidata.ucar.edu/namespaces/thredds/CatalogGenConfig/v0.5"
1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1 xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd"
1 >
1 <dataset name="my collection" dataType="Grid">
2 <metadata metadataType="CatalogGenConfig">
3 <catGen:catalogGenConfig type="Catalog">
4 <catGen:datasetSource name="ds source" type="Local"
4 structure="Flat"
4 accessPoint="/home/www/htdocs/data">
5 <catGen:resultService name="myserver" serviceType="DODS"
5 base="http://www.mydata.org/cgi-bin/dods/nph-dods/"
5 accessPointHeader="/home/www/htdocs/" />
6 <catGen:datasetFilter name="Accept netCDF files only" type="RegExp"
6 matchPattern="/[0-9][^/]*\.nc$"/>
7 <catGen:datasetNamer name="My Model" type="RegExp" addLevel="true"
7 matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_my.nc$"
7 substitutePattern="My Model $1-$2-$3 $4:00:00 GMT" />
4 </catGen:datasetSource>
3 </catGen:catalogGenConfig>
2 </metadata>
1 </dataset>
1 </catalog>

and a line-by-line explanation, keyed to the number at the start of each line:

  1. All of this is the THREDDS catalog framework for the resulting catalog.
  2. This is the THREDDS catalog metadata element that contains the CatalogGenConfig information and will be replaced by the resulting collection of datasets.
  3. This is the top-level CatalogGenConfig element. It contains information about a single dataset source.
  4. This datasetSource element describes a single dataset source on local disk at "/home/www/htdocs/data".
  5. This resultService element contains information about the service through which the datasets that are found will be accessible. The accessPointHeader attribute determines how the access URL is formed from the local file path: the accessPointHeader prefix is removed from the path. For instance, the file "/home/www/htdocs/data/2002081512_my.nc" would give a urlPath of "data/2002081512_my.nc".
  6. This datasetFilter element accepts only files whose names start with a digit and end in ".nc". A README file in the same directory does not match this filter and therefore is not kept as a dataset.
  7. This datasetNamer element describes how to name a dataset using regular expression matching and substitution. For instance, the file "data/2002081512_my.nc" will result in a dataset with the name "My Model 2002-08-15 12:00:00 GMT".
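
Items 5 through 7 can be checked outside the Catalog Generator with a few lines of Python. This is only a sketch of the behaviour described above, not the Catalog Generator's own code. It assumes that the patterns may match anywhere in the path, and it rewrites the "$1" style group references of the config as Python's "\1" style. The function names are made up for illustration.

```python
import re

# Values copied from the example config above.
ACCESS_POINT_HEADER = "/home/www/htdocs/"
FILTER_PATTERN = r"/[0-9][^/]*\.nc$"
NAMER_PATTERN = r"([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_my.nc$"
# The config writes group references as $1..$4; Python's re module uses \1..\4.
NAMER_SUBSTITUTE = r"My Model \1-\2-\3 \4:00:00 GMT"


def url_path(local_path):
    """Strip the accessPointHeader prefix from a local file path."""
    if not local_path.startswith(ACCESS_POINT_HEADER):
        raise ValueError("path is not under the accessPointHeader")
    return local_path[len(ACCESS_POINT_HEADER):]


def accepted(local_path):
    """Assumption: the filter pattern may match anywhere in the path."""
    return re.search(FILTER_PATTERN, local_path) is not None


def dataset_name(local_path):
    """Return the substituted name, or None if the namer does not match."""
    match = re.search(NAMER_PATTERN, local_path)
    return match.expand(NAMER_SUBSTITUTE) if match else None


path = "/home/www/htdocs/data/2002081512_my.nc"
print(url_path(path))                            # data/2002081512_my.nc
print(accepted(path))                            # True
print(accepted("/home/www/htdocs/data/README"))  # False
print(dataset_name(path))                        # My Model 2002-08-15 12:00:00 GMT
```

Trying a pattern against a handful of real file paths in this way is a quick check before running the Catalog Generator on a whole directory.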

This example will result in the following catalog:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="My Data"
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
xmlns:catGen="http://www.unidata.ucar.edu/namespaces/thredds/CatalogGenConfig/v0.5"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd"
>
<service name="myserver" serviceType="DODS"
base="http://www.mydata.org/cgi-bin/dods/nph-dods/" />
<dataset name="my collection" dataType="Grid">
<dataset name="My Model">
<metadata inherit="true">
<serviceName>myserver</serviceName>
</metadata>
<dataset name="My Model 2002-08-12 00:00:00 GMT"
urlPath="data/2002081200_my.nc" />
<dataset name="My Model 2002-08-12 12:00:00 GMT"
urlPath="data/2002081212_my.nc" />
</dataset>
</dataset>
</catalog>

Flat vs Hierarchical Dataset Source

Many dataset collections are organized in a directory hierarchy. The Catalog Generator can either keep that hierarchical structure or flatten it in the resulting catalog. The value of the structure attribute in the datasetSource element determines whether the directory structure is kept ("DirTree") or not ("Flat").

When the directory structure is kept, a collection dataset element is created for each directory and named with the directory path. These collection dataset elements can be re-named by datasetNamer elements.
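
The difference between the two settings can be pictured with a small Python sketch. It only illustrates the grouping described above and is not how the Catalog Generator is implemented. The file layout and the function name are hypothetical, and the nesting of collections inside one another is simplified to a flat mapping keyed by directory path.

```python
import posixpath
from collections import defaultdict


def group_by_structure(relative_paths, structure):
    """Map collection name -> file paths for structure "Flat" or "DirTree"."""
    if structure == "Flat":
        # One group at the level of the datasetSource; directories are ignored.
        return {"": sorted(relative_paths)}
    if structure == "DirTree":
        groups = defaultdict(list)
        for path in relative_paths:
            # Each directory becomes a collection named with the directory path.
            groups[posixpath.dirname(path)].append(path)
        return {name: sorted(files) for name, files in sorted(groups.items())}
    raise ValueError("structure must be 'Flat' or 'DirTree'")


# Hypothetical file layout, not taken from the primer.
files = ["model/2002081200_my.nc", "model/2002081212_my.nc", "obs/2002081200_my.nc"]
print(group_by_structure(files, "Flat"))
print(group_by_structure(files, "DirTree"))
```

With "Flat" all three files land in one group. With "DirTree" they are split into a "model" collection and an "obs" collection.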


Creating Hierarchical Structure

The hierarchical structure in your catalog does not have to match the structure of the dataset source. There are currently two ways to modify the organization of the datasets in your catalog. First, if the dataset source already has a hierarchical structure, that structure can be removed by setting the structure attribute in the datasetSource element to "Flat". When this is done, all datasets are grouped at the level of the datasetSource element. Second, each datasetNamer element with its addLevel attribute set to "true" creates a collection dataset that contains all the dataset elements named by that datasetNamer element. Each collection dataset created this way is given the same name as the corresponding datasetNamer element.

Filtering Possible Datasets: datasetFilter and datasetNamer

Many dataset collections contain non-data resources that should not be cataloged. Both the datasetFilter and the datasetNamer elements can be used to filter out those resources. To be included in the catalog, each resource must be accepted by at least one datasetFilter element and by at least one datasetNamer element. All the datasetFilter elements in a datasetSource element are applied to a potential dataset element before the datasetNamer elements are applied.
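
The two-stage selection can be sketched in Python as follows. This is an illustration of the rule stated above, not the Catalog Generator's code. It assumes that patterns may match anywhere in the path and that the first datasetNamer that matches supplies the name; the text above does not say which namer wins when several match. The function name and the "_other.nc" file are hypothetical.

```python
import re


def catalog_entries(paths, filter_patterns, namers):
    """Return (namer name, dataset name) pairs for the paths that survive.

    namers is a list of (name, match_pattern, substitute_template) tuples,
    with the template written in Python's group-reference syntax.
    """
    entries = []
    for path in paths:
        # Stage 1: the filters are tried first; one acceptance is enough.
        if not any(re.search(p, path) for p in filter_patterns):
            continue
        # Stage 2 (assumption): the first namer that matches names the dataset.
        for name, pattern, template in namers:
            match = re.search(pattern, path)
            if match:
                entries.append((name, match.expand(template)))
                break
        # A path that no namer matches is dropped here.
    return entries


filters = [r"/[0-9][^/]*\.nc$"]
namers = [(
    "My Model",
    r"([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_my.nc$",
    r"My Model \1-\2-\3 \4:00:00 GMT",
)]
paths = [
    "/home/www/htdocs/data/2002081512_my.nc",     # passes filter and namer
    "/home/www/htdocs/data/README",               # rejected by the filter
    "/home/www/htdocs/data/2002081512_other.nc",  # passes filter, no namer matches
]
print(catalog_entries(paths, filters, namers))
```

Only the first path is cataloged: the README fails the filter, and the "_other.nc" file passes the filter but is not matched by any namer.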

Converting an Old Config Document to Produce an InvCatalog 1.0 Document

There are only three changes needed to convert a CatalogGenConfig document that produces InvCatalog 0.6 documents into one that produces InvCatalog 1.0 documents:
  1. Remove the DOCTYPE statement at the top of the XML document.
  2. Remove the version attribute from the catalog element and add the following XML Namespace declarations:
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:catGen="http://www.unidata.ucar.edu/namespaces/thredds/CatalogGenConfig/v0.5"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd"
  3. Add the catGen namespace prefix to all the CatalogGenConfig elements. For instance, "datasetSource" becomes "catGen:datasetSource".
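
For a directory full of old config documents, the three steps could be scripted. The following Python sketch uses the standard xml.etree.ElementTree module and is offered as a starting point under stated assumptions, not as an official conversion tool. It assumes that the only CatalogGenConfig elements present are the five used in this primer, and the function name convert_config is made up. Comments in the input are not preserved, and the attribute order in the output may differ from the hand-written examples.

```python
import xml.etree.ElementTree as ET

INV_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
CATGEN_NS = "http://www.unidata.ucar.edu/namespaces/thredds/CatalogGenConfig/v0.5"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = INV_NS + " http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd"

# Assumption: only the CatalogGenConfig elements used in this primer occur.
CATGEN_ELEMENTS = {"catalogGenConfig", "datasetSource", "resultService",
                   "datasetFilter", "datasetNamer"}


def convert_config(old_xml):
    """Return 1.0-style config text for a 0.6-style config document string."""
    # Step 1: parsing keeps only the element tree, so the DOCTYPE is dropped.
    root = ET.fromstring(old_xml)
    # Step 2: remove the version attribute and add the schema location.
    root.attrib.pop("version", None)
    root.set("{%s}schemaLocation" % XSI_NS, SCHEMA_LOCATION)
    # Steps 2 and 3: place every element in its namespace.
    for element in root.iter():
        ns = CATGEN_NS if element.tag in CATGEN_ELEMENTS else INV_NS
        element.tag = "{%s}%s" % (ns, element.tag)
    # Note: register_namespace changes module-wide serialization settings.
    ET.register_namespace("", INV_NS)
    ET.register_namespace("catGen", CATGEN_NS)
    ET.register_namespace("xsi", XSI_NS)
    return ET.tostring(root, encoding="unicode")


# A trimmed 0.6-style config, shortened for illustration.
old = """<!DOCTYPE catalog SYSTEM "http://www.unidata.ucar.edu/projects/THREDDS/xml/CatalogGenConfig.0.5.dtd">
<catalog name="My Data" version="0.6">
  <dataset name="my collection" dataType="Grid">
    <metadata metadataType="CatalogGenConfig">
      <catalogGenConfig type="Catalog">
        <datasetSource name="ds source" type="Local" structure="Flat"
                       accessPoint="/home/www/htdocs/data"/>
      </catalogGenConfig>
    </metadata>
  </dataset>
</catalog>"""
print(convert_config(old))
```

The converted text should still be checked against the InvCatalog 1.0 schema before use.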


A Simple Example To Create An InvCatalog 0.6 Document

The above example and discussion used the InvCatalog 1.0 catalog specification. The Catalog Generator still supports generation of 0.6 catalogs. Using InvCatalog 1.0 is strongly recommended, but there may be times when a 0.6 catalog is still required. Using the same example as above, here is a CatalogGen config document that will result in an InvCatalog 0.6 document:

  <?xml version="1.0" encoding="UTF-8"?>
1 <!DOCTYPE catalog SYSTEM "http://www.unidata.ucar.edu/projects/THREDDS/xml/CatalogGenConfig.0.5.dtd">
2 <catalog name="My Data" version="0.6">
<dataset name="my collection" dataType="Grid">
<metadata metadataType="CatalogGenConfig">
3 <catalogGenConfig type="Catalog">
3 <datasetSource name="ds source" type="Local"
structure="Flat"
accessPoint="/home/www/htdocs/data">
3 <resultService name="myserver" serviceType="DODS"
base="http://www.mydata.org/cgi-bin/dods/nph-dods/"
accessPointHeader="/home/www/htdocs/" />
3 <datasetFilter name="Accept netCDF files only" type="RegExp"
matchPattern="/[0-9][^/]*\.nc$"/>
3 <datasetNamer name="My Model" type="RegExp" addLevel="true"
matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_my.nc$"
substitutePattern="My Model $1-$2-$3 $4:00:00 GMT" />
3 </datasetSource>
3 </catalogGenConfig>
</metadata>
</dataset>
</catalog>

There are only a few things that are different from the CatalogGenConfig document that results in an InvCatalog 1.0 document:

  1. The InvCatalog 0.6 documents use XML DTDs so the DOCTYPE statement is required.
  2. The namespace declarations are not required when producing an InvCatalog 0.6 document. The version attribute is required in InvCatalog 0.6 documents (it has been deprecated in the 1.0 specification).
  3. The namespace prefixes must not be used when producing an InvCatalog 0.6 document.

Unidata is a member of the UCAR Office of Programs, is managed by the University Corporation for Atmospheric Research, and is sponsored by the National Science Foundation.