
THREDDS CatalogGen Configuration - Version 0.5

Comments to Ethan Davis or THREDDS mail list




Overview

A CatalogGen configuration document is an XML document that describes how to produce a THREDDS catalog by scanning or crawling one or more dataset collections. Each CatalogGen configuration document is a skeleton THREDDS catalog containing one or more metadata elements of type "CatalogGenConfig". Each "CatalogGenConfig" metadata element will be replaced by dataset elements representing the datasets that make up the collection described by that metadata element.

CatalogGenConfig Elements

catalogGenConfig Element

<!ELEMENT catalogGenConfig ( datasetSource )>
<!ATTLIST catalogGenConfig
type (%CatalogGenConfigType;) #REQUIRED
>
<!ENTITY % CatalogGenConfigType "Catalog | Aggregation">

The catalogGenConfig element is the top-level element in each "CatalogGenConfig" metadata element. The type attribute is required, and "Catalog" is the only value currently supported, so it must be set to "Catalog". The catalogGenConfig element must contain one and only one datasetSource element. For example:

<catGen:catalogGenConfig type="Catalog">
<catGen:datasetSource name="model data source" type="Local"
structure="Flat"
accessPoint="./test/thredds/cataloggen/testData/model">
...
</catGen:datasetSource>
</catGen:catalogGenConfig>

NOTE: A second value of "Aggregation" is defined for the type attribute but is not currently supported. This is a placeholder for when/if the Catalog Generator is expanded to produce configuration files for the DODS Aggregation Server.


datasetSource Element

<!ELEMENT datasetSource ( resultService, datasetFilter*, datasetNamer*)>
<!ATTLIST datasetSource
name CDATA #REQUIRED
type (%DatasetSourceType;) #REQUIRED
structure (%DatasetSourceStructure;) #REQUIRED
accessPoint CDATA #REQUIRED
>
<!ENTITY % DatasetSourceType "Local | DodsDir | DodsFileServer | GrADSDataServer">
<!ENTITY % DatasetSourceStructure "Flat | DirTree">

The datasetSource element describes the source of a dataset collection, how to crawl that collection, and how to create a THREDDS catalog for its datasets. The name of the dataset source is given by the name attribute. The type attribute describes the kind of dataset source. The supported values are "Local", for a data collection on local disk, and "DodsDir", for a data collection on a remote OPeNDAP/DODS server. The value of the structure attribute indicates whether any hierarchical directory structure of the dataset source should be duplicated in the resulting catalog ("DirTree") or flattened ("Flat"). The value of the accessPoint attribute is the directory path or URL of the location of the desired datasets. Each datasetSource element must contain one, and only one, resultService element, which may be followed by zero or more datasetFilter elements and then zero or more datasetNamer elements.

NOTE: The two values "DodsFileServer" and "GrADSDataServer" are defined as types but are not currently supported by the catalog generation software.
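
As a sketch of how the child elements fit together, the following datasetSource lists its children in the order the DTD requires: one resultService, then any datasetFilter elements, then any datasetNamer elements. The element names, patterns, and paths are illustrative assumptions rather than a tested configuration. In particular, the regular expression dialect and the "$1" back-reference syntax in substitutePattern are assumed and are not defined in this document.

```xml
<catGen:datasetSource name="example data source" type="Local"
    structure="Flat"
    accessPoint="/home/htdocs/data">
  <!-- Exactly one resultService, and it must come first. -->
  <catGen:resultService name="myService" serviceType="DODS"
      base="http://localhost/cgi-bin/nph-dods/"
      accessPointHeader="/home/htdocs" />
  <!-- Zero or more filters; this one accepts files ending in ".nc". -->
  <catGen:datasetFilter name="Accept netCDF files" type="RegExp"
      matchPattern="\.nc$" />
  <!-- Zero or more namers; "$1" (assumed syntax) refers to the captured file name. -->
  <catGen:datasetNamer name="Name by file name" addLevel="false"
      type="RegExp"
      matchPattern="([^/]*)\.nc$"
      substitutePattern="$1" />
</catGen:datasetSource>
```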

resultService Element

<!ELEMENT resultService EMPTY>
<!ATTLIST resultService
name CDATA #REQUIRED
serviceType (%ServiceType;) #REQUIRED
base CDATA #REQUIRED
suffix CDATA #IMPLIED
accessPointHeader CDATA #REQUIRED
>

A resultService element provides the details about the service that is serving the datasets from the dataset source. All the dataset elements in the resulting catalog that were added from the dataset source will reference the service described by this resultService element. The name, serviceType, base, and suffix attributes are the attributes of the THREDDS catalog service element (see the THREDDS Inventory Catalog specification). All of these attributes are required except for the suffix attribute. The value of the accessPointHeader attribute is used to remove the local part of a dataset's path that is not seen by the service. For example, say you have a DODS server serving the data file "/home/htdocs/data/myFile.nc" and "/home/htdocs" is your web server's DocRoot. You could write:

<catGen:datasetSource name="my data source" type="Local" structure="Flat"
accessPoint="/home/htdocs/data">
<catGen:resultService name="myService" serviceType="DODS"
base="http://localhost/cgi-bin/nph-dods/"
accessPointHeader="/home/htdocs" />
</catGen:datasetSource>

The data file would be found at "/home/htdocs/data/myFile.nc", and the accessPointHeader value would be removed from the start of the path, resulting in the following dataset element:

<dataset name="" serviceName="myService" urlPath="data/myFile.nc" />

datasetFilter Element

<!ELEMENT datasetFilter EMPTY>
<!ATTLIST datasetFilter
name CDATA #REQUIRED
type (%DatasetFilterType;) #REQUIRED
matchPattern CDATA #IMPLIED
matchPatternTarget CDATA #IMPLIED
applyToCollectionDataset (%TrueFalse;) "false"
applyToAtomicDataset (%TrueFalse;) "true"
invertMatchMeaning (%TrueFalse;) "false"
>
<!ENTITY % DatasetFilterType "RegExp">

A datasetFilter element specifies a scheme for filtering datasets. The datasetFilter elements are applied to each resource to determine whether it will be added to the dataset collection. If none of the datasetFilter elements accepts a given resource, it is not added to the collection. This applies to collection (directory) level resources as well. For instance, if no filter applies to collection datasets, the crawl of the datasetSource will not go beyond the top level.

The name attribute gives the name of the filter. The value of the type attribute must be "RegExp", which indicates that a regular expression is matched against the resource. The match pattern is given by the value of the matchPattern attribute. The target of the match pattern is given by the matchPatternTarget attribute. (Currently, this attribute indicates which attribute of the dataset element the match is run against, either "name" or "urlPath". In the future, it will also be able to indicate a part of the accessible dataset, e.g., an attribute in a netCDF file.) The applyToCollectionDataset and applyToAtomicDataset attributes determine whether a filter is applied to collection datasets, atomic datasets, or both. The default is to apply only to atomic datasets (leaf-node datasets).

The invertMatchMeaning attribute reverses the meaning of a filter. Normally, a dataset that matches a filter is accepted as part of the datasetSource collection. However, when the invertMatchMeaning attribute is set to "true", a dataset that matches the filter is not accepted. Use this attribute with care: unless the match pattern is well designed, setting it to "true" can filter out a large number of datasets.
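
To illustrate the atomic versus collection distinction, here is a hedged sketch of a filter pair that might be used with a "DirTree" datasetSource. The first filter relies on the defaults and applies only to atomic datasets. The second applies only to collection datasets, so the crawl can descend below the top level. The filter names and patterns are hypothetical, and the regular expression dialect is an assumption.

```xml
<!-- Applies to atomic datasets only (the default): accept paths ending in ".nc". -->
<catGen:datasetFilter name="Accept netCDF files" type="RegExp"
    matchPattern="\.nc$" matchPatternTarget="urlPath" />
<!-- Applies to collection datasets only: accept every directory by name. -->
<catGen:datasetFilter name="Accept all directories" type="RegExp"
    matchPattern=".*" matchPatternTarget="name"
    applyToCollectionDataset="true"
    applyToAtomicDataset="false" />
```

Without the second filter, no collection dataset would be accepted, and only the files directly under the accessPoint would be considered.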

datasetNamer Element

<!ELEMENT datasetNamer  EMPTY>
<!ATTLIST datasetNamer
name CDATA #REQUIRED
addLevel (%TrueFalse;) #REQUIRED
type (%DatasetNamerType;) #REQUIRED
matchPattern CDATA #IMPLIED
substitutePattern CDATA #IMPLIED
attribContainer CDATA #IMPLIED
attribName CDATA #IMPLIED
>
<!ENTITY % DatasetNamerType "RegExp | DodsAttrib">
<!ENTITY % TrueFalse "true | false">

A datasetNamer element specifies a scheme for naming datasets. The datasetNamer elements, in document order, are applied to each dataset until one can be used to name the dataset. If none of the datasetNamer elements can name a dataset, that dataset is removed from the dataset collection. (NOTE: This means that the dataset namers are also dataset filters.)

The name attribute provides the name of the datasetNamer element. When the addLevel attribute is "true", all dataset elements named by the datasetNamer are enclosed in a containing dataset element. The name of the containing dataset element is the name of the datasetNamer element. When the addLevel attribute is set to "false", the dataset elements are added directly to the parent dataset without a new containing dataset element.

The value of the type attribute can be either "RegExp" or "DodsAttrib". A "RegExp" type means that a regular expression (the value of the matchPattern attribute) is used to determine if the datasetNamer will be used to name a given dataset. If the regular expression matches the urlPath of the dataset, values found in the match are substituted in the substitution pattern string (the value of the substitutePattern attribute) and the resulting string is used to name the dataset.

A type of "DodsAttrib" means that the dataset to be named is checked for a variable (or OPeNDAP/DODS attribute container) with the name given in the attribContainer attribute and then that variable is checked for a variable attribute with the name given by the attribName attribute. If the variable attribute exists, its value is used to name the resulting dataset element.
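
The two namer types can be sketched as follows. The namers are tried in document order, so the "DodsAttrib" namer is tried first and the "RegExp" namer is used only for datasets the first cannot name. All names and patterns here are hypothetical. The "NC_GLOBAL" container, the "title" attribute, and the "$1" back-reference syntax are assumptions, not values defined in this document.

```xml
<!-- Tried first: name the dataset from the "title" attribute in the
     "NC_GLOBAL" attribute container, if both exist (assumed names). -->
<catGen:datasetNamer name="Name from title attribute" addLevel="false"
    type="DodsAttrib"
    attribContainer="NC_GLOBAL" attribName="title" />
<!-- Tried next: match an eight-digit date in the urlPath and substitute it
     into the name. Because addLevel is "true", datasets named here are
     enclosed in a containing dataset named "Model data". -->
<catGen:datasetNamer name="Model data" addLevel="true"
    type="RegExp"
    matchPattern="([0-9]{8})_model\.nc$"
    substitutePattern="Model run $1" />
```

A dataset that has no such attribute and whose urlPath does not match the pattern would be named by neither element and would therefore be dropped from the collection.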



Ethan Davis
Last modified: Thu Dec 19 17:14:17 MST 2002
 
 