
TDS catalog element Specification, Version 1.0.1

last update: 3 October 2006
Comments to THREDDS mailgroup

The THREDDS Data Server (TDS) uses specialized catalogs as configuration documents. Several elements have been added to the InvCatalog schema to allow for this server-side usage.

This document specifies the semantics and XML representation of the server-side specializations allowed in THREDDS catalogs.

Contents:

  1. Server-side Elements
  2. Index
  3. Change History

Change History:


Server-side Elements

datasetScan Element

<xsd:element name="datasetScan" substitutionGroup="dataset">
<xsd:complexType>
<xsd:complexContent>
<xsd:extension base="DatasetType">
<xsd:sequence>
<xsd:element ref="filter" minOccurs="0" />
<xsd:element ref="addID" minOccurs="0" />
<xsd:element ref="namer" minOccurs="0" />
<xsd:element ref="sort" minOccurs="0" />
<xsd:element ref="addLatest" minOccurs="0" />
<xsd:element ref="addProxies" minOccurs="0" />
<xsd:element ref="addDatasetSize" minOccurs="0" />
<xsd:element ref="addTimeCoverage" minOccurs="0" />
</xsd:sequence>


<xsd:attribute name="path" type="xsd:string" use="required"/>
<xsd:attribute name="location" type="xsd:string"/>
<xsd:attribute name="dirLocation" type="xsd:string"/> <!-- deprecated : use location attribute -->
<xsd:attribute name="filter" type="xsd:string"/> <!-- deprecated : use filter element -->
<xsd:attribute name="addDatasetSize" type="xsd:boolean"/> <!-- deprecated : use enhance/addDatasetSize element -->
<xsd:attribute name="addLatest" type="xsd:boolean"/> <!-- deprecated : use addLatest element -->
<xsd:attribute name="addId" type="xsd:boolean"/> <!-- deprecated : use addID element -->
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>

The datasetScan element allows for the generation of nested THREDDS catalogs by scanning the dataset collection location named in the location attribute. The path attribute is used to map dataset and catalog requests to a given datasetScan.

A datasetScan element is in the dataset substitutionGroup, so it can be used wherever a dataset element can be used. It is an extension of DatasetType, so any of dataset's nested elements and attributes can be used in it. This allows you to add enhanced metadata to a datasetScan. However, you should not add nested datasets, as these will be ignored.

By default, each generated catalog will include all datasets at the requested level of the given dataset collection location. Each collection (directory) dataset will be included as a catalogRef element and each atomic (file) dataset will be included as a dataset element. The name of the resulting dataset or catalogRef will be the name of the corresponding dataset. The only metadata added is that contained in the datasetScan element. It is placed at the appropriate level of the generated catalogs, depending on whether or not it is inherited metadata.

The datasetScan specific nested elements (filter, addID, namer, sort, addLatest, addProxies, addDatasetSize, and addTimeCoverage) can be used to modify the default behavior or add metadata.

This very simple example:

<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/" >
<dataFormat>GRIB-2</dataFormat>
</datasetScan >

Might result in the following catalog:

<catalog ...>
<service name="myserv" ... />
<dataset name="GRIB2 Data">
<metadata inherited="true"><serviceName>myserv</serviceName></metadata>
<dataset name="data1.wmo" urlPath="data1.wmo" />
<dataset name="data2.wmo" urlPath="data2.wmo" />
<dataset name="readme.txt" urlPath="readme.txt" />
<catalogRef xlink:title="test" xlink:href="test" name="" />
</dataset>
</catalog>
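
Because a datasetScan extends DatasetType, metadata can be declared once on the datasetScan and inherited by the generated datasets. The following is a minimal sketch rather than a tested configuration. It reuses the name, path, and location from the example above, assumes a service named myserv is defined elsewhere in the catalog, and uses an illustrative dataType value.

```xml
<!-- Sketch only: assumes a service named "myserv" is defined elsewhere in the catalog -->
<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/">
  <!-- inherited="true": intended to apply to the generated datasets at every level -->
  <metadata inherited="true">
    <serviceName>myserv</serviceName>
    <dataFormat>GRIB-2</dataFormat>
    <!-- Illustrative value, not taken from the example above -->
    <dataType>Grid</dataType>
  </metadata>
</datasetScan>
```

Under the usual THREDDS inheritance rules, metadata placed directly inside the datasetScan, outside a metadata element marked inherited="true", would be expected to describe only the top-level dataset.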

filter Element

<xsd:element name="filter">
<xsd:complexType>
<xsd:choice>
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element name="include" type="FilterSelectorType" minOccurs="0"/>
<xsd:element name="exclude" type="FilterSelectorType" minOccurs="0"/>
</xsd:sequence>
</xsd:choice>
</xsd:complexType>
</xsd:element>

<xsd:complexType name="FilterSelectorType">
<xsd:attribute name="regExp" type="xsd:string"/>
<xsd:attribute name="wildcard" type="xsd:string"/>
<xsd:attribute name="atomic" type="xsd:boolean"/>
<xsd:attribute name="collection" type="xsd:boolean"/>
</xsd:complexType>

The filter element allows users to specify which datasets are to be included in the generated catalogs. A filter element can contain any number of include and exclude elements. Each include or exclude element may contain either a wildcard or a regExp attribute. If the given wildcard pattern or regular expression matches a dataset name, that dataset is included or excluded as specified. By default, includes and excludes apply only to atomic datasets (regular files). You can specify that they apply to atomic and/or collection datasets (directories) by using the atomic and collection attributes.

Expanding on the above example:

<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/" >
<dataFormat>GRIB-2</dataFormat>
<filter>
<include wildcard="*.wmo" />
</filter>
</datasetScan >

results in:

<catalog ...>
<service name="myserv" ... />
<dataset name="GRIB2 Data">
<metadata inherited="true"><serviceName>myserv</serviceName></metadata>
<dataset name="data1.wmo" urlPath="data1.wmo" />
<dataset name="data2.wmo" urlPath="data2.wmo" />
</dataset>
</catalog>
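
The regExp, exclude, atomic, and collection options described above can be combined in one filter. The following is a sketch under stated assumptions, not a tested configuration: the "tmp_*" pattern and the "test" directory name are hypothetical, and how includes and excludes interact when both match the same dataset is not specified here.

```xml
<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/">
  <dataFormat>GRIB-2</dataFormat>
  <filter>
    <!-- Include files whose names end in ".wmo" (atomic datasets by default) -->
    <include regExp=".*\.wmo$" />
    <!-- Exclude hypothetical scratch files by wildcard -->
    <exclude wildcard="tmp_*" />
    <!-- Exclude directories named "test": applies to collections, not files -->
    <exclude wildcard="test" atomic="false" collection="true" />
  </filter>
</datasetScan>
```

With this filter, the catalogRef for the test directory shown in the first example would be expected to disappear from the generated catalog.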

More examples are available in the TDS datasetScan documentation.

addID Element

<xsd:element name="addID" />

The addID element specifies that a datasetScan should add an ID attribute to each dataset element included in a resulting catalog.

The TDS adds ID attributes by default even if no addID element is given in the datasetScan. The IDs are constructed by concatenating the relative path of the generated dataset to either the datasetScan ID (if it exists) or the datasetScan path.

So the example results from the filter section above would more accurately be:

<catalog ...>
<service name="myserv" ... />
<dataset name="GRIB2 Data" ID="grib2">
<metadata inherited="true"><serviceName>myserv</serviceName></metadata>
<dataset name="data1.wmo" ID="grib2/data1.wmo" urlPath="data1.wmo" />
<dataset name="data2.wmo" ID="grib2/data2.wmo" urlPath="data2.wmo" />
</dataset>
</catalog>
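
When the datasetScan carries its own ID, the rule above implies that generated IDs are built from that ID instead of the path. The sketch below assumes a hypothetical ID value of "NCEP-GRIB2". The generated fragment is the expected shape, not captured server output.

```xml
<!-- Configuration: the ID value "NCEP-GRIB2" is hypothetical -->
<datasetScan name="GRIB2 Data" ID="NCEP-GRIB2" path="grib2" location="C:/data/grib2/">
  <dataFormat>GRIB-2</dataFormat>
  <filter>
    <include wildcard="*.wmo" />
  </filter>
  <addID/>
</datasetScan>

<!-- Expected shape of the generated datasets: datasetScan ID + "/" + relative path -->
<dataset name="GRIB2 Data" ID="NCEP-GRIB2">
  <dataset name="data1.wmo" ID="NCEP-GRIB2/data1.wmo" urlPath="data1.wmo" />
  <dataset name="data2.wmo" ID="NCEP-GRIB2/data2.wmo" urlPath="data2.wmo" />
</dataset>
```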

namer Element

<xsd:element name="namer">
<xsd:complexType>
<xsd:choice maxOccurs="unbounded">
<xsd:element name="regExpOnName" type="NamerSelectorType"/>
<xsd:element name="regExpOnPath" type="NamerSelectorType"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="NamerSelectorType">
<xsd:attribute name="regExp" type="xsd:string"/>
<xsd:attribute name="replaceString" type="xsd:string"/>
</xsd:complexType>

The namer element specifies one or more methods for renaming resulting dataset and catalogRef elements. Currently, two methods for renaming are available. Both methods use regular expression matching and capturing group replacement to determine the new name. The first method, specified by the regExpOnName element, does regular expression matching on the dataset name. The second method, specified by the regExpOnPath element, does regular expression matching on the entire dataset path. In either method, the regExp attribute contains the regular expression used in matching on the name or path and the replaceString attribute contains the replacement string on which capturing group replacement is performed.

A capturing group is a part of a regular expression enclosed in parentheses. When a regular expression with a capturing group is applied to a string, the substring that matches the capturing group is saved for later use. The captured strings can then be substituted into another string in place of capturing group references, "$n", where "n" is an integer indicating a particular capturing group. (The capturing groups are numbered according to the order in which they appear in the match string.) For example, the regular expression "Hi (.*), how are (.*)\?" when applied to the string "Hi Fred, how are you?" would capture the strings "Fred" and "you". (The final question mark is escaped so that it matches a literal "?" instead of making the second group optional.) Following that with a capturing group replacement in the string "$2 are $1." would result in the string "you are Fred."

Here's an example namer:

<namer>
<regExpOnName regExp="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})"
replaceString="NCEP GFS 191km Alaska $1-$2-$3 $4:$5:00 GMT"/>
</namer>

The regular expression has five capturing groups:

  1. The first capturing group, "([0-9]{4})",  captures four digits, in this case the year.
  2. The second capturing group, "([0-9]{2})", captures two digits, in this case the month.
  3. The third capturing group, "([0-9]{2})", captures two digits, in this case the day of the month.
  4. The fourth capturing group, "([0-9]{2})", captures two digits, in this case the hour of the day.
  5. The fifth capturing group, "([0-9]{2})", captures two digits, in this case the minutes of the hour.

When applied to the dataset name "GFS_Alaska_191km_20051011_0000.grib1", the strings "2005", "10", "11", "00", and "00" are captured. After replacing the capturing group references in the replaceString attribute value, we get the name "NCEP GFS 191km Alaska 2005-10-11 00:00:00 GMT". So, when cataloged, this dataset would end up as something like this:

<dataset name="NCEP GFS 191km Alaska 2005-10-11 00:00:00 GMT"
 urlPath="models/NCEP/GFS/Alaska_191km/GFS_Alaska_191km_20051011_0000.grib1"/>
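
The regExpOnPath method works the same way but matches against the dataset path, so directory names can feed the new name. The sketch below is an assumption-laden illustration: it assumes the path being matched looks like the urlPath shown above, and that the expression only needs to match part of the path, as in the regExpOnName example.

```xml
<namer>
  <!-- $1 = directory after "NCEP/" (here "GFS"); $2..$6 = year, month, day, hour, minute -->
  <regExpOnPath regExp="NCEP/([A-Za-z]+)/.*_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})"
                replaceString="NCEP $1 $2-$3-$4 $5:$6 GMT"/>
</namer>
```

Applied to the path "models/NCEP/GFS/Alaska_191km/GFS_Alaska_191km_20051011_0000.grib1", the captured strings would be "GFS", "2005", "10", "11", "00", and "00", giving the name "NCEP GFS 2005-10-11 00:00 GMT".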

sort Element

<xsd:element name="sort">
<xsd:complexType>
<xsd:choice>
<xsd:element name="lexigraphicByName">
<xsd:complexType>
<xsd:attribute name="increasing" type="xsd:boolean"/>
</xsd:complexType>
</xsd:element>
</xsd:choice>
</xsd:complexType>
</xsd:element>

Without a sort element, datasets at each collection level are listed in their "natural" order. The sort element specifies how to order those datasets. Currently, a sort element can contain only one lexigraphicByName element, which indicates that datasets should be ordered lexicographically by dataset name. The increasing attribute of the lexigraphicByName element indicates whether the datasets should be listed in increasing or decreasing order.
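
As a minimal sketch, reusing the datasetScan from the earlier examples, a decreasing sort might be configured like this:

```xml
<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/">
  <dataFormat>GRIB-2</dataFormat>
  <sort>
    <!-- increasing="false": names in decreasing order, e.g. data2.wmo before data1.wmo -->
    <lexigraphicByName increasing="false" />
  </sort>
</datasetScan>
```

When dataset names embed a timestamp, a decreasing sort lists the most recent datasets first.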

addLatest Element

<xsd:element name="addLatest">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="simpleLatest" minOccurs="0">
<xsd:complexType>
<xsd:attribute name="name" type="xsd:string"/>
<xsd:attribute name="top" type="xsd:boolean"/>
<xsd:attribute name="serviceName" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

The addLatest element is deprecated in favor of the addProxies element.

addProxies Element

<xsd:element name="addProxies">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="simpleLatest" minOccurs="0">
<xsd:complexType>
<xsd:attribute name="name" type="xsd:string"/>
<xsd:attribute name="top" type="xsd:boolean"/>
<xsd:attribute name="serviceName" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="latestComplete" minOccurs="0">
<xsd:complexType>
<xsd:attribute name="name" type="xsd:string"/>
<xsd:attribute name="top" type="xsd:boolean"/>
<xsd:attribute name="serviceName" type="xsd:string"/>
<xsd:attribute name="lastModifiedLimit" type="xsd:float"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

The addProxies element provides a place for describing proxy datasets to be added to each collection dataset under a datasetScan.

Currently, two types of proxy datasets are supported. Both are intended to proxy the "latest" dataset in the scanned collection. The first type, specified by the simpleLatest element, adds a dataset that proxies the existing dataset whose name is lexicographically greatest. This dataset will be the "latest" if the dataset name contains a timestamp. The simpleLatest element may contain a name attribute, which is used as the name of the proxy dataset; a serviceName attribute, which names the service element to be referenced by the resulting proxy dataset; and a top attribute, which indicates whether the proxy dataset should appear at the top or bottom of the list of datasets in the collection. If any of these attributes are missing, the default behavior in the TDS is to name the dataset "latest.xml", reference the "latest" service, and place the dataset at the top of the collection.

The second type, specified by the latestComplete element, is the same as simpleLatest except that it excludes any dataset that was last modified within the number of minutes specified by the lastModifiedLimit attribute. It accepts all the attributes allowed in the simpleLatest element plus the lastModifiedLimit attribute.

An example is available in the TDS datasetScan documentation.
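
As a rough sketch of both proxy types described above, the following assumes that a service named "latest" is defined elsewhere in the catalog. The proxy name "latestComplete.xml" and the 60-minute limit are illustrative choices, not defaults taken from this specification.

```xml
<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/">
  <dataFormat>GRIB-2</dataFormat>
  <addProxies>
    <!-- Proxy for the dataset whose name sorts last -->
    <simpleLatest name="latest.xml" top="true" serviceName="latest" />
    <!-- Same, but skipping datasets modified within the last 60 minutes -->
    <latestComplete name="latestComplete.xml" top="true" serviceName="latest"
                    lastModifiedLimit="60" />
  </addProxies>
</datasetScan>
```

The latestComplete variant is useful when files are still being written: a dataset modified too recently is skipped, so the proxy would point at the newest dataset that is likely to be complete.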

addTimeCoverage Element

  <xsd:element name="addTimeCoverage">
<xsd:complexType>
<xsd:attribute name="datasetNameMatchPattern" type="xsd:string"/>
<xsd:attribute name="startTimeSubstitutionPattern" type="xsd:string"/>
<xsd:attribute name="duration" type="xsd:string"/>
</xsd:complexType>
</xsd:element>

The addTimeCoverage element indicates that a THREDDS timeCoverage element should be added to each atomic dataset cataloged by the containing datasetScan element, and describes how to determine the time coverage for each dataset in the collection.

Currently, the addTimeCoverage element can describe only one method for determining the time coverage of a dataset. The datasetNameMatchPattern attribute is used in a regular expression match on the dataset name. If the match succeeds, a capturing group replacement is performed on the startTimeSubstitutionPattern attribute and the result is the start time string (see the namer element description, above, for more on regular expressions and capturing groups). The time coverage duration is given by the duration attribute.

Example:

  <datasetScan name="My Data" path="myData" location="c:/my/data/"> 
<serviceName>myserver</serviceName>
<addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})_gfs_211.nc$"
startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
duration="60 hours" />
</datasetScan>

for the dataset named "2005071812_gfs_211.nc", results in the following timeCoverage element:

  <timeCoverage>
<start>2005-07-18T12:00:00</start>
<duration>60 hours</duration>
</timeCoverage>
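
The same approach extends to names that carry minutes as well as hours. The sketch below reuses the dataset name from the namer section. The datasetScan name, path, location, and duration values are hypothetical.

```xml
<!-- name, path, location, and duration are hypothetical values -->
<datasetScan name="GFS Alaska" path="gfsAlaska" location="c:/my/gfs/">
  <serviceName>myserver</serviceName>
  <!-- Groups: $1 year, $2 month, $3 day, $4 hour, $5 minute -->
  <addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})\.grib1$"
                   startTimeSubstitutionPattern="$1-$2-$3T$4:$5:00"
                   duration="60 hours" />
</datasetScan>
```

For the dataset named "GFS_Alaska_191km_20051011_0000.grib1", the substitution would yield the start time "2005-10-11T00:00:00".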

addDatasetSize Element

  <xsd:element name="addDatasetSize" />

The addDatasetSize element indicates that file size metadata in the form of a dataSize element should be added to all atomic datasets.

An example is available in the TDS datasetScan documentation.
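
As a minimal sketch, the element is simply added to the datasetScan. The generated fragment below shows the expected shape only: the size value and units are made up for illustration.

```xml
<datasetScan name="GRIB2 Data" path="grib2" location="C:/data/grib2/">
  <dataFormat>GRIB-2</dataFormat>
  <addDatasetSize/>
</datasetScan>

<!-- Expected shape of a generated atomic dataset (size and units are hypothetical) -->
<dataset name="data1.wmo" urlPath="data1.wmo">
  <dataSize units="Kbytes">123.4</dataSize>
</dataset>
```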

datasetRoot Element

  <xsd:element name="datasetRoot">
<xsd:complexType>
<xsd:attribute name="path" type="xsd:string" use="required"/>
<xsd:attribute name="location" type="xsd:string" use="required"/>
</xsd:complexType>
</xsd:element>

The datasetRoot element, similar to the datasetScan element, maps request URLs to dataset collection locations. The difference is that a datasetRoot does not perform any scans or generate any catalogs. It simply allows users to specify individual datasets from the datasetRoot location.

Example:

<datasetRoot path="dsR1" location="C:/data/mydata/" />
...
<dataset name="dataset 1" urlPath="data1.nc" />



Index


 
 