TDS Catalog Configuration Tutorial


THREDDS Catalogs were originally designed for clients to access remote data. They have been extended so that the TDS can also use them for its own configuration. In this mode they are called TDS Configuration Catalogs, or server-side catalogs. They contain information needed only on the server; the TDS removes this information before sending the catalog to the client. The catalog the client receives is called the client-side or client-view catalog.

Explicit Data Roots : DatasetRoot

Revisiting our simple example, we modify it to be used as a TDS Configuration Catalog:

  <?xml version="1.0" ?>
  <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
    <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <datasetRoot path="sage" location="/data/idd/satellite/" />
    <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
  </catalog>

In this example, the datasetRoot element associates the dataset, whose OpenDAP access URL is /thredds/dodsC/sage/110312006.nc, with the file /data/idd/satellite/110312006.nc. It does this by matching on the path sage. The datasetRoot element is not included in the catalog sent to the client.
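The effect of removing server-only elements can be sketched in a few lines of Python. This is only an illustration of the idea, not how the TDS itself builds the client-view catalog; the helper name client_view is hypothetical, and the sketch assumes that datasetRoot is the only element to strip.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# The configuration catalog from the example above.
CONFIG_CATALOG = f"""
<catalog xmlns="{NS}">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <datasetRoot path="sage" location="/data/idd/satellite/" />
  <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
</catalog>
""".strip()


def client_view(config_xml):
    """Hypothetical helper: return the catalog tree without datasetRoot elements."""
    root = ET.fromstring(config_xml)
    # Snapshot the parents first so that removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{NS}}}datasetRoot":
                parent.remove(child)
    return root


view = client_view(CONFIG_CATALOG)
# Print the local names of the elements that remain at the top level.
print([child.tag.split("}")[1] for child in view])
```

The service and dataset elements survive, so the client can still build the access URL, but it never sees the server's file system location.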

Implicit Data Roots : DatasetScan

A datasetScan element is a kind of dataset that automatically scans a local directory and generates datasets for some or all of the files in it. It has the same path and location attributes as a datasetRoot element, and implicitly creates a data root. For example:

 <service name="myserver" serviceType="OpenDAP" base="/thredds/dodsC/" />
 <datasetScan name="My Data" path="myData" location="c:/my/data/" serviceName="myserver" />

The TDS will scan c:/my/data/ and generate a dataset for each file. These generated datasets will have a URL starting with /thredds/dodsC/myData/ and will be mapped to c:/my/data/. A datasetScan element is turned into a catalogRef element in the client-side catalog. See the datasetScan documentation for more details and options.

Managing Data Roots

You can have as many datasetRoot and datasetScan elements as you want, for example:

  <datasetRoot path="model" location="/data/ncep/" />
  <datasetRoot path="obs" location="/data/raw/metars/" />
  <datasetRoot path="cases/001" location="C:/casestudy/data/001/" />
  <datasetScan path="myData" location="/data/ncep/run0023" name="NCEP/RUN 23" serviceName="myserver" />
  <datasetScan path="myData/gfs" location="/pub/ldm/gfs/" name="NCEP/GFS" serviceName="myserver" />

Each datasetRoot and datasetScan element defines a data root. The rules for data roots are:

  1. Each dataset must be associated with a data root, i.e. the beginning of its urlPath must match a data root path. If there are multiple matches, the longest match is used.
  2. Each data root must have a path that is unique across all catalogs used by the TDS.
  3. The directory pointed to by location should be an absolute path.
  4. The same location may be used by more than one data root.

For example, using the above data roots, the following matches would be made:

  urlPath                     file
  model/run0023/mydata.nc     /data/ncep/run0023/mydata.nc
  obs/test.nc                 /data/raw/metars/test.nc
  myData/mydata.nc            /data/ncep/run0023/mydata.nc
  myData/gfs/mydata.nc        /pub/ldm/gfs/mydata.nc
  cases/001/test/area/two     C:/casestudy/data/001/test/area/two
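The longest-match rule can be sketched in Python as follows. This is a minimal illustration, not the TDS implementation: the resolve function is hypothetical, and it assumes that a data root path matches only on whole path segments (so that "myData" would not match a urlPath such as "myDataOld/x.nc").

```python
# Data roots copied from the examples above: path -> location.
DATA_ROOTS = {
    "model": "/data/ncep/",
    "obs": "/data/raw/metars/",
    "cases/001": "C:/casestudy/data/001/",
    "myData": "/data/ncep/run0023",
    "myData/gfs": "/pub/ldm/gfs/",
}


def resolve(url_path, roots=DATA_ROOTS):
    """Hypothetical helper: map a urlPath to a file using the longest matching data root."""
    # Assumption: a root matches if it equals the urlPath or is followed by "/".
    matches = [p for p in roots if url_path == p or url_path.startswith(p + "/")]
    if not matches:
        return None  # no data root is associated with this urlPath
    best = max(matches, key=len)  # the longest match wins
    remainder = url_path[len(best):].lstrip("/")
    return roots[best].rstrip("/") + "/" + remainder


for path in ("myData/mydata.nc", "myData/gfs/mydata.nc"):
    print(path, "->", resolve(path))
```

Both urlPaths begin with myData, but the second also matches the longer root myData/gfs, so it is served from /pub/ldm/gfs/ rather than /data/ncep/run0023.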

 

TDS Service elements

The TDS always uses the context name (e.g. "thredds") and the servlet name (e.g. "dodsC") for the service base URL. Thus you should always use the following service elements in your TDS configuration catalogs:

OpenDAP server:

  <service name="ncdods" serviceType="OpenDAP" base="/thredds/dodsC/" />

HTTP bulk file server:

  <service name="fileServer" serviceType="HTTPServer" base="/thredds/fileServer/" />

WCS Server:

  <service name="wcsServer" serviceType="WCS" base="/thredds/wcs/" />

NetCDF Subsetting Server:

  <service name="subsetter" serviceType="NetcdfServer" base="/thredds/ncServer/" />

You can use whatever name you choose for the service; it only needs to match the serviceName used in the datasets. Note that the base URLs are relative, so your catalogs are independent of your server hostname and port.

Serving the same Dataset in different ways

Since adding metadata to catalogs can be time-consuming, it is convenient to describe a dataset in only one place while still serving it in more than one way. To do this, create a Compound service like this:

  <service name="multiple" serviceType="Compound" base="" >
    <service name="ncdods" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/" />
    <service name="WCS" serviceType="WCS" base="/thredds/wcs/" />
  </service>

This defines a compound service with three nested services. Any dataset using this service will have three access URLs, corresponding to the three nested services. For example:

 <dataset name="Model Data" serviceName="multiple" urlPath="models/aug/mydata.nc" />

The access URLs for this dataset will be:

  /thredds/dodsC/models/aug/mydata.nc
  /thredds/fileServer/models/aug/mydata.nc
  /thredds/wcs/models/aug/mydata.nc

Since the service base is not used in looking for data roots, all three URLs will be mapped to the same file. For example, if you had the data root

  <datasetRoot path="models" location="/data/ncep/" />

then the file would be:

  /data/ncep/aug/mydata.nc
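As a rough sketch of the compound service example, the Python below builds one access URL per nested service and shows that the file lookup depends only on the urlPath and the data root. The function names access_urls and file_for are hypothetical and are not part of the TDS.

```python
# Bases of the three nested services in the compound service example above.
SERVICE_BASES = ["/thredds/dodsC/", "/thredds/fileServer/", "/thredds/wcs/"]


def access_urls(url_path, bases=SERVICE_BASES):
    """Hypothetical helper: one access URL per nested service (base + urlPath)."""
    return [base + url_path for base in bases]


def file_for(url_path, root_path, location):
    """Hypothetical helper: map a urlPath to a file; the service base is never consulted."""
    if not url_path.startswith(root_path + "/"):
        raise ValueError("urlPath does not start with the data root path")
    return location.rstrip("/") + url_path[len(root_path):]


urls = access_urls("models/aug/mydata.nc")
for url in urls:
    print(url)
print(file_for("models/aug/mydata.nc", "models", "/data/ncep/"))
```

Because file_for never receives a service base, all three access URLs necessarily resolve to the same file.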

TDS Root Catalogs

When the TDS starts up, it reads the root catalog at ${tomcat_home}/content/thredds/catalog.xml, recursively reads all catalogs that are linked to it through a relative catalogRef element (i.e. catalogs that live somewhere under ${tomcat_home}/content/thredds/), and determines the data roots and other information.

If you want to serve other catalogs that are not linked to the root catalog, then list them in ${tomcat_home}/content/thredds/threddsConfig.xml, for example:

<catalogRoot location="topcatalog.xml" />
<catalogRoot location="idv/rt-models.1.0.xml" />
<catalogRoot location="cataloggen/catalogs/idv-rt-models.InvCat1.0.xml" />

Note that all catalog filenames must be relative to the ${tomcat_home}/content/thredds/ directory. This feature is new in the 3.14 release.

Summary

1. Always use standard service elements in the TDS

Note that the base URLs of the standard service elements are relative, so your catalogs are independent of your server hostname and port.

2. Choose a unique path for each group of datasets you want to serve

For each set of files that you want to serve, choose a unique path. The path becomes part of the externally visible URL for those files. Choose something that you will easily remember and associate with those files. Make sure that the path is unique for the entire TDS server.

Each set of files must be contained under a single file directory (and its subdirectories). A directory can be associated with more than one path, but each path must be associated with only one directory. Associate the directory with the path either through a datasetScan element or a datasetRoot element:

  1. Use a datasetScan element in the TDS catalog to dynamically generate the catalogs based on what's in the directory. This works best when all the files have the same data type, format, etc., especially if they differ only by their time range.
  2. Use a datasetRoot element when you need to generate the catalogs yourself.

3. The Structure of a TDS dataset access URL

Each TDS dataset URL is divided into 5 parts.

For example, assuming that you had a data root defined by:

 <datasetRoot path="model/ncep" location="C:/data/ncep/" />

Then the following URL:

http://hostname:8080/thredds/fileServer/model/ncep/run0023/mydata.nc
<-----server-------><webapp><-servlet-><data-root><--filename------>

would have these 5 parts:

  1. http://hostname:8080 is the server's hostname and port. By using relative URLs as shown here, you never have to specify this explicitly in your catalogs. This means you can change hosts or ports without having to rewrite your catalogs.
  2. /thredds is the name of the web application, taken from the thredds.war file.
  3. /fileServer maps to the servlet inside the web application; here it is the FileServer servlet.
  4. /model/ncep is the path, associated with the directory location C:/data/ncep/.
  5. /run0023/mydata.nc is the relative filename, and so is mapped to C:/data/ncep/run0023/mydata.nc.
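The five-part breakdown above can be reproduced with a short Python sketch. The function split_tds_url is hypothetical; it assumes the URL path always starts with a webapp segment followed by a servlet segment, and that the caller supplies the data root path and location.

```python
from urllib.parse import urlsplit


def split_tds_url(url, root_path, location):
    """Hypothetical helper: split a TDS access URL into the 5 parts described above."""
    parts = urlsplit(url)
    server = f"{parts.scheme}://{parts.netloc}"
    # Path looks like /<webapp>/<servlet>/<data root path>/<relative filename>.
    _, webapp, servlet, rest = parts.path.split("/", 3)
    if not rest.startswith(root_path + "/"):
        raise ValueError("URL path does not start with the data root path")
    filename = rest[len(root_path):]  # keeps its leading "/"
    return {
        "server": server,
        "webapp": "/" + webapp,
        "servlet": "/" + servlet,
        "data_root": "/" + root_path,
        "filename": filename,
        # The file on disk is the data root location plus the relative filename.
        "file": location.rstrip("/") + filename,
    }


info = split_tds_url(
    "http://hostname:8080/thredds/fileServer/model/ncep/run0023/mydata.nc",
    "model/ncep",
    "C:/data/ncep/",
)
for key, value in info.items():
    print(f"{key:10s} {value}")
```

Only the last two parts are involved in locating the file; the server, webapp, and servlet parts determine which service handles the request.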


This document is maintained by John Caron and was last updated on April 11, 2007