THREDDS Catalog Primer

last update: 21 January 2009


Introduction

THREDDS catalogs collect, organize, and describe accessible datasets. They provide a hierarchical structure for organizing the datasets as well as an access method (URL) and a human understandable name for each dataset. Further descriptive information about each dataset can also be added.

Example Catalog

<?xml version='1.0' encoding='UTF-8'?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         version="1.0.2">
1)  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/" />
    <dataset name="TDS Tutorial: example 1">

2)      <dataset name="TDS Tutorial: example data 1" urlPath="test/example1.nc" >
3)          <serviceName>odap</serviceName>
        </dataset>
        <dataset name="TDS Tutorial: example data 2" urlPath="test/example2.nc" >
            <serviceName>odap</serviceName>
        </dataset>
        <dataset name="TDS Tutorial: example data 3" urlPath="test/example3.nc" >
            <serviceName>odap</serviceName>
        </dataset>
4)      <catalogRef xlink:title="My Other Catalog"
                    xlink:href="myOtherCatalog.xml" />
        <catalogRef xlink:title="Far Away Univ catalog"
                    xlink:href="http://www.farAwayU.edu/thredds/catalog.xml" />
    </dataset>
</catalog>

Notes:

  1. The service element (1) defines an OPeNDAP service and has the name "odap".
  2. The first dataset is a container dataset.
  3. Each child dataset has an access method due to its urlPath attribute (2) and child serviceName element (3).
  4. The catalogRef elements (4) link to a relative catalog and a remote catalog.

Constructing an access URL

  1. Find the service element referenced by the dataset:
    <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/" />
  2. Get the base URL of the service element:
    /thredds/dodsC/
  3. If it is a relative URL, resolve against the catalog's URL:
    http://my.server/thredds/catalog.xml      // Catalog URL
    http://my.server/thredds/dodsC/           // Service base URL
  4. Find the urlPath attribute for the accessible dataset:
    urlPath="test/example3.nc"
  5. Append the value of the urlPath attribute to the base service URL:
    http://my.server/thredds/dodsC/test/example3.nc
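
The steps above can be sketched in a few lines of Python. This is only an illustration, not how any THREDDS library does it: the function name build_access_url is made up for this example, and the catalog URL is the one assumed in step 3.

```python
from urllib.parse import urljoin


def build_access_url(catalog_url: str, service_base: str, url_path: str) -> str:
    """Build a dataset access URL from the catalog URL, service base, and urlPath."""
    # Step 3: a relative service base is resolved against the catalog's URL;
    # urljoin returns an absolute base unchanged.
    resolved_base = urljoin(catalog_url, service_base)
    # Step 5: the urlPath value is appended to the resolved service base.
    return resolved_base + url_path


access_url = build_access_url(
    "http://my.server/thredds/catalog.xml",  # catalog URL (assumed, as in step 3)
    "/thredds/dodsC/",                       # base attribute of the service element
    "test/example3.nc",                      # urlPath attribute of the dataset
)
print(access_url)
```

With the values used in the steps, this prints http://my.server/thredds/dodsC/test/example3.nc.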

More information is available on constructing access URLs.

Catalog References

It can be useful to break up large catalogs into pieces in order to separately maintain each piece. One way to do this is to build each piece as a separate and logically complete catalog, then create a master catalog using catalog references:

<catalog name="master"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
1)       xmlns:xlink="http://www.w3.org/1999/xlink" >

    <dataset  name="List of THREDDS catalogs">
2)      <catalogRef xlink:title="My Number One catalog"
3)                  xlink:href="myCatalog1.xml"/>
        <catalogRef xlink:title="My Number Two catalog"
                    xlink:href="myCatalog2.xml"/>
        <catalogRef xlink:title="Home Away University catalog"
                    xlink:href="http://www.homeAwayU.edu/thredds/catalog.xml"/>
        <catalogRef xlink:title="Far Away University catalog"
                    xlink:href="http://www.farAwayU.edu/thredds/catalog.xml"/>
    </dataset>
</catalog>

This example has four catalogRef elements: the first two link to local catalogs, and the last two link to remote catalogs. The value of the xlink:href attribute (3) provides the referenced URI, which may be relative or absolute. The value of the xlink:title attribute (2) is used as the name of the dataset. Notice that we must declare the xlink namespace in the catalog element (1).
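
As a rough sketch of how a client might follow catalog references, the Python below lists each catalogRef with its href resolved against the URL of the catalog that contains it. It uses only the standard library; the function name list_catalog_refs and the master catalog URL http://my.server/thredds/master.xml are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# A shortened copy of the master catalog shown above.
MASTER = """<catalog name="master"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="List of THREDDS catalogs">
    <catalogRef xlink:title="My Number One catalog"
                xlink:href="myCatalog1.xml"/>
    <catalogRef xlink:title="Far Away University catalog"
                xlink:href="http://www.farAwayU.edu/thredds/catalog.xml"/>
  </dataset>
</catalog>"""


def list_catalog_refs(catalog_xml, catalog_url):
    """Return (title, absolute URL) pairs for every catalogRef in the catalog."""
    root = ET.fromstring(catalog_xml)
    refs = []
    for ref in root.iter("{%s}catalogRef" % THREDDS_NS):
        title = ref.get("{%s}title" % XLINK_NS)
        href = ref.get("{%s}href" % XLINK_NS)
        # Relative hrefs resolve against the containing catalog's URL;
        # absolute hrefs pass through unchanged.
        refs.append((title, urljoin(catalog_url, href)))
    return refs


# The master catalog URL below is hypothetical.
refs = list_catalog_refs(MASTER, "http://my.server/thredds/master.xml")
for title, url in refs:
    print(title, "->", url)
```

The xlink attributes are looked up by their full namespace URI, which is why the xlink namespace declaration in the catalog element matters.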

Example Catalog - shared service name

<?xml version='1.0' encoding='UTF-8'?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         version="1.0.2">
    <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/" />
    <dataset name="TDS Tutorial: example 2">
        <metadata inherited="true">
2)          <serviceName>odap</serviceName>
        </metadata>

3)      <dataset name="TDS Tutorial: example data 1" urlPath="test/example1.nc" />
3)      <dataset name="TDS Tutorial: example data 2" urlPath="test/example2.nc" />
3)      <dataset name="TDS Tutorial: example data 3" urlPath="test/example3.nc" />

        <catalogRef xlink:title="My Other Catalog"
                    xlink:href="myOtherCatalog.xml" />
        <catalogRef xlink:title="Far Away Univ catalog"
                    xlink:href="http://www.farAwayU.edu/thredds/catalog.xml" />
    </dataset>
</catalog>

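Notes:

  1. The serviceName element (2) sits inside a metadata element with inherited="true", so it applies to every dataset nested in the container dataset.
  2. The child datasets (3) no longer need their own serviceName elements; each gets its access method from its urlPath attribute and the inherited service name.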

More details on service Elements

service Element Names must be Unique in Each Catalog

Within a catalog, the service name is used to reference a service element. The service names must therefore be unique in each catalog.

Compound service Elements - Serving Datasets with Multiple Methods

Datasets can be made available through more than one access method by defining and then referencing a compound service element. The following:

<service name="all" serviceType="Compound" base="" >
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
</service>

defines a compound service named "all" which contains two nested services. Any dataset that references the compound service will have two access methods. For instance:

<dataset name="cool data" urlPath="so/cool/data.nc" >
    <serviceName>all</serviceName>
</dataset>

would result in these two access URLs:

/thredds/dodsC/so/cool/data.nc
/thredds/wcs/so/cool/data.nc

Note: The contained services can still be referenced independently. For instance:

<dataset name="more cool data" urlPath="more/cool/data.nc" >
    <serviceName>odap</serviceName>
</dataset>

results in a single access URL:

/thredds/dodsC/more/cool/data.nc
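
The expansion of a compound service into several access URLs can be sketched as follows. This is a simplified illustration rather than the THREDDS library's own logic: the helper names q and access_urls are invented here, only one level of nesting is handled, and the service bases are left unresolved (relative), as in the URLs above.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="all" serviceType="Compound" base="">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/"/>
  </service>
</catalog>"""


def q(tag):
    """Qualify a tag name with the THREDDS catalog namespace."""
    return "{%s}%s" % (THREDDS_NS, tag)


def access_urls(root, service_name, url_path):
    """Return one access URL per concrete service behind service_name."""
    # Service names are unique within a catalog, so the first match is the
    # only one. iter() also visits services nested in a compound service.
    for service in root.iter(q("service")):
        if service.get("name") == service_name:
            break
    else:
        raise KeyError("no service named %r" % service_name)
    if service.get("serviceType", "").lower() == "compound":
        # A compound service stands for each of its directly nested services.
        leaves = service.findall(q("service"))
    else:
        leaves = [service]
    return [leaf.get("base") + url_path for leaf in leaves]


root = ET.fromstring(CATALOG)
print(access_urls(root, "all", "so/cool/data.nc"))
print(access_urls(root, "odap", "more/cool/data.nc"))
```

The first call yields the two URLs listed for "cool data"; the second yields the single URL for "more cool data".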


THREDDS Metadata

Linking to Metadata

<metadata xlink:title="some good metadata" xlink:href="http://my.server/md/data1.xml" />

Linking to Human Readable Metadata

<description xlink:title="My Data" xlink:href="http://my.server/md/data1.html" />

More dataset information

A lot of other information can optionally be added to help applications and digital libraries know how to "do the right thing" with the dataset. The collectionType attribute is used on collection datasets. The dataType is a simple classification (e.g., Image, Grid, Point data). The dataFormatType describes the format the data is stored in (e.g., NetCDF, HDF5); this is of use when the dataset is served by a file transfer protocol like FTP. The combination of the naming authority and the ID attribute should form a globally unique identifier for a dataset.

<dataset name="SAGE III Ozone Loss Experiment" collectionType="TimeSeries">
    <dataset name="January Averages" serviceName="aggServer" urlPath="sage/avg/jan.nc" authority="unidata.ucar.edu" ID="sage-20938483">
        <dataType>Trajectory</dataType>
        <dataFormatType>NetCDF</dataFormatType>
    </dataset>
</dataset>

The harvest attribute indicates that the dataset is at the right level of granularity to be exported to search systems like Digital Libraries. Elements such as summary, rights, and publisher are needed to create valid entries for these services. For more details, see Exporting THREDDS Datasets to Digital Libraries. Also see the Catalog Specification for a complete reference.

<dataset name="SAGE III Ozone Loss Experiment" harvest="true">
    <contributor role="data manager">John Smith</contributor>
    <keyword>Atmospheric Chemistry</keyword>
    <publisher>
        <name vocabulary="DIF">Community Data Portal, National Center for Atmospheric Research, University Corporation for Atmospheric Research</name>
        <contact url="http://dataportal.ucar.edu" email="cdp@ucar.edu"/>
    </publisher>
</dataset>

Factoring out information

Rather than declare the same information on each dataset, you can use the metadata element to factor out common information:

<dataset name="SAGE III Ozone Loss Experiment" >

1)  <metadata inherited="true">
2)      <serviceName>aggServer</serviceName>
2)      <dataType>Trajectory</dataType>
2)      <dataFormatType>NetCDF</dataFormatType>
2)      <authority>unidata.ucar.edu</authority>
    </metadata>
3)  <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
3)  <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
4)  <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>

</dataset>

  1. The metadata element with inherited="true" implies that all the information inside the metadata element applies to the current dataset and all nested datasets.
  2. The serviceName, dataType, dataFormatType and authority are declared as elements.
  3. These datasets now use the serviceName, dataType, dataFormatType and authority values declared in the parent dataset.
  4. This dataset uses the serviceName, dataFormatType and authority values and overrides the dataType.
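
To make the inheritance rules concrete, here is a small Python sketch that computes the effective values for each dataset. It is a simplified model under stated assumptions, not the THREDDS library's implementation: the function name resolve and the FIELDS tuple are invented for this example, only the four properties discussed above are tracked, and the attribute name inherited="true" is the one used in the shared service name example earlier.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
FIELDS = ("serviceName", "dataType", "dataFormatType", "authority")

CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="SAGE III Ozone Loss Experiment">
    <metadata inherited="true">
      <serviceName>aggServer</serviceName>
      <dataType>Trajectory</dataType>
      <dataFormatType>NetCDF</dataFormatType>
      <authority>unidata.ucar.edu</authority>
    </metadata>
    <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
    <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>
  </dataset>
</catalog>"""


def q(tag):
    """Qualify a tag name with the THREDDS catalog namespace."""
    return "{%s}%s" % (THREDDS_NS, tag)


def resolve(dataset, inherited=None, out=None):
    """Map each dataset name to its effective property values."""
    inherited = dict(inherited or {})
    out = {} if out is None else out
    # Inherited metadata applies to this dataset and to all nested datasets.
    for metadata in dataset.findall(q("metadata")):
        if metadata.get("inherited") == "true":
            for field in FIELDS:
                element = metadata.find(q(field))
                if element is not None:
                    inherited[field] = element.text
    # Values declared directly on a dataset (as an attribute or a child
    # element) override inherited values for that dataset only.
    effective = dict(inherited)
    for field in FIELDS:
        if dataset.get(field) is not None:
            effective[field] = dataset.get(field)
        element = dataset.find(q(field))
        if element is not None:
            effective[field] = element.text
    out[dataset.get("name")] = effective
    for child in dataset.findall(q("dataset")):
        resolve(child, inherited, out)
    return out


root = ET.fromstring(CATALOG)
props = resolve(root.find(q("dataset")))
print(props["January Averages"])
print(props["Global Averages"])
```

"January Averages" ends up with the four inherited values, while "Global Averages" keeps the inherited serviceName, dataFormatType and authority but reports its own dataType of Grid.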

More Advanced Topics

XML Namespaces and Validation

If you use elements from other namespaces, you must declare those namespaces in the catalog element. Currently the THREDDS libraries recognize two other namespaces, Dublin Core and XLink, which are declared like this:

<catalog name="MyName"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:xlink="http://www.w3.org/1999/xlink" >

It's not obvious, but namespaces are not web addresses; they are just strings that need to be copied exactly as you see them here.

As catalogs get more complicated, you should check that you haven't made any errors. There are three parts to checking:

  1. Is the XML well-formed?
  2. Is it valid against the catalog schema?
  3. Does it have everything it needs to be read by a THREDDS client?

You can use any THREDDS validation service to check all three of these.

You can check well-formedness using an XML tool like XMLSpy. To check validity in such a tool, you will need to declare the catalog schema location like this:

<catalog name="MyName"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
1)       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2)       xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd">
    ...
</catalog>

  1. This line declares the schema-instance namespace. Just copy it exactly as you see it here.
  2. This line tells your XML validation tool where to find the thredds schema. Just copy it exactly as you see it here.

The THREDDS validation service, as well as the catalog library, knows where the schemas are located, so you only need to add these two lines if you want to do your own validation.
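
For a quick local check of the first point only (well-formedness), plus a check that the catalog namespace string was copied exactly, a sketch using Python's standard library might look like this. It does not validate against the catalog schema, and the function name quick_check is made up for this example.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def quick_check(catalog_xml):
    """Return a list of problems; an empty list means these quick checks passed."""
    try:
        root = ET.fromstring(catalog_xml)
    except ET.ParseError as err:
        # Covers unclosed or mismatched tags and undeclared namespace prefixes.
        return ["not well-formed: %s" % err]
    problems = []
    # The namespace is compared as an exact string, so any difference
    # (even a trailing slash) makes the root element unrecognizable.
    if root.tag != "{%s}catalog" % THREDDS_NS:
        problems.append("root element is not a THREDDS catalog: %s" % root.tag)
    return problems


good = '<catalog name="MyName" xmlns="%s"/>' % THREDDS_NS
# The xlink prefix is used here without being declared in the catalog element.
undeclared = '<catalog xmlns="%s"><catalogRef xlink:href="a.xml"/></catalog>' % THREDDS_NS
# The namespace string has an extra trailing slash.
wrong_ns = '<catalog xmlns="%s/"/>' % THREDDS_NS

for label, text in (("good", good), ("undeclared", undeclared), ("wrong_ns", wrong_ns)):
    print(label, quick_check(text))
```

The second and third inputs illustrate two mistakes mentioned earlier: using the xlink prefix without declaring it, and altering the namespace string.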