THREDDS Catalog Primer


THREDDS servers in general, and the TDS in particular, communicate with clients by sending them a THREDDS Catalog (aka Inventory Dataset Catalog) that describes which datasets the server has and how they can be accessed. A catalog is an XML document that follows the THREDDS Catalog schema.

This primer describes the client view of the catalog. If you maintain a TDS, you will also need to add other information to the catalog, which is used only by the server and is not normally seen by clients.

Introduction

Here's an example of a simple catalog:

 1) <?xml version="1.0" ?>
 2) <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
 3)   <service name="dodsServer" serviceType="OpenDAP"  base="/thredds/dodsC/" />
 4)   <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
 5) </catalog>

with this line-by-line explanation:

  1. The first line indicates that it's an XML document.
  2. This is the root element of the XML, the catalog element. It must declare the THREDDS catalog namespace with the xmlns attribute exactly as shown.
  3. This declares a service named dodsServer. It is an OpenDAP server whose dataset URLs all start with /thredds/dodsC/. This is a relative URL, which is resolved against the catalog URL. If the catalog URL is, for example, http://motherlode.ucar.edu:9080/thredds/Sage/catalog.html, then the service base resolves to http://motherlode.ucar.edu:9080/thredds/dodsC/.
  4. This declares a dataset whose name is SAGE III Ozone Loss for Oct 31 2006. It references the dodsServer service, and its access URL will be http://motherlode.ucar.edu:9080/thredds/dodsC/sage/110312006.nc.
  5. This closes the catalog element.
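
To make the URL resolution in steps 3 and 4 concrete, here is a minimal Python sketch that uses only the standard library. It is not part of any THREDDS library: the function name access_urls is made up for this illustration, and it assumes that the access URL is simply the resolved service base followed by the dataset's urlPath, as the explanation above describes.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# The catalog namespace, copied exactly from the example catalog.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

CATALOG_XML = """<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
</catalog>"""


def access_urls(catalog_xml, catalog_url):
    """Return {dataset name: access URL} for every dataset that has a urlPath."""
    root = ET.fromstring(catalog_xml)
    # Resolve each service base against the URL the catalog was read from.
    bases = {
        service.get("name"): urljoin(catalog_url, service.get("base"))
        for service in root.iter("{%s}service" % THREDDS_NS)
    }
    urls = {}
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        service_name = dataset.get("serviceName")
        if dataset.get("urlPath") and service_name in bases:
            # Assumption of this sketch: access URL = service base + urlPath.
            urls[dataset.get("name")] = bases[service_name] + dataset.get("urlPath")
    return urls


urls = access_urls(
    CATALOG_XML, "http://motherlode.ucar.edu:9080/thredds/Sage/catalog.html"
)
for name, url in urls.items():
    print(name, "->", url)
```

Because the service base starts with a slash, urljoin keeps only the scheme, host and port of the catalog URL and replaces the whole path.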

Nested datasets

When you have many datasets to declare in each catalog, use nested datasets:

 <?xml version="1.0" ?> 
 <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
   <service name="dodsServer" serviceType="OpenDAP"  base="/thredds/dodsC/" />

1) <dataset name="SAGE III Ozone Loss Experiment" >
2)   <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
2)   <dataset name="February Averages" serviceName="dodsServer" urlPath="sage/avg/feb.nc"/>
2)   <dataset name="March Averages" serviceName="dodsServer" urlPath="sage/avg/mar.nc"/>
3) </dataset>

 </catalog>
  1. This now declares a collection dataset, which just acts as a container for the other datasets. Note that it ends in > instead of />, and does not have a urlPath attribute.
  2. These are the datasets that directly point to data, called direct datasets.
  3. This closes the collection dataset element on line 1.

You can add any level of nesting you want, for example:

<?xml version="1.0" ?> 
<catalog name="Example" xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
 <service name="dodsServer" serviceType="OpenDAP"  base="/thredds/dodsC/" />

 <dataset name="SAGE III Ozone Loss Experiment" >

  <dataset name="Monthly Averages" >
   <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
   <dataset name="February Averages" serviceName="dodsServer" urlPath="sage/avg/feb.nc"/>
   <dataset name="March Averages" serviceName="dodsServer" urlPath="sage/avg/mar.nc"/>
  </dataset>

  <dataset name="Daily Flight Data" >
   <dataset name="January">
     <dataset name="Jan 1, 2001" serviceName="dodsServer" urlPath="sage/daily/20010101.nc"/>
     <dataset name="Jan 2, 2001" serviceName="dodsServer" urlPath="sage/daily/20010102.nc"/>
   </dataset>
  </dataset>

 </dataset>
</catalog>
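
The difference between collection datasets and direct datasets can be sketched in Python as a recursive walk over the nested elements. This is an illustration built on the standard library, not how THREDDS clients are implemented: the walk function and the shortened catalog are assumptions of this example, and a dataset is treated as direct simply when it carries a urlPath attribute.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's "{uri}tag" notation.
NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# A shortened version of the nested catalog, for illustration only.
CATALOG_XML = """<catalog name="Example" xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="SAGE III Ozone Loss Experiment">
    <dataset name="Monthly Averages">
      <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
      <dataset name="February Averages" serviceName="dodsServer" urlPath="sage/avg/feb.nc"/>
    </dataset>
    <dataset name="Daily Flight Data">
      <dataset name="January">
        <dataset name="Jan 1, 2001" serviceName="dodsServer" urlPath="sage/daily/20010101.nc"/>
      </dataset>
    </dataset>
  </dataset>
</catalog>"""


def walk(element, depth=0):
    """Yield (depth, name, kind) for every dataset below element, in document order."""
    for dataset in element.findall(NS + "dataset"):
        kind = "direct" if dataset.get("urlPath") else "collection"
        yield depth, dataset.get("name"), kind
        # Recurse into nested datasets, one level deeper.
        yield from walk(dataset, depth + 1)


tree = list(walk(ET.fromstring(CATALOG_XML)))
for depth, name, kind in tree:
    print("  " * depth + f"{name} [{kind}]")
```

The walk does not limit the depth, which mirrors the fact that the catalog allows any level of nesting.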

More dataset information

There is a lot of other optional information that helps applications and digital libraries know how to "do the right thing" with a dataset. The collectionType attribute is used on collection datasets. The dataType is a simple classification (e.g. Image, Grid, Point data). The dataFormatType describes the format the data is stored in (e.g. NetCDF, HDF5), which is relevant when the file is delivered by a file transfer protocol like FTP. The combination of the naming authority and the ID attribute should form a globally unique identifier for a dataset. In the TDS, it is especially important to add ID attributes to your datasets.

<dataset name="SAGE III Ozone Loss Experiment" collectionType="TimeSeries">
  <dataset name="January Averages" serviceName="aggServer" urlPath="sage/avg/jan.nc" authority="unidata.ucar.edu" ID="sage-20938483">
    <dataType>Trajectory</dataType>
    <dataFormatType>NetCDF</dataFormatType>
  </dataset>
</dataset>
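
As a sketch of how a client might combine the naming authority and the ID, the following Python reads both attributes from the fragment above. Joining them with a colon is an assumption made here for illustration; the primer only says that the combination should be globally unique. The function name unique_id is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the example; it has no xmlns, so tags are matched without a namespace.
FRAGMENT = """<dataset name="SAGE III Ozone Loss Experiment" collectionType="TimeSeries">
  <dataset name="January Averages" serviceName="aggServer" urlPath="sage/avg/jan.nc" authority="unidata.ucar.edu" ID="sage-20938483">
    <dataType>Trajectory</dataType>
    <dataFormatType>NetCDF</dataFormatType>
  </dataset>
</dataset>"""


def unique_id(dataset):
    """Combine authority and ID; the "authority:ID" form is an assumption of this sketch."""
    authority = dataset.get("authority")
    dataset_id = dataset.get("ID")
    if authority is None or dataset_id is None:
        return None  # no globally unique identifier can be formed
    return f"{authority}:{dataset_id}"


root = ET.fromstring(FRAGMENT)
# root.iter("dataset") visits the outer collection dataset and the nested one.
ids = {dataset.get("name"): unique_id(dataset) for dataset in root.iter("dataset")}
for name, uid in ids.items():
    print(name, "->", uid)
```

The collection dataset has neither attribute, so the sketch reports no identifier for it.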

The harvest attribute indicates that the dataset is at the right level of granularity to be exported to search systems like Digital Libraries. Elements such as summary, rights, and publisher are needed in order to create valid entries for these services. For more details, see Exporting THREDDS Datasets to Digital Libraries. For a complete reference, see the Catalog Specification.

<dataset name="SAGE III Ozone Loss Experiment" harvest="true">
  <contributor role="data manager">John Smith</contributor>
  <keyword>Atmospheric Chemistry</keyword>
  <publisher>
    <name vocabulary="DIF">Community Data Portal, National Center for Atmospheric Research, University Corporation for Atmospheric Research</name>
    <contact url="http://dataportal.ucar.edu" email="cdp@ucar.edu"/>
  </publisher>
</dataset>

Factoring out information

Rather than declare the same information on each dataset, you can use the metadata element to factor out common information:

  <dataset name="SAGE III Ozone Loss Experiment" >

1) <metadata inherit="true">
2)  <serviceName>dodsServer</serviceName>
2)  <dataType>Trajectory</dataType>
2)  <dataFormatType>NetCDF</dataFormatType>
2)  <authority>unidata.ucar.edu</authority>
   </metadata>

3) <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
3) <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
4) <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>

  </dataset>
  1. The metadata element with inherit="true" means that all the information inside the metadata element applies to the current dataset and all nested datasets.
  2. The serviceName, dataType, dataFormatType and authority are declared as elements.
  3. These datasets now use the serviceName, dataType, dataFormatType and authority values declared in the parent dataset.
  4. This dataset uses the serviceName, dataFormatType and authority values and overrides the dataType.
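
The inheritance rules in steps 1 to 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm used by the THREDDS libraries: the function name effective and the FIELDS tuple are invented for this example, only the four fields from the example are handled, and attributes on a dataset are assumed to override inherited values for that dataset only.

```python
import xml.etree.ElementTree as ET

# The fields that the example factors out into the metadata element.
FIELDS = ("serviceName", "dataType", "dataFormatType", "authority")

FRAGMENT = """<dataset name="SAGE III Ozone Loss Experiment">
  <metadata inherit="true">
    <serviceName>dodsServer</serviceName>
    <dataType>Trajectory</dataType>
    <dataFormatType>NetCDF</dataFormatType>
    <authority>unidata.ucar.edu</authority>
  </metadata>
  <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
  <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
  <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>
</dataset>"""


def effective(dataset, inherited=None, out=None):
    """Map each dataset name to the field values that apply to it."""
    values = dict(inherited or {})
    out = {} if out is None else out
    # Values inside <metadata inherit="true"> apply here and to all nested datasets.
    for metadata in dataset.findall("metadata"):
        if metadata.get("inherit") == "true":
            for child in metadata:
                if child.tag in FIELDS:
                    values[child.tag] = (child.text or "").strip()
    # Attributes on the dataset itself override inherited values for this dataset only.
    own = dict(values)
    for field in FIELDS:
        if dataset.get(field) is not None:
            own[field] = dataset.get(field)
    out[dataset.get("name")] = own
    for nested in dataset.findall("dataset"):
        effective(nested, values, out)
    return out


resolved = effective(ET.fromstring(FRAGMENT))
for name, fields in resolved.items():
    print(name, fields)
```

In this sketch, "Global Averages" keeps the inherited serviceName, dataFormatType and authority, while its own dataType attribute replaces the inherited Trajectory value.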

More Advanced Topics

XML Namespaces and Validation

If you use elements from other namespaces, you must declare those namespaces in the catalog element. Currently, THREDDS libraries recognize two other namespaces, Dublin Core and XLink, which are declared like this:

<catalog name="MyName"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" 
    xmlns:dc="http://purl.org/dc/elements/1.1/"  
    xmlns:xlink="http://www.w3.org/1999/xlink" >

It's not obvious, but namespaces are not web addresses. They are just strings that need to be copied exactly as you see them here.
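
A short Python sketch can show why the namespace string has to match exactly. It uses the standard library's ElementTree parser, which compares namespaces as plain strings and does not fetch them; the function name count_services and the altered namespace with a trailing slash are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

CATALOG_XML = f"""<catalog name="MyName"
    xmlns="{THREDDS_NS}"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
</catalog>"""


def count_services(catalog_xml, namespace):
    """Count service elements that belong to exactly the given namespace string."""
    root = ET.fromstring(catalog_xml)
    # ElementTree names elements as "{namespace}tag" and compares them as strings.
    return len(root.findall(f"{{{namespace}}}service"))


exact = count_services(CATALOG_XML, THREDDS_NS)
# One extra character is enough to make it a different namespace.
altered = count_services(CATALOG_XML, THREDDS_NS + "/")
print("exact match:", exact, "altered namespace:", altered)
```

The service element is found only when the lookup uses the identical string, even though both strings look like equivalent web addresses.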

As catalogs get more complicated, you should check that you haven't made any errors. There are three parts to checking:

  1. Is the XML well-formed?
  2. Is it valid against the catalog schema?
  3. Does it have everything it needs to be read by a THREDDS client?

You can use a THREDDS validation service to check all three of these.
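
For a quick local check, the first and third questions can be approximated with a few lines of Python. This is a sketch under stated assumptions and no substitute for a validation service: it does not validate against the schema at all, the function name check_catalog is hypothetical, and the only client-readiness rule it checks is that a serviceName attribute refers to a declared service.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"


def check_catalog(catalog_xml):
    """Return a list of problems found; an empty list means both checks passed."""
    try:
        root = ET.fromstring(catalog_xml)
    except ET.ParseError as err:
        # The parser rejects anything that is not well-formed XML.
        return [f"not well-formed: {err}"]
    declared = {service.get("name") for service in root.iter(NS + "service")}
    problems = []
    for dataset in root.iter(NS + "dataset"):
        service = dataset.get("serviceName")
        if service is not None and service not in declared:
            problems.append(
                f"dataset {dataset.get('name')!r} uses undeclared service {service!r}"
            )
    return problems


GOOD = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
</catalog>"""

# The dataset element is never closed, so this is not well-formed.
UNCLOSED = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="January Averages">
</catalog>"""

# Well-formed, but the dataset refers to a service that is not declared.
MISSING_SERVICE = GOOD.replace('serviceName="dodsServer"', 'serviceName="aggServer"')

for label, text in [("good", GOOD), ("unclosed", UNCLOSED), ("missing", MISSING_SERVICE)]:
    print(label, check_catalog(text))
```

A serviceName supplied through an inherited metadata element is not covered by this sketch, which looks only at the attribute.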

You can check well-formedness using an XML tool like XMLSpy. If you also want to check validity in those tools, you will need to declare the catalog schema location like this:

   <catalog name="MyName"
     xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
1)   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2)   xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd">
   ...
   </catalog>
  1. This line declares the schema-instance namespace. Just copy it exactly as you see it here.
  2. This line tells your XML validation tool where to find the THREDDS XML schema document. Just copy it exactly as you see it here.

The THREDDS validation service, as well as the catalog library, knows where the schemas are located, so you only need to add these two lines if you want to do your own validation.

You will want to study the annotated schema, and the schema document itself.

Catalog References

It can be useful to break up large catalogs into pieces in order to maintain each piece separately. One way to do this is to build each piece as a separate and logically complete catalog, then create a master catalog using catalog references:

   <?xml version="1.0" encoding="UTF-8"?>
   <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Top Catalog"
1)     xmlns:xlink="http://www.w3.org/1999/xlink">
2)   <dataset name="Realtime data from IDD">
3)     <catalogRef xlink:href="idd/models.xml" xlink:title="NCEP Model Data" name="" />
3)     <catalogRef xlink:href="idd/radars.xml" xlink:title="NEXRAD Radar" name="" />
3)     <catalogRef xlink:href="idd/obsData.xml" xlink:title="Station Data" name="" />
3)     <catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name="" />
     </dataset>
   </catalog>
  1. Note that we must declare the xlink namespace in the catalog element.
  2. The collection (or container) dataset logically contains the catalogRefs, which are thought of as nested datasets whose contents are the contents of the external catalog.
  3. Here are several catalogRef elements, each with a link to an external catalog, using the xlink:href attribute. The xlink:title is used as the name of the dataset. We need a name attribute (in order to validate, for obscure reasons), but it is ignored. The xlink:href values are relative URLs and are resolved against the catalog URL. If the catalog URL is, for example, http://motherlode.ucar.edu:9080/thredds/data/catalog.html, then the resolved URL of the first catalogRef will be http://motherlode.ucar.edu:9080/thredds/data/idd/models.xml.
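
The resolution described in step 3 can be sketched in Python with the standard library. The function name catalog_refs is hypothetical, and the catalog is shortened to two references for brevity; the catalog URL is the one used in the explanation above.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespaces in ElementTree's "{uri}name" notation.
NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
XLINK = "{http://www.w3.org/1999/xlink}"

CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Top Catalog"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Realtime data from IDD">
    <catalogRef xlink:href="idd/models.xml" xlink:title="NCEP Model Data" name="" />
    <catalogRef xlink:href="idd/radars.xml" xlink:title="NEXRAD Radar" name="" />
  </dataset>
</catalog>"""


def catalog_refs(catalog_xml, catalog_url):
    """Return (title, resolved URL) for each catalogRef, in document order."""
    root = ET.fromstring(catalog_xml)
    return [
        # The xlink:title serves as the dataset name; the name attribute is ignored.
        (ref.get(XLINK + "title"), urljoin(catalog_url, ref.get(XLINK + "href")))
        for ref in root.iter(NS + "catalogRef")
    ]


refs = catalog_refs(
    CATALOG_XML, "http://motherlode.ucar.edu:9080/thredds/data/catalog.html"
)
for title, url in refs:
    print(title, "->", url)
```

Unlike the service base in the first example, these hrefs do not start with a slash, so they are resolved relative to the directory of the catalog URL.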

Using ToolsUI to look at your catalogs

The NetCDF Tools User Interface (aka ToolsUI) can read and display THREDDS catalogs. You can start it from the command line, or launch it from Web Start. Use the THREDDS tab, and click on the button to navigate to your local catalog file. The catalog will be displayed in a tree widget on the left, and the selected dataset will be shown on the right.

Once you get your catalog working in a TDS, you can enter the TDS URL directly, and view the datasets with the Open buttons.



This document is maintained by Unidata staff. Please send comments to THREDDS support.