TDS Configuration Catalogs
THREDDS catalogs were originally designed as simple catalogs of
remote datasets. They associate human-readable names with data access
URLs and allow both a hierarchical organization and the addition of
metadata, thus providing client applications with the information they need for
accessing remote datasets (as we saw earlier with the ToolsUI and IDV
applications). More information is available in the THREDDS catalog primer and specification document.
In this section, we will take a look at the extensions to THREDDS
catalogs that allow the TDS to use them for configuration. We call
catalogs that use these extensions TDS Configuration Catalogs or Server-side Catalogs.
They represent the top-level catalogs the TDS will serve, contain
information detailing the datasets the TDS will serve, and indicate
which services will be available for each dataset. All the
configuration information is only needed by the server and is removed
or transformed for the client view of the catalog.
In a client-side catalog, an access URL can be constructed for a dataset if the dataset: 1) references a service element, and 2) has a urlPath attribute or access child element. The service element provides a way to factor out access information from dataset elements.
To handle a data access request, the TDS needs enough configuration
information so that it can map an incoming request URL to a location on
local disk. In the TDS configuration files, the datasetRoot and datasetScan elements perform this function.
service Element
Looking at our main TDS config catalog, catalog.xml:
(1) <service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />
...
(2a) <dataset name="Test Single Dataset" ID="testDataset" serviceName="thisDODS"
(3a) urlPath="test/testData.nc" />
<datasetScan name="NCEP GFS models" ID="model/NCEP/GFS"
(3b) path="model/NCEP/GFS" location="/data/ldm/pub/native/grid/NCEP/GFS">
<metadata inherited="true">
(2b) <serviceName>thisDODS</serviceName>
</metadata>
...
</datasetScan>
The service is defined at (1) with the name "thisDODS". The service is referenced at (2a) and (2b) using the serviceName attribute and element, respectively. Notice that the reference at (2b) is in an inherited metadata element, which means any descendant dataset
elements also reference this service. The second part of making a
dataset accessible is to specify the dataset URL path that gets appended to
the service base URL. In the above example, this is done at (3a) and
(3b). Because (3b) is in a server-side datasetScan element, it is expanded when catalog requests are made to the server.
datasetRoot Element
The datasetRoot element provides a mapping between a base request URL and a data location that can be used with individual datasets.
Revisiting our current TDS configuration catalog:
<service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />
(1) <datasetRoot path="test" location="content/testdata/"/>
<dataset name="Test Single Dataset" ID="testDataset" serviceName="thisDODS"
(2) urlPath="test/testData.nc"/>
We can see that the dataset has an OPeNDAP access URL of /thredds/dodsC/test/testData.nc,
constructed from the service base URL and the dataset URL path. The
datasetRoot element defines the request URL segment ("test") that it
associates with a location on local disk ("content/testdata/",
which is a special shortcut to ${TOMCAT_HOME}/content/thredds/public/testdata/). The TDS knows that the dataset uses the URL/location association defined by this datasetRoot element because the urlPath of the dataset (2) starts with the path of the datasetRoot (1).
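The two mappings described above can be sketched in a few lines of Python. This is only an illustration, not TDS code: the function names are hypothetical, the prefix check on whole path segments is an assumption of the sketch, and the location value is used exactly as written in the catalog (the "content" shortcut is not expanded).

```python
import posixpath


def access_url(service_base, url_path):
    """Join a service base URL and a dataset urlPath into an access URL."""
    return service_base.rstrip("/") + "/" + url_path.lstrip("/")


def disk_location(url_path, root_path, root_location):
    """Map a urlPath to a location on disk using one datasetRoot.

    Returns None when the urlPath does not start with the datasetRoot path.
    (Hypothetical helper; matching on whole path segments is an assumption.)
    """
    if url_path != root_path and not url_path.startswith(root_path + "/"):
        return None
    remainder = url_path[len(root_path):].lstrip("/")
    return posixpath.join(root_location, remainder)


print(access_url("/thredds/dodsC/", "test/testData.nc"))
print(disk_location("test/testData.nc", "test", "content/testdata/"))
```

The first call reproduces the client-side access URL; the second shows how the "test" prefix is swapped for the configured location.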
Looking at the client-side view of the catalog (http://localhost:8080/thredds/catalog.xml), notice that the datasetRoot element is not included.
Let's try an example:
1. ls /data
2. ls /data/idv
3. ls /data/idv/trajectory
4. cd ${TOMCAT_HOME}/content/thredds
5. vi catalog.xml
6. Add a datasetRoot to point to the /data/idv/trajectory directory
7. Add a dataset to reference the trajectory data file.
8. http://localhost:8080/thredds/debug
Note: Remember that the value of the location attribute
must be an absolute path (except for the special case of the "content" shortcut).
datasetScan Element
The datasetScan element provides a mapping between a
base request URL and a data location that must reference an entire
collection of datasets (i.e., for a local disk, the location must
reference a directory). In the client view of the catalog, a datasetScan element is shown as a catalogRef
element. The generation of the catalog for the collection is
deferred until a request is made for that catalog. When the catalog is
requested, the location directory is scanned: directories are
represented as catalogRef elements and files are represented as dataset elements. The scanning of each subdirectory is deferred until a request is made for the corresponding catalog.
Again, back to our current TDS configuration catalog:
<service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />
...
<datasetScan name="NCEP GFS models" ID="model/NCEP/GFS"
(1) path="model/NCEP/GFS" location="/data/ldm/pub/native/grid/NCEP/GFS">
<serviceName>thisDODS</serviceName>
...
</datasetScan>
The path attribute on the datasetScan element
is the part of the URL that identifies this datasetScan and is used to
map data access URLs to a location on local disk. The location attribute on the datasetScan
element provides the location of the dataset collection on the local
file system (it must be a directory and should be an absolute path).
catalogRef element that represents the data collection given by the location
http://localhost:8080/thredds/catalog/model/NCEP/GFS/catalog.xml
Now that we've seen the details of the resulting XML, let's look at the catalog structure generated:
ls /data/ldm/pub/native/grid/NCEP/GFS
ls /data/ldm/pub/native/grid/NCEP/GFS/*
Note: Data root paths must be unique across a TDS. Because the TDS
uses the set of all given path values to map URLs to datasets, each path value MUST
be unique across all config catalogs on a given TDS installation. Duplicates will
cause warning messages in the catalogErrors.log file.
service Element
The TDS provides several data services, including an OPeNDAP server, an HTTP bulk file download service, and a WCS service.
The URLs to access these services start with the TDS context name ("thredds") and the appropriate servlet name (e.g., "dodsC"). Because of this, the base attribute of the corresponding service elements must be exactly as follows:
OPeNDAP server:
<service name="ncdods" serviceType="OPeNDAP" base="/thredds/dodsC/" />
HTTP bulk file server:
<service name="fileServer" serviceType="HTTPServer" base="/thredds/fileServer/" />
WCS server:
<service name="wcsServer" serviceType="WCS" base="/thredds/wcs/" />
You can use whatever name you choose for a service; it only needs to match the name used in the dataset serviceName. Note that the base URLs are relative, so your catalogs are independent of your server hostname and port.
Datasets can be made available through more than one access method by defining and then referencing a compound service element. For instance:
<service name="multiService" serviceType="Compound" base="" >
<service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />
<service name="wcsServer" serviceType="WCS" base="/thredds/wcs/" />
</service>
defines a compound service named "multiService" which contains two
nested services. Any dataset that references the compound service will
have two access methods. So the dataset:
<dataset name="cool data" serviceName="multiService" urlPath="so/cool/data.nc" />
would have these two access URLs:
/thredds/dodsC/so/cool/data.nc
/thredds/wcs/so/cool/data.nc
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. Check catalogErrors.log
Note: In a given catalog, the names of service elements must be unique.
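The way a compound service fans out into one access URL per nested service can be sketched as follows. This is an illustrative model only: the Service class and the access_urls function are hypothetical names, not part of the TDS or any THREDDS client library.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Minimal stand-in for a catalog service element (hypothetical)."""
    name: str
    service_type: str
    base: str
    nested: list = field(default_factory=list)


def access_urls(service, url_path):
    """Return one access URL per non-compound service reachable from `service`."""
    if service.service_type.lower() == "compound":
        urls = []
        for child in service.nested:
            urls.extend(access_urls(child, url_path))
        return urls
    return [service.base + url_path]


multi = Service("multiService", "Compound", "", [
    Service("thisDODS", "OpenDAP", "/thredds/dodsC/"),
    Service("wcsServer", "WCS", "/thredds/wcs/"),
])

for url in access_urls(multi, "so/cool/data.nc"):
    print(url)
```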
datasetScan Element
A datasetScan element can specify which files and directories it will include with a filter element (see the spec
for more details). When no filter element is given, all files and
directories are included in the generated catalog(s). Adding a filter
element to your datasetScan element allows you to include (and/or exclude)
files and directories as desired. We saw a simple example earlier when we configured our TDS to serve the GFS model data:
<filter>
<include wildcard="GFS*.grib1" />
</filter>
The include and exclude elements determine which datasets they match according to whether their wildcard pattern (given by the wildcard attribute) or regular expression (given by the regExp attribute) matches the dataset name. By default, includes and excludes apply only to
regular files (atomic datasets). You can specify that they apply to
directories (collection datasets) as well by using the atomic and collection
attributes. For example, we can exclude an entire directory of model data (here, the one named "Ensemble_1p25deg")
by adding the following exclude element to the above filter:
<exclude wildcard="Ensemble_1p25deg" atomic="false" collection="true" />
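To make the atomic/collection distinction concrete, here is a small sketch of how such a filter could be evaluated. It is not the TDS implementation: the helper names are hypothetical, Python's fnmatch stands in for the TDS wildcard matcher, and the rule used (a dataset must match some applicable include, if any apply to its kind, and no applicable exclude) is an assumption of the sketch.

```python
from fnmatch import fnmatchcase


def selector(wildcard, atomic=True, collection=False):
    """Model one include or exclude element (defaults: files only)."""
    return {"wildcard": wildcard, "atomic": atomic, "collection": collection}


def applies(sel, is_dir):
    """Does this selector apply to files (atomic) or directories (collection)?"""
    return sel["collection"] if is_dir else sel["atomic"]


def accept(name, is_dir, includes, excludes):
    """Decide whether a file or directory name passes the filter."""
    inc = [s for s in includes if applies(s, is_dir)]
    exc = [s for s in excludes if applies(s, is_dir)]
    if inc and not any(fnmatchcase(name, s["wildcard"]) for s in inc):
        return False
    return not any(fnmatchcase(name, s["wildcard"]) for s in exc)


includes = [selector("GFS*.grib1")]
excludes = [selector("Ensemble_1p25deg", atomic=False, collection=True)]

print(accept("GFS_Alaska_191km_20051011_0000.grib1", False, includes, excludes))
print(accept("Ensemble_1p25deg", True, includes, excludes))
```

Because the include applies only to files, directories such as "Alaska_191km" pass untouched, while the exclude removes only the directory it names.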
Let's try:
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. Add the exclude element to the existing filter
4. Check catalogErrors.log
All generated datasets are given an ID. The IDs are simply the path
of the dataset appended to the datasetScan path value or, if one
exists, the ID of the datasetScan element. So, for the GFS/Alaska_191km directory and our current configuration:
<datasetScan name="NCEP GFS models" ID="model/NCEP/GFS"
path="model/NCEP/GFS" location="/data/ldm/pub/native/grid/NCEP/GFS">
the value of the dataset ID would be "model/NCEP/GFS/Alaska_191km".
Let's try changing the ID for this dataset:
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. Change the ID value
4. Check catalogErrors.log
By default, all datasets are named with the corresponding file name. By adding a namer element, you can specify more human-readable dataset names. The following namer looks for the dataset named "Alaska_191km" and renames it with the replace string:
<namer>
<regExpOnName regExp="Alaska_191km" replaceString="NCEP GFS Alaska 191km model data" />
</namer>
More complex renaming is possible as well. The namer uses a regular expression match on the dataset name. If the match succeeds, any regular expression capturing groups are used in the replacement string.
A capturing group is a part of a regular expression enclosed in parentheses. When a regular expression with a capturing group is applied to a string, the substring that matches the capturing group is saved for later use. The captured strings can then be substituted into another string in place of capturing group references, "$n", where "n" is an integer indicating a particular capturing group. (The capturing groups are numbered according to the order in which they appear in the match string.) For example, the regular expression "Hi (.*), how are (.*)\?" (the final question mark is escaped so that it matches literally) when applied to the string "Hi Fred, how are you?" would capture the strings "Fred" and "you". Following that with a capturing group replacement in the string "$2 are $1." would result in the string "you are Fred."
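The "Hi Fred" example can be checked with a short Python snippet. Two things differ from the catalog syntax and are specific to this sketch: Python writes group references as "\1" and "\2" rather than "$1" and "$2", and the trailing question mark is escaped so that it matches a literal "?" instead of acting as a quantifier.

```python
import re

# Two capturing groups; the final "?" is escaped to match literally.
pattern = re.compile(r"Hi (.*), how are (.*)\?")

match = pattern.fullmatch("Hi Fred, how are you?")
print(match.groups())

# Substitute the captured strings into a replacement template.
print(match.expand(r"\2 are \1."))
```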
Here's an example namer:
<namer>
<regExpOnName regExp="GFS_Alaska_191km_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})"
replaceString="NCEP GFS 191km Alaska $1-$2-$3 $4:$5:00 GMT"/>
</namer>
The regular expression has five capturing groups.
When applied to the dataset name
"GFS_Alaska_191km_20051011_0000.grib1", the
strings
"2005", "10", "11", "00", and "00" are captured. After replacing the
capturing group references in the replaceString attribute value, we get
the name "NCEP GFS 191km Alaska 2005-10-11 00:00:00 GMT".
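The renaming step can be imitated outside the TDS to test a pattern before putting it in a catalog. The sketch below is an approximation, not the server's code: apply_namer is a hypothetical helper, it assumes the pattern may match anywhere in the name, and it assumes the whole name is replaced by the expanded replaceString, as in the example above.

```python
import re


def apply_namer(name, reg_exp, replace_string):
    """Return the new dataset name, or the original name if there is no match."""
    m = re.search(reg_exp, name)
    if m is None:
        return name
    # Replace each "$n" reference with the text captured by group n.
    return re.sub(r"\$(\d+)",
                  lambda ref: m.group(int(ref.group(1))) or "",
                  replace_string)


REG_EXP = r"GFS_Alaska_191km_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})"
REPLACE = "NCEP GFS 191km Alaska $1-$2-$3 $4:$5:00 GMT"

print(apply_namer("GFS_Alaska_191km_20051011_0000.grib1", REG_EXP, REPLACE))
```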
To try it on our datasetScan element:
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. Add the namer element to the datasetScan element.
4. Check catalogErrors.log
5. http://localhost:8080/thredds/catalog/model/NCEP/GFS/catalog.html
You could add namer elements for each subdirectory under GFS.
However, that setup would cause every namer to be tried on every
dataset under the GFS directory. One way to get around this problem
would be to split the datasets out with a datasetScan element for each subdirectory.
So, we can:
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. datasetScan element.
4. datasetScan elements to serve the "Alaska_191km" dataset.
5. datasetScan element to exclude the "Alaska_191km" data.
6. Check catalogErrors.log
7. http://localhost:8080/thredds/catalog/model/NCEP/GFS/catalog.html
Note: Though the data root paths must be unique,
they can be extensions of an existing path. The TDS looks for the path
that has the longest match in a request URL.
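The longest-match rule in this note can be illustrated with a small lookup function. This is a sketch under assumptions, not the TDS lookup code: find_data_root is a hypothetical name, matching on whole path segments is assumed, and the file name in the request paths is made up for the example.

```python
def find_data_root(request_path, roots):
    """Return the registered path with the longest match, or None."""
    best = None
    for path in roots:
        if request_path == path or request_path.startswith(path + "/"):
            if best is None or len(path) > len(best):
                best = path
    return best


# Two data roots, one an extension of the other.
roots = {
    "model/NCEP/GFS": "/data/ldm/pub/native/grid/NCEP/GFS",
    "model/NCEP/GFS/Alaska_191km": "/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km",
}

print(find_data_root("model/NCEP/GFS/Alaska_191km/example.grib1", roots))
print(find_data_root("model/NCEP/GFS/Ensemble_1p25deg/example.grib1", roots))
```

The first request is handled by the more specific "Alaska_191km" root; the second falls back to the general GFS root.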
A sort element can be added to a datasetScan to specify the order in which a collection of datasets is listed. Without a sort element, datasets at each collection level are listed
in their "natural" order. Currently, the only supported sort algorithm sorts datasets lexicographically by name, in either increasing or
decreasing order. Here's what a sort element looks like:
<sort>
<lexigraphicByName increasing="false" />
</sort>
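The effect of increasing="false" can be previewed with an ordinary string sort. This is only an approximation: sort_by_name is a hypothetical helper, Python compares strings by code point (which may differ in detail from the server's comparison), and all file names except the first are invented for the example.

```python
def sort_by_name(names, increasing=True):
    """Sort dataset names lexicographically, ascending or descending."""
    return sorted(names, reverse=not increasing)


names = [
    "GFS_Alaska_191km_20051011_0000.grib1",
    "GFS_Alaska_191km_20051011_1200.grib1",  # hypothetical file name
    "GFS_Alaska_191km_20051010_1200.grib1",  # hypothetical file name
]

for name in sort_by_name(names, increasing=False):
    print(name)
```

With timestamps embedded in the names, decreasing order lists the most recent run first.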
Exercise:
1. cd ${TOMCAT_HOME}/content/thredds/
2. vi catalog.xml
3. Add a sort element to the "Alaska_191km" datasetScan element.
4. Check catalogErrors.log
5. http://localhost:8080/thredds/catalog/model/NCEP/GFS/catalog.html
The addProxies element provides a place for
describing proxy datasets. Currently,
only two addProxies child elements are defined. They are both "Latest" proxy elements. The simpleLatest element adds a proxy dataset which proxies the existing dataset whose name is lexicographically
greatest (which finds the latest dataset, assuming a timestamp is part
of the dataset name). The latestComplete element behaves similarly to simpleLatest
except that the proxied dataset is never one that has
been modified more recently than a given time limit, e.g., you could specify
that you want the most recent (lexicographically) dataset that hasn't been
modified for 60 minutes. Both the simpleLatest and latestComplete
elements must point to an existing service element.
To add a "Latest" dataset to our "Alaska_191km" dataset, we could add:
<service name="latest" serviceType="Resolver" base="" />
to our catalog and
<addProxies>
<latestComplete name="latestComplete.xml" top="true" serviceName="latest" lastModifiedLimit="60" />
</addProxies>
to our "Alaska_191km" datasetScan element. This would result in the following dataset being at the top of the "Alaska_191km" collection of datasets:
<dataset name="latestComplete.xml" serviceName="latest" urlPath="latestComplete.xml" />
The latestComplete element includes a name attribute, which provides the name of the proxy dataset; the serviceName attribute, which references the service used by the proxy dataset; the top
attribute, which indicates whether the proxy dataset should appear at the top
or bottom of the list of datasets in this collection; and the lastModifiedLimit attribute, which gives the time limit (60 minutes in the example above) used when determining which dataset is being proxied.
The simpleLatest element allows for the same attributes as the latestComplete element minus the lastModifiedLimit attribute. In this case, all the attributes have default values: the name attribute defaults to "latest.xml", the top attribute defaults to "true", and the serviceName attribute defaults to "latest".
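The selection rules for the two "Latest" proxies can be sketched as follows. This is a reading of the description above rather than the server's algorithm: the function names are hypothetical, modification times are given as seconds since the epoch, lastModifiedLimit is taken to be in minutes, and the second file name is invented.

```python
import time


def simple_latest(datasets):
    """Pick the lexicographically greatest name from (name, mtime) pairs."""
    return max((name for name, _ in datasets), default=None)


def latest_complete(datasets, last_modified_limit, now=None):
    """Like simple_latest, but skip datasets modified within the last
    `last_modified_limit` minutes."""
    now = time.time() if now is None else now
    cutoff = now - last_modified_limit * 60
    stable = [name for name, mtime in datasets if mtime <= cutoff]
    return max(stable, default=None)


now = 1_000_000_000  # fixed clock so the example is repeatable
datasets = [
    ("GFS_Alaska_191km_20051011_0000.grib1", now - 2 * 3600),  # 2 hours old
    ("GFS_Alaska_191km_20051011_0600.grib1", now - 10 * 60),   # still being written
]

print(simple_latest(datasets))
print(latest_complete(datasets, 60, now=now))
```

Here simpleLatest would point at the 0600 file even though it changed ten minutes ago, while latestComplete falls back to the 0000 file.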
The addDatasetSize element indicates that file size metadata should be added to all atomic datasets. Adding
<addDatasetSize />
to a datasetScan element results in the addition of a dataSize element to each atomic dataset:
<dataSize units="Kbytes">6.08</dataSize>
timeCoverage Elements
A datasetScan element may contain an addTimeCoverage element. The
addTimeCoverage element indicates that a timeCoverage metadata element
should be added to each dataset in the collection and describes
how to determine the time coverage for each dataset in the collection.
Currently, the addTimeCoverage element can only construct
start/duration timeCoverage elements and uses the dataset name to determine the start time. As described in the "Naming Datasets" section above, the addTimeCoverage element applies a regular expression match to the dataset name. If the match succeeds, any regular expression capturing groups are used in the start time replacement string to build the start time string.
These attribute values are used to
determine the time coverage:
- The datasetNameMatchPattern attribute value is used for a regular expression match on the dataset name. If a match is found, a timeCoverage element is added to the dataset. The match pattern should include capturing groups, which allow the match to save substrings from the dataset name.
- The startTimeSubstitutionPattern attribute value has all capture group references ("$n") replaced by the corresponding substring that was captured during the match. The resulting string is used as the start value of the resulting timeCoverage element.
- The duration attribute value is used as the duration value of the resulting timeCoverage element.
Adding
<addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2}).grib1$"
startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
duration="60 hours" />
to a datasetScan element results in the following timeCoverage element:
<timeCoverage>
<start>2005-07-18T12:00:00</start>
<duration>60 hours</duration>
</timeCoverage>
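The three attributes can be exercised together in a short sketch that derives a start time from a dataset name. It is illustrative only: time_coverage is a hypothetical helper, the dataset name is invented to be consistent with the output shown above, and Python's re module stands in for the server's regular expression engine.

```python
import re


def time_coverage(name, match_pattern, start_pattern, duration):
    """Return a start/duration pair for a dataset name, or None if no match."""
    m = re.search(match_pattern, name)
    if m is None:
        return None
    # Replace each "$n" reference with the text captured by group n.
    start = re.sub(r"\$(\d+)",
                   lambda ref: m.group(int(ref.group(1))) or "",
                   start_pattern)
    return {"start": start, "duration": duration}


MATCH = r"([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2}).grib1$"
START = "$1-$2-$3T$4:00:00"

# Hypothetical dataset name with a 2005-07-18 12:00 timestamp.
print(time_coverage("GFS_Alaska_191km_20050718_1200.grib1", MATCH, START, "60 hours"))
```

Names that do not match the pattern get no timeCoverage element, which the sketch models by returning None.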
This document is maintained by Unidata and was last updated on July 20, 2007. Send comments to THREDDS support.