service Element
service Elements - Serving Datasets with Multiple Methods
Datasets can be made available through more than one access method by defining and then referencing a compound service element. The following:
<service name="all" serviceType="Compound" base="" >
<service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
<service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
</service>
defines a compound service named "all" which contains two nested services. Any dataset that references the compound service will have two access methods. For instance:
<dataset name="cool data" urlPath="so/cool/data.nc" >
<serviceName>all</serviceName>
</dataset>
would result in these two access URLs:
/thredds/dodsC/so/cool/data.nc
/thredds/wcs/so/cool/data.nc
Note: The contained services can still be referenced independently. For instance:
<dataset name="more cool data" urlPath="more/cool/data.nc" >
<serviceName>odap</serviceName>
</dataset>
results in a single access URL:
/thredds/dodsC/more/cool/data.nc
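As a rough illustration of how the access URLs above come about, the following Python sketch expands a service reference into URL paths by concatenating each service base with the dataset urlPath. The `SERVICES` table and the `access_urls` function are hypothetical names used only for this illustration; they are not part of the TDS.

```python
# Hypothetical model of the service definitions shown above.
# A compound service maps to its nested services; a simple one maps to a base.
SERVICES = {
    "all": ["odap", "wcs"],          # compound: list of nested service names
    "odap": "/thredds/dodsC/",       # simple: base path
    "wcs": "/thredds/wcs/",
}

def access_urls(service_name, url_path):
    """Return one access URL path per simple service reached from service_name."""
    entry = SERVICES[service_name]
    if isinstance(entry, list):      # compound service: expand each nested service
        urls = []
        for nested in entry:
            urls.extend(access_urls(nested, url_path))
        return urls
    return [entry + url_path]        # simple service: base + urlPath

print(access_urls("all", "so/cool/data.nc"))
print(access_urls("odap", "more/cool/data.nc"))
```

Referencing "all" yields both URL paths, while referencing "odap" directly yields only the OPeNDAP one, matching the two examples above.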
service Element in a Catalog
Within a catalog, the service name is used to reference a service element. Service names must therefore be unique within each catalog.
[Note: They do not need to be unique globally within a TDS, only on a catalog-by-catalog basis.]
<service name="all" serviceType="Compound" base="" >
<service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
<service name="http" serviceType="HTTPServer" base="/thredds/fileServer/" />
</service>
<service name="grid" serviceType="Compound" base="" >
<service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
<service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
<service name="wms" serviceType="WMS" base="/thredds/wms/" />
<service name="http" serviceType="HTTPServer" base="/thredds/fileServer/" />
</service>
<metadata xlink:title="some good metadata" xlink:href="http://my.server/md/data1.xml" />
<description xlink:title="My Data" xlink:href="http://my.server/md/data1.html" />
...
<dataset name="TDS Tutorial: example inherited metadata">
<metadata inherited="true">
<serviceName>odap</serviceName>
<description>Really great data.</description>
<keyword>Ocean</keyword>
<keyword>Temperature</keyword>
<creator>Ethan</creator>
<publisher>Ethan</publisher>
<date type="created">2008-10-30T14:22</date>
<dataFormat>netCDF</dataFormat>
</metadata>
<dataset name="TDS Tutorial: example data 1" urlPath="test/example1.nc" />
<dataset name="TDS Tutorial: example data 2" urlPath="test/example2.nc" />
<dataset name="TDS Tutorial: example data 3" urlPath="test/example3.nc" />
<dataset name="TDS Tutorial: example data 4" urlPath="test/example4.grib2">
<dataFormat>GRIB-2</dataFormat>
</dataset>
</dataset>
...
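The inheritance in the example above can be modelled with a short Python sketch. It assumes that metadata inside a `metadata inherited="true"` element applies to the containing dataset and all of its descendants, and that metadata given directly on a dataset takes precedence for that dataset only. The `CATALOG` string is a trimmed copy of the example (no namespaces, repeated elements such as keyword omitted), and `effective_metadata` is a hypothetical helper, not a TDS function.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the example above (no namespaces, fewer fields).
CATALOG = """
<dataset name="example inherited metadata">
  <metadata inherited="true">
    <serviceName>odap</serviceName>
    <dataFormat>netCDF</dataFormat>
  </metadata>
  <dataset name="example data 1" urlPath="test/example1.nc" />
  <dataset name="example data 4" urlPath="test/example4.grib2">
    <dataFormat>GRIB-2</dataFormat>
  </dataset>
</dataset>
"""

def effective_metadata(dataset, inherited=None):
    """Map each dataset that has a urlPath to the metadata that applies to it."""
    inherited = dict(inherited or {})
    local = {}
    for child in dataset:
        if child.tag == "metadata" and child.get("inherited") == "true":
            # Inherited metadata applies here and to all descendants.
            for item in child:
                inherited[item.tag] = item.text
        elif child.tag not in ("metadata", "dataset"):
            # Metadata given directly on a dataset applies to that dataset only.
            local[child.tag] = child.text
    result = {}
    if dataset.get("urlPath"):
        result[dataset.get("urlPath")] = {**inherited, **local}
    for child in dataset.findall("dataset"):
        result.update(effective_metadata(child, inherited))
    return result

resolved = effective_metadata(ET.fromstring(CATALOG))
for url_path, metadata in resolved.items():
    print(url_path, metadata)
```

Under these assumptions, example1.nc ends up with the inherited netCDF data format, while example4.grib2 keeps the inherited service name but reports GRIB-2.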
Notes:
service Elements
The TDS provides data access services at predefined URL base paths. Therefore, service base URLs must match the following values:
<service name="odap" serviceType="OPeNDAP" base="/thredds/dodsC/" />
<service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/" />
<service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
<service name="wms" serviceType="WMS" base="/thredds/wms/" />
<service name="fileServer" serviceType="HTTPServer" base="/thredds/fileServer/" />
You can check that a data file is recognized as "gridded" with netCDF-Java ToolsUI. (ToolsUI can be found on the netCDF-Java home page.)
The datasetScan element is an extension of the dataset element and so can contain metadata.
...
<service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
<datasetScan name="Test all files in a directory" ID="testDatasetScan"
path="my/test/all" location="/my/data/testdata">
<metadata inherited="true">
<serviceName>odap</serviceName>
<keyword>Ocean</keyword>
<keyword>Temperature</keyword>
<creator>Ethan</creator>
<publisher>Ethan</publisher>
<date type="created">2008-10-30T14:22</date>
</metadata>
</datasetScan>
...
All generated catalogs that are descendants of this datasetScan will contain all of the inherited metadata contained in the datasetScan element. For instance, here is a resulting child catalog:
...
<service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
<dataset name="Test all files in a directory" ID="testDatasetScan" >
<metadata inherited="true">
<serviceName>odap</serviceName>
<keyword>Ocean</keyword>
<keyword>Temperature</keyword>
<creator>Ethan</creator>
<publisher>Ethan</publisher>
<date type="created">2008-10-30T14:22</date>
</metadata>
<dataset name="afile.nc" ID="testDatasetScan/afile.nc" urlPath="my/test/all/afile.nc" />
<dataset name="testData.nc" ID="testDatasetScan/testData.nc" urlPath="my/test/all/testData.nc" />
<dataset name="junk.nc" ID="testDatasetScan/junk.nc" urlPath="my/test/all/junk.nc" />
<catalogRef xlink:title="grib" ID="testDatasetScan/grib" name=""
xlink:href="/thredds/catalog/my/test/all/grib/catalog.xml" />
</dataset>
...
At startup, the TDS reads the root catalog
${TOMCAT_HOME}/content/thredds/catalog.xml
and, recursively, all configuration catalogs that are linked to it through a relative catalogRef element. The resulting tree of catalogs is used as the set of top-level catalogs served by the TDS. In the case of our distributed root catalog, the tree looks like:
catalog.xml
|
|-- enhancedCatalog.xml
The tree of configuration catalogs can be as deeply nested as desired.
Additional root configuration catalogs can be defined in the
${TOMCAT_HOME}/content/thredds/threddsConfig.xml
file. For instance, to add a test catalog, add the following line:
<catalogRoot>myTestCatalog.xml</catalogRoot>
Each additional root configuration catalog can be the root of another tree of configuration catalogs.
First, the TDS catalog errors log
${TOMCAT_HOME}/content/thredds/logs/catalogErrors.log
contains all warning and error messages from parsing the configuration catalogs. As such, it is a great place to look for information if you run into problems with your TDS configuration catalogs.
Second, the TDS Remote Management page provides access to a list of all the configuration catalogs the TDS has successfully read:
datasetRoot and datasetScan Elements
You can have as many datasetRoot and datasetScan elements as you want, for example:
<datasetRoot path="model" location="/data/ncep" />
<datasetRoot path="obs" location="/data/raw/metars" />
<datasetRoot path="cases/001" location="C:/casestudy/data/001" />
<datasetScan path="myData" location="/data/ncep/run0023" name="NCEP/RUN 23" serviceName="myserver" />
<datasetScan path="myData/gfs" location="/pub/ldm/gfs" name="NCEP/GFS" serviceName="myserver" />
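The datasetRoot and datasetScan elements are each said to define a data root. The TDS maps the urlPath of a requested dataset to a file by matching it against the data root path values collected from all catalogs used by the TDS.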
Note: Because the TDS uses the set of all given path values
to map URLs to datasets, each path value MUST be unique across all
config catalogs on a given TDS installation. Duplicates will cause
warning messages in the catalogErrors.log file.
For example, using the above data roots, the following matches would be made:
| urlPath | file |
|---|---|
| model/run0023/mydata.nc | /data/ncep/run0023/mydata.nc |
| obs/test.nc | /data/raw/metars/test.nc |
| myData/mydata.nc | /data/ncep/run0023/mydata.nc |
| myData/gfs/mydata.nc | /pub/ldm/gfs/mydata.nc |
| cases/001/test/area/two | C:/casestudy/data/001/test/area/two |
The structure of a full OPeNDAP URL for the first urlPath above would look like:
http://hostname:port/thredds/dodsC/model/run0023/mydata.nc
|<---- server ---->|<----->|<--->|<--->|<--- filename -->|
                       |      |     |
          webapp name -|      |     |- data root
                              |
                     service -|
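The matches in the table above can be reproduced with a small Python sketch. It assumes that a data root matches only on whole path segments and that, when several roots match, the longest one wins (which is what the myData/gfs row implies). `DATA_ROOTS` and `resolve` are hypothetical names for this illustration, not TDS code.

```python
# Data roots from the example above: URL path -> directory on disk.
DATA_ROOTS = {
    "model": "/data/ncep",
    "obs": "/data/raw/metars",
    "cases/001": "C:/casestudy/data/001",
    "myData": "/data/ncep/run0023",
    "myData/gfs": "/pub/ldm/gfs",
}

def resolve(url_path):
    """Return the file location for url_path, or None if no data root matches."""
    best = None
    for root in DATA_ROOTS:
        # A root matches only on whole path segments.
        if url_path == root or url_path.startswith(root + "/"):
            if best is None or len(root) > len(best):
                best = root          # assumption: the longest matching root wins
    if best is None:
        return None
    # Swap the matched data root path for its location on disk.
    return DATA_ROOTS[best] + url_path[len(best):]

for path in ("model/run0023/mydata.nc", "myData/gfs/mydata.nc"):
    print(path, "->", resolve(path))
```

Because every urlPath is resolved against one shared set of path values, two data roots with the same path would be ambiguous, which is why each path must be unique.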
The TDS Remote Management page has a link to list all known dataset roots:
[thredds@workshop00 ~]$ ls /data/ldm
bufr dusk dusk.080527 fsl ldm.pq ltng mcidas ngrid nogaps rcm severe surface wseta
cosmic dusk.080522 forecasts gempak logs madis nam_12km nldn rawfiles rtmodel suomi upperair
[thredds@workshop00 ~]$ ls /data/ldm/fsl
01hr 06min RASS
[thredds@workshop00 ~]$ ls /data/ldm/fsl/01hr
20082962000.nc 20082981400.nc 20083000800.nc 20083020200.nc 20083032000.nc 20083051400.nc 20083071000.nc 20083090400.nc
...
20082981200.nc 20083000600.nc 20083020000.nc 20083031800.nc 20083051200.nc 20083070800.nc 20083090200.nc 20083102100.nc
[thredds@workshop00 ~]$ ls /data/ldm/madis
20081022_0700.nc 20081024_1000.nc 20081026_1300.nc 20081028_1600.nc 20081030_1900.nc 20081101_2200.nc 20081104_0100.nc
...
20081024_0300.nc 20081026_0600.nc 20081028_0900.nc 20081030_1200.nc 20081101_1500.nc 20081103_1800.nc 20081105_2100.nc
[thredds@workshop00 ~]$ ls /data/ldm/suomi
CsuPWVh_2008.308.18.00.0060_nc CsuPWVh_2008.309.07.00.0060_nc CsuPWVh_2008.309.20.00.0060_nc CsuPWVh_2008.310.09.00.0060_nc
...
CsuPWVh_2008.309.04.00.0060_nc CsuPWVh_2008.309.17.00.0060_nc CsuPWVh_2008.310.06.00.0060_nc CsuPWVh_2008.310.19.00.0060_nc
[thredds@workshop00 ~]$ cd ${TOMCAT_HOME}/content/thredds
[thredds@workshop00 ~]$ vi catalog.xml // Use the editor of your choice
Add a datasetScan element for the FSL data:
<datasetScan name="FSL" ID="FSL"
path="fsl" location="/data/ldm/fsl">
<metadata inherited="true">
<serviceName>thisDODS</serviceName>
</metadata>
</datasetScan>
http://localhost:8080/thredds/admin/debug
Check that the datasetScan elements are working:
http://localhost:8080/thredds/catalog.html
http://localhost:8080/thredds/admin/debug
Change a datasetScan element so that the value of the path attribute matches the one for the NAM_12km datasetScan element.
[thredds@workshop00 ~]$ cd ${TOMCAT_HOME}/content/thredds
[thredds@workshop00 ~]$ vi catalog.xml // Use the editor of your choice
datasetScan Element
A datasetScan element can specify which files and directories it will include with a filter element (see spec for more details). When no filter element is given, all files and directories are included in the generated catalog(s). Adding a filter element to your datasetScan element allows you to include (and/or exclude) the files and directories as desired. The datasetScan element for the NAM_12km example included the following:
<filter>
<include wildcard="*.grib2" />
</filter>
To exclude the analysis data, the filter could be modified to:
<filter>
<include wildcard="*.grib2" />
<exclude wildcard="*f000.grib2" />
</filter>
The include and exclude elements determine which datasets they match by testing whether their wildcard pattern (given by the wildcard attribute) or regular expression (given by the regExp attribute) matches the dataset name. By default, includes and excludes apply only to regular files (atomic datasets). You can specify that they apply to directories (collection datasets) as well by using the atomic and collection attributes. For example, if the nam_12km directory contained a badData directory, you could exclude it by adding the following to the filter:
<exclude wildcard="badData" atomic="false" collection="true" />
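To make the interplay of include, exclude, atomic, and collection more concrete, here is a minimal Python sketch of one plausible reading of the rules. It assumes that an exclude always overrides an include, that an entry with no applicable include rule passes unless excluded, and that wildcards behave like shell-style globs (approximated here with `fnmatch`). `RULES` and `accepted` are hypothetical names; the real TDS logic may differ in detail.

```python
from fnmatch import fnmatchcase

# Each rule: (kind, wildcard, applies_to_files, applies_to_directories).
# The defaults described above correspond to (True, False).
RULES = [
    ("include", "*.grib2", True, False),
    ("exclude", "*f000.grib2", True, False),
    ("exclude", "badData", False, True),
]

def accepted(name, is_directory, rules=RULES):
    """Decide whether an entry passes the filter (assumed semantics, see text)."""
    # Keep only the rules that apply to this kind of entry.
    applicable = [
        (kind, pattern) for kind, pattern, atomic, collection in rules
        if (collection if is_directory else atomic)
    ]
    includes = [p for kind, p in applicable if kind == "include"]
    excludes = [p for kind, p in applicable if kind == "exclude"]
    if any(fnmatchcase(name, p) for p in excludes):
        return False
    # With no applicable include rule, everything not excluded passes.
    return not includes or any(fnmatchcase(name, p) for p in includes)

for name, is_dir in [("2008110406f018.grib2", False),
                     ("2008110406f000.grib2", False),
                     ("badData", True)]:
    print(name, accepted(name, is_dir))
```

With these rules the forecast file is kept, the analysis file (f000) is dropped, and the badData directory is skipped, while other directories are still scanned.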
Error {
code = 500;
message = "Cant read /data/ldm/madis/.scour*: not a valid NetCDF file.";
};
Add a filter element to the datasetScan elements. Something like:
<filter>
<include wildcard="*.nc" />
<include wildcard="*.grib1" />
<include wildcard="*.grib2" />
</filter>
Add a filter element to the "FSL" datasetScan element to exclude the "06min" directories. Something like:
<exclude wildcard="06min" atomic="false" collection="true" />
All generated datasets are given an ID. The IDs are simply the
path
of the dataset appended to the datasetScan path value or, if one
exists, the ID of the datasetScan element. So, for the nam_12km
directory and our current configuration:
<datasetScan name="NCEP NAM 12km" ID="NAM_12km"
path="nam_12km" location="/data/ldm/nam_12km">
and the data file 2008110406f018.grib2, the value of the dataset ID would be "NAM_12km/2008110406f018.grib2".
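The ID rule described above can be written down as a tiny Python sketch. The `dataset_id` function is a hypothetical helper for illustration, and the "/" separator is taken from the example ID shown above.

```python
def dataset_id(scan_path, relative_path, scan_id=None):
    """Build the ID of a generated dataset from its datasetScan attributes."""
    # Prefer the datasetScan ID when one exists, otherwise fall back to its path.
    prefix = scan_id if scan_id is not None else scan_path
    return prefix + "/" + relative_path

print(dataset_id("nam_12km", "2008110406f018.grib2", scan_id="NAM_12km"))
print(dataset_id("nam_12km", "2008110406f018.grib2"))
```

The first call corresponds to the configuration above; the second shows what the ID would look like if the datasetScan element had no ID attribute.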
By default, all datasets are named with the corresponding file name. By adding a namer element, you can specify more human-readable dataset names. The following namer looks for the dataset named "NCEP NAM 12km" and renames it with the replace string:
<namer>
<regExpOnName regExp="NCEP NAM 12km" replaceString="NCEP NAM 12km model data" />
</namer>
More complex renaming is possible as well. The namer uses a regular
expression match on the dataset name. If the match succeeds,
any regular expression capturing
groups are used in the replacement string.
A capturing group is a part of a regular expression enclosed in parentheses. When a regular expression with a capturing group is applied to a string, the substring that matches the capturing group is saved for later use. The captured strings can then be substituted into another string in place of capturing group references, "$n", where "n" is an integer indicating a particular capturing group. (The capturing groups are numbered according to the order in which they appear in the match string.) For example, the regular expression "Hi (.*), how are (.*)\?" (with the question mark escaped so that it is matched literally) when applied to the string "Hi Fred, how are you?" would capture the strings "Fred" and "you". Following that with a capturing group replacement in the string "$2 are $1." would result in the string "you are Fred."
Here's an example namer:
<namer>
<regExpOnName regExp="([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})f([0-9]{3}).grib2"
replaceString="NCEP NAM 12km $1-$2-$3 $4 GMT - Forecast hour: $5"/>
</namer>
In this example, the regular expression has five capturing groups. When applied to the dataset name "2008110406f018.grib2", the strings "2008", "11", "04", "06", and "018" are captured. After replacing the capturing group references in the replaceString attribute value, we get the name "NCEP NAM 12km 2008-11-04 06 GMT - Forecast hour: 018".
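The capturing-group substitution can be tried out with Python's `re` module. This is only a sketch of the idea: Python writes group references as `\g<n>` rather than "$n", so the hypothetical `expand` helper translates them first, and `rename` assumes that the whole dataset name must match the regular expression (the TDS may apply the match differently). In the first call the question mark is escaped so that it is matched literally.

```python
import re

def expand(replace_string, match):
    """Replace "$n" references with the text captured by group n."""
    # Python writes group references as \g<n>, so translate "$n" first.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replace_string)
    return match.expand(template)

def rename(name, reg_exp, replace_string):
    """Return the new name, or the original name if reg_exp does not match."""
    match = re.fullmatch(reg_exp, name)
    return expand(replace_string, match) if match else name

print(rename("Hi Fred, how are you?", r"Hi (.*), how are (.*)\?", "$2 are $1."))
print(rename(
    "2008110406f018.grib2",
    r"([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})f([0-9]{3}).grib2",
    "NCEP NAM 12km $1-$2-$3 $4 GMT - Forecast hour: $5",
))
```

Names that do not match the pattern are returned unchanged, which mirrors the default of naming a dataset after its file.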
Add a namer element to the Suomi datasetScan element that extracts the date/time from the file name and uses the date/time in generating a new name (similar to above).
A sort element can be added to a datasetScan to specify the order in which a collection of datasets is listed. Without a sort element, datasets at each collection level are listed in their "natural" order. Currently, the only supported sort algorithm sorts datasets lexicographically by name in either increasing or decreasing order. Here's what a sort element looks like:
<sort>
<lexigraphicByName increasing="false" />
</sort>
With a real-time archive, it is convenient to define a "proxy" dataset that always points to the most recent dataset in a collection. Other types of proxy datasets may be useful as well, and the addProxies element provides a place for describing proxy datasets. Currently, only two addProxies child elements are defined. They are both "Latest" proxy elements.
The simpleLatest element adds a proxy dataset which proxies the existing dataset whose name is lexicographically greatest (which finds the latest dataset, assuming a timestamp is part of the dataset name). The latestComplete element behaves similarly to simpleLatest, except that it does not consider any datasets that have been modified more recently than a given time limit. For example, you could specify that you want the most recent (lexicographically) dataset that hasn't been modified for 60 minutes. Both the simpleLatest and latestComplete elements must point to an existing service element.
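The difference between the two "Latest" proxies can be sketched in a few lines of Python. The file names and modification times below are made up for illustration, the time limit is taken to be in minutes as in the 60-minute example above, and `simple_latest` and `latest_complete` are hypothetical helpers rather than TDS functions.

```python
from datetime import datetime, timedelta

def simple_latest(names):
    """Pick the lexicographically greatest dataset name."""
    return max(names) if names else None

def latest_complete(last_modified, limit_minutes, now):
    """Pick the greatest name among datasets not modified within the limit.

    last_modified maps dataset name -> datetime of last modification.
    """
    cutoff = now - timedelta(minutes=limit_minutes)
    settled = [name for name, when in last_modified.items() if when <= cutoff]
    return max(settled) if settled else None

# Made-up directory listing: the newest file was modified 15 minutes ago.
now = datetime(2008, 11, 4, 12, 0)
files = {
    "2008110400f018.grib2": datetime(2008, 11, 4, 3, 0),
    "2008110406f018.grib2": datetime(2008, 11, 4, 9, 0),
    "2008110412f018.grib2": datetime(2008, 11, 4, 11, 45),  # still being written
}
print(simple_latest(files))
print(latest_complete(files, 60, now))
```

Here simpleLatest would proxy the 12Z file even though it may still be arriving, while latestComplete with a 60-minute limit falls back to the 06Z file.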
To add a "Latest" dataset to our "NAM_12km" dataset, we could add:
<service name="latest" serviceType="Resolver" base="" />
to our catalog and
<addProxies>
<latestComplete name="latestComplete.xml" top="true" serviceName="latest" lastModifiedLimit="60" />
</addProxies>
to our "NAM_12km" datasetScan
element. This would result in the following dataset being at the top of
the "NAM_12km" collection of datasets:
<dataset name="latestComplete.xml" serviceName="latest" urlPath="latestComplete.xml" />
The latestComplete element includes a name attribute, which provides the name of the proxy dataset; a serviceName attribute, which references the service used by the proxy dataset; a top attribute, which indicates whether the proxy dataset should appear at the top or bottom of the list of datasets in this collection; and a lastModifiedLimit attribute, which feeds into the algorithm that determines which dataset is being proxied.
The simpleLatest element allows for the same attributes as the latestComplete element minus the lastModifiedLimit attribute. In this case, all the attributes have default values: the name attribute defaults to "latest.xml", the top attribute defaults to "true", and the serviceName attribute defaults to "latest".
The addDatasetSize element indicates
that file size metadata should be added to all atomic datasets. Adding
<addDatasetSize />
to a datasetScan element results in
the addition of a dataSize element to each
atomic dataset:
<dataSize units="Kbytes">6.08</dataSize>
timeCoverage Elements
A datasetScan element may contain an addTimeCoverage element. The addTimeCoverage element indicates that a timeCoverage metadata element should be added to each dataset in the collection and describes how to determine the time coverage for each dataset in the collection. Currently, the addTimeCoverage element can only construct start/duration timeCoverage elements and uses the dataset name to determine the start time. As with the namer described in the "Naming Datasets" section above, the addTimeCoverage element applies a regular expression match to the dataset name. If the match succeeds, any regular expression capturing groups are used in the start time replacement string to build the start time string. The values of the following attributes are used to determine the time coverage:
The datasetNameMatchPattern or the datasetPathMatchPattern attribute gives a regular expression used to match on the dataset name or path, respectively. If a match is found, a timeCoverage element is added to the dataset. The match pattern should include capturing groups, which allow the match to save substrings from the dataset name.
The startTimeSubstitutionPattern attribute value has all capture group references ("$n") replaced by the corresponding substring that was captured during the match. The resulting string is used as the start value of the resulting timeCoverage element.
The duration attribute value is used as the duration value of the resulting timeCoverage element.
For instance, adding
<addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})f[0-9]{3}.grib2"
startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
duration="60 hours" />
to a datasetScan element and given a data file named
2005071812f006.grib2
results in the following timeCoverage element:
<timeCoverage>
<start>2005-07-18T12:00:00</start>
<duration>60 hours</duration>
</timeCoverage>
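The way the start time is derived from the file name can be checked with a short Python sketch. As before, this is an illustration under assumptions: the "$n" references are translated to Python's `\g<n>` form, the whole dataset name is assumed to have to match the pattern, and `time_coverage` is a hypothetical helper whose parameters simply mirror the three attributes described above.

```python
import re

def time_coverage(name, match_pattern, start_substitution, duration):
    """Return (start, duration) for a dataset name, or None if it does not match."""
    match = re.fullmatch(match_pattern, name)
    if match is None:
        return None                  # no timeCoverage is added
    # Translate "$n" into Python's \g<n> group reference before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", start_substitution)
    return match.expand(template), duration

coverage = time_coverage(
    "2005071812f006.grib2",
    r"([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})f[0-9]{3}.grib2",
    "$1-$2-$3T$4:00:00",
    "60 hours",
)
print(coverage)
```

The duration is passed through unchanged; only the start value depends on what the capturing groups pick out of the name.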
Add an addTimeCoverage element to the Suomi datasetScan element that extracts the date/time from the file name and uses the date/time to generate the timeCoverage element (similar to above).
This document is maintained by Unidata and was last updated
Send comments to THREDDS support.