Configuring TDS with the FeatureCollection element
The featureCollection element tells the TDS to serve collections of CDM Feature Datasets. Currently it is used for gridded and point datasets whose time and spatial coordinates are recognized by the CDM software stack. This allows the TDS to automatically create logical datasets composed of collections of files, and to allow subsetting on them in coordinate space, e.g. through the WMS, WCS, and NetCDF Subsetting Service.
The featureCollection element is new in TDS 4.2 and replaces the fmrcDataset element in earlier versions. TDS 4.2 allows featureType = FMRC, Point, and Station. TDS 4.3 allows featureType = GRIB, which can only be used for collections of GRIB2 files.
Much of the complexity of feature collections lies in managing the collection of files on the server, both in creating indexes for performance and in handling collections that change. For high-performance servers, it's better to let a background process manage indexing; the THREDDS Data Manager (TDM), available in TDS 4.3, is an experimental application for this purpose.
This document gives an overview of Feature Collections, as well as a complete syntax of allowed elements. For featureType specific information, see:
Simple case using defaults:
<featureCollection name="NCEP-NAM-Polar_90km" featureType="FMRC" path="fmrc/NCEP/NAM/Polar_90km">
  <collection spec="/data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/NAM_Polar_90km_#yyyyMMdd_HHmm#.grib2$"/>
</featureCollection>
Fully specify the options:
<featureCollection name="NCEP-NAM-Polar_90km" featureType="FMRC" harvest="true" path="fmrc/NCEP/NAM/Polar_90km">
  <collection spec="/data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/NAM_Polar_90km_#yyyyMMdd_HHmm#.grib2$"
              recheckAfter="15 min" olderThan="5 min"/>
  <update startup="true" rescan="0 5 3 * * ? *" />
  <protoDataset choice="Penultimate" change="0 2 3 * * ? *" />
  <fmrcConfig regularize="true" datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" />
</featureCollection>
With NcML elements:
<featureCollection name="NCEP-NAM-Polar_90km" featureType="FMRC" harvest="true" path="fmrc/NCEP/NAM/Polar_90km">
  <collection spec="/data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/NAM_Polar_90km_#yyyyMMdd_HHmm#.grib2$"
              recheckAfter="15 min" olderThan="5 min"/>
  <update startup="true" rescan="0 5 3 * * ? *" />
  <protoDataset choice="Penultimate" change="0 2 3 * * ? *" >
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <attribute name="History" value="processed by Rectilyser 6.23a"/>
    </netcdf>
  </protoDataset>
  <fmrcConfig regularize="true" datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" />
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="Conventions" value="CF-1.6"/>
  </netcdf>
</featureCollection>
A featureCollection is a kind of dataset element, and so can contain the same elements and attributes of that element. Following is the XML Schema definition, which shows only the elements and attributes that are particular to a featureCollection:
<xsd:element name="featureCollection" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
        <xsd:sequence>
          <xsd:element type="collectionType" name="collection"/>
          <xsd:element type="updateType" name="update" minOccurs="0"/>
          <xsd:element type="manageType" name="manage" minOccurs="0"/>
          <xsd:element type="protoDatasetType" name="protoDataset" minOccurs="0"/>
          <xsd:element type="fmrcConfigType" name="fmrcConfig" minOccurs="0"/>
          <xsd:element type="pointConfigType" name="pointConfig" minOccurs="0"/>
          <xsd:element type="gribConfigType" name="gribConfig" minOccurs="0"/>
          <xsd:element ref="ncml:netcdf" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="featureType" type="featureTypeChoice" use="required"/>
        <xsd:attribute name="path" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
Here is an example featureCollection as you might put it into a TDS catalog:
<featureCollection name="Metar Station Data" harvest="true" featureType="Station" path="nws/metar/ncdecoded">
  <metadata inherited="true">
    <serviceName>fullServices</serviceName>
    <documentation type="summary">Metars: hourly surface weather observations</documentation>
    <documentation xlink:href="http://metar.noaa.gov/" xlink:title="NWS/NOAA information"/>
    <keyword>metar</keyword>
    <keyword>surface observations</keyword>
  </metadata>
  <collection spec="/data/ldm/pub/decoded/netcdf/surface/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$" />
  <update startup="true" rescan="0 0/15 * * * ? *" trigger="allow"/>
  <protoDataset choice="Penultimate" />
  <pointConfig datasetTypes="cdmrFeature Files"/>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="Conventions" value="CF-1.6"/>
  </netcdf>
</featureCollection>
A collection element defines the collection of datasets. It takes the place of the scan and scanFmrc elements of an NcML aggregation.
<collection spec="/data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/WEST-CONUS_4km_3.9_#yyyyMMdd_HHmm#.gini$"
            name="WEST-CONUS_4km" olderThan="1 min" recheckAfter="15 min" />
The XML Schema:
<xsd:complexType name="collectionType">
  <xsd:attribute name="spec" type="xsd:string" use="required"/>
  <xsd:attribute name="name" type="xsd:token"/>
  <xsd:attribute name="olderThan" type="xsd:string" />
  <xsd:attribute name="recheckAfter" type="xsd:string" />
  <xsd:attribute name="dateFormatMark" type="xsd:string"/>
  <xsd:attribute name="timePartition" type="xsd:string"/>
</xsd:complexType>
where
Feature Collections need to know how to sort the collection of files, so it is recommended that you put a date in the filename and either specify a date extractor in the specification string or include a dateFormatMark attribute. Otherwise, files will be sorted by filename.
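As an illustration of a date extractor in the specification string, the sketch below reuses the METAR collection from the catalog example earlier in this document. The reading given in the comment (the text between the # marks is a date pattern matched against the filename) is inferred from the examples in this document, and the sample filename is hypothetical.

```xml
<!-- Sketch: the date pattern between the # marks is matched against each
     filename, and the resulting date is used to sort the collection.
     A file named Surface_METAR_20110615_0000.nc (hypothetical) would be
     assigned the date 2011-06-15 00:00. -->
<collection spec="/data/ldm/pub/decoded/netcdf/surface/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$" />
```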
The protoDataset element provides control over the choice of the prototype dataset for the collection. The prototype dataset is used to populate the metadata for the feature collection.
<protoDataset choice="Penultimate" param="0" change="expr">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="CF:FeatureType" value="station"/>
  </netcdf>
</protoDataset>
<xsd:complexType name="protoDatasetType">
  <xsd:sequence>
    <xsd:element ref="ncml:netcdf" minOccurs="0"/>
  </xsd:sequence>
  <xsd:attribute name="choice" type="protoChoices"/>
  <xsd:attribute name="change" type="xsd:string"/>
  <xsd:attribute name="param" type="xsd:string"/>
</xsd:complexType>
<xsd:simpleType name="protoChoices">
<xsd:union memberTypes="xsd:token">
<xsd:simpleType>
<xsd:restriction base="xsd:token">
<xsd:enumeration value="First"/>
<xsd:enumeration value="Random"/>
<xsd:enumeration value="Penultimate"/>
<xsd:enumeration value="Latest"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:union>
</xsd:simpleType>
where:
The choice of the protoDataset matters when the datasets are not homogeneous:
For collections that change, the update element provides options to update the collection in a background task. New collections are built in the background, so that requests do not wait.
<update startup="true" rescan="cron expr" trigger="allow" />
The XML Schema definition for the update element:
<xsd:complexType name="updateType">
  <xsd:attribute name="startup" type="xsd:boolean"/>
  <xsd:attribute name="rescan" type="xsd:token"/>
  <xsd:attribute name="trigger" type="xsd:token"/>
</xsd:complexType>
where:
The manage element instructs the TDS to manage your collection by deleting files that are older than a certain time.
<manage deleteAfter="30 days" check="cron expr" />
where:
- deleteAfter= delete files older than this amount
- check= "cron expr" uses a cron expression to specify when the collection should be checked for old files.
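A minimal sketch of a manage element with a concrete schedule, placed after the collection element as the schema order above requires. It assumes that check accepts the same seven-field cron syntax as the rescan examples earlier in this document, read as seconds, minutes, hours, day of month, month, day of week, year. The collection name, path, and file locations are hypothetical.

```xml
<featureCollection name="Example Station Data" featureType="Station" path="example/station">
  <!-- hypothetical file location; the date pattern sits between the # marks -->
  <collection spec="/data/example/station/Station_#yyyyMMdd_HHmm#.nc$" />
  <!-- assumed reading: at 04:00:00 every day, delete files older than 30 days -->
  <manage deleteAfter="30 days" check="0 0 4 * * ? *" />
</featureCollection>
```

Running the check once a day at a quiet hour is only one possible choice; a server with tighter disk limits might check more often.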
There are two ways to update a feature collection when it changes, without having to restart the TDS:
If you have a collection that doesn't change, do not use the recheckAfter or the rescan attribute. Instead, use:
<update startup ="nocheck" />
which assumes that the collection has not changed since the last time the TDS was run. This saves a lot of processing on large collections that you know don't change.
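Put together, a static collection might be configured as in the following sketch. The collection name, path, and file locations are hypothetical; only the absence of recheckAfter and rescan and the use of startup="nocheck" come from the text above.

```xml
<featureCollection name="Example Archive" featureType="FMRC" path="example/archive">
  <!-- no recheckAfter attribute: the files are not expected to change -->
  <collection spec="/data/example/archive/Model_#yyyyMMdd_HHmm#.grib2$" />
  <!-- no rescan attribute: assume nothing has changed since the last TDS run -->
  <update startup="nocheck" />
</featureCollection>
```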
If you want the collection to be tested at startup to see if it has changed since the last time the TDS was run, use:
<update startup ="true" />
Otherwise the collection will be checked for changes and created when the first request for it comes in.
For collections that change but are rarely used, use the recheckAfter attribute on the collection element. This minimizes unneeded processing for lightly used collections. This is also a good strategy for small collections which don't take very long to build.
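A sketch of this strategy for a lightly used collection follows. The collection name, path, and file locations are hypothetical, and the comments give an assumed reading of the two attributes, since this document does not define them in detail.

```xml
<featureCollection name="Example Lightly Used" featureType="FMRC" path="example/lightlyUsed">
  <!-- assumed reading of recheckAfter: when a request arrives, rescan only
       if at least 15 minutes have passed since the last scan.
       assumed reading of olderThan: skip files modified in the last
       5 minutes, since they may still be being written. -->
  <collection spec="/data/example/lightlyUsed/Model_#yyyyMMdd_HHmm#.grib2$"
              recheckAfter="15 min" olderThan="5 min" />
</featureCollection>
```

No update element is present, so under this reading no work is done unless a request actually arrives.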
When you want to ensure that requests are answered as quickly as possible, update the collection in the background using the rescan attribute of the update element.
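A sketch of a background update follows, modeled on the rescan value used in the METAR example earlier in this document. The collection name, path, and file locations are hypothetical, and the comment assumes the cron fields are read as seconds, minutes, hours, day of month, month, day of week, year.

```xml
<featureCollection name="Example Busy Collection" featureType="Station" path="example/busy">
  <collection spec="/data/example/busy/Obs_#yyyyMMdd_HHmm#.nc$" olderThan="5 min" />
  <!-- check at startup, then rescan in the background
       at minutes 0, 15, 30 and 45 of every hour -->
  <update startup="true" rescan="0 0/15 * * * ? *" />
</featureCollection>
```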
To externally control when a collection is updated, use:
<update trigger ="allow" />
You must enable remote management. When the dataset changes, send a message to a special URL in the TDS.
NcML is no longer used to define the collection, but it may still be used to modify the feature collection dataset.
Old way:
<datasetFmrc name="RTOFS Forecast Model Run Collection" path="fmrc/rtofs">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <variable name="mixed_layer_depth">
      <attribute name="long_name" value="mixed_layer_depth @ surface"/>
      <attribute name="units" value="m"/>
    </variable>
    <aggregation dimName="runtime" type="forecastModelRunSingleCollection" timeUnitsChange="true" recheckEvery="10 min">
      <variable name="time">
        <attribute name="units" value="hours since "/>
      </variable>
      <scanFmrc location="c:/rps/cf/rtofs" regExp=".*ofs_atl.*\.grib2$"
                runDateMatcher="#ofs.#yyyyMMdd" forecastOffsetMatcher="HHH#.grb.grib2#" subdirs="true"
                olderThan="10 min"/>
    </aggregation>
  </netcdf>
</datasetFmrc>
where:
New way:
<featureCollection name="RTOFS Forecast Model Run Collection" featureType="FMRC" path="fmrc/rtofs">
  <collection spec="c:/rps/cf/rtofs/.*ofs_atl.*\.grib2$" recheckAfter="10 min" olderThan="5 min"/>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <variable name="time">
      <attribute name="units" value="hours since "/>
    </variable>
  </netcdf>
  <protoDataset>
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <variable name="mixed_layer_depth">
        <attribute name="long_name" value="mixed_layer_depth @ surface"/>
        <attribute name="units" value="m"/>
      </variable>
    </netcdf>
  </protoDataset>
</featureCollection>
where:
This document is maintained by John Caron and was last updated in June 2011.