Using NcML in TDS


An NcML document is an XML document that uses the NetCDF Markup Language to define a CDM dataset. NcML can be embedded directly into the TDS catalogs to achieve a number of powerful features, shown below. This embedded NcML is only useful in the TDS server catalogs, it is not meaningful to a THREDDS client, and so is not included in the client catalogs.

One can put an NcML element inside a dataset element, in which case it is a self-contained NcML dataset, or inside a datasetScan element, where it modifies a regular dataset. In both cases, we call the result a virtual dataset, and you cannot serve a virtual dataset with a file-serving protocol like FTP or HTTP. However, you can use subsetting services like OPeNDAP, WCS, WMS and NetcdfSubset.

Using NcML in a dataset element

NcML embedded in a TDS dataset element creates a self-contained NcML dataset. The TDS dataset does not refer to a data root, because the NcML contains its own location. The TDS dataset must have a unique URL path (this is true for all TDS datasets), but unlike a regular dataset, does not have to match a data root.

Modifying an existing dataset

You can use use NcML to modify an existing CDM dataset:

<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         name="TDS workshop test 1" version="1.0.2">
1)<service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
2)<dataset name="Example NcML Modified" ID="ExampleNcML-Modified" urlPath="ExampleNcML/Modified.nc"> <serviceName>ncdods</serviceName> 3) <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/data/nc/example1.nc">
4) <variable name="Temperature" orgName="T"/> 5) <variable name="ReletiveHumidity" orgName="rh"> 6) <attribute name="long_name" value="relatively humid"/> <attribute name="units" value="percent (%)"/> 7) <remove type="attribute" name="description"/> </variable > </netcdf> </dataset> </catalog>
  1. A service is defined that allows the virtual dataset to be served through OPENDAP. Make sure that the base attribute is exactly as shown
  2. The virtual dataset is created and given a urlPath of "ExampleNcML/Modified.nc". The urlPath is essentially arbitrary, but must be unique within the TDS, and you should maintain a consistent naming convention to ensure uniqueness, especially for large collections of data. Its important to also give the dataset a unique ID.
  3. An NcML dataset is defined which references the netCDF file at the absolute location /data/nc/example1.nc. Note that you must declare the NcML namespace exactly as shown.
  4. The variable named T in the original file is renamed Temperature.
  5. The variable named rh in the original file is renamed ReletiveHumidity.
  6. Two attributes of rh are defined, long_name and units. If these already exist, they are replaced.
  7. The attribute of rh called description is removed.

See NcML Tutorial for more details.

Dataset vs virtual dataset

Lets look at serving a file directly vs serving it through NcML:

<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         name="TDS workshop test 2" version="1.0.2">

    
<service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/> 1)<datasetRoot path="test/ExampleNcML" location="/data/nc/" /> 2)<dataset name="Example Dataset" ID="Example" urlPath="test/ExampleNcML/example1.nc"> <serviceName>ncdods</serviceName> </dataset>
3)<dataset name="Example NcML Modified" ID="Modified" urlPath="ExampleNcML/Modified.nc"> <serviceName>ncdods</serviceName> 4) <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/data/nc/example1.nc">
<variable name="Temperature" orgName="T"/> </netcdf> </dataset> </catalog>
  1. A datasetRoot is defined that associates URL path test/ExampleNcML with the disk location /data/nc/.
  2. The dataset is creeated with a urlPath of "test/ExampleNcML/example.nc". The first part of the path is matched to the datasetRoot, so that the full dataset location is /data/nc/example1.nc. This file is served directly by this dataset element.
  3. The same file is used in a virtual dataset defined by the embedded NcML. The virtual dataset is given an (arbitrary) urlPath of ExampleNcML/Modified.nc.
  4. The NcML element is defined which references the netCDF file at the absolute location /data/nc/example1.nc. The only modification is to rename the variable T to Temperature.

Using NcML aggregation

Here is an example that defines a dataset using NcML aggregation.

<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         name="TDS workshop test 3" version="1.0.2">

1)<service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/" />
2)<dataset name="Example NcML Agg" ID="ExampleNcML-Agg" urlPath="ExampleNcML/Agg.nc">
3) <serviceName>ncdods</serviceName>
4) <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
5) <aggregation dimName="time" type="joinExisting">
6) <scan location="/data/workshop/test/cg/" dateFormatMark="CG#yyyyDDD_HHmmss" suffix=".nc" subdirs="false"/>
</aggregation>
</netcdf>
</dataset> </catalog>
  1. An OPENDAP service is defined called ncdods.
  2. A THREDDS dataset is defined, which must have a urlPath that is unique within the TDS, in this case ExampleNcML/Agg.nc.
  3. The dataset uses the ncdods service.
  4. An NcML netcdf element is embedded inside the THREDDS dataset element.
  5. An NcML aggregation of type joinExisting is declared, using the existing time dimension as the aggregation dimension.
  6. All the files in the directory /data/workshop/test/cg/ that end with .nc will be scanned to create the aggregation. A dateFormatMark is used to define the time coordinates, indicating there is exactly one time coordinate in each file.

See NcML Aggregation for more details.

Using NcML in a datasetScan element

If an NcML element is added to a DatasetScan, it will modify all of the datasets contained within the DatasetScan. It is not self-contained, however, since it gets its location from the datasets that are dynamically scanned.

(1)<datasetScan name="Ocean Satellite Data" ID="ocean/sat" path="ocean/sat" location="R:/tds/netcdf/">
    <filter> 
      <include wildcard="*.nc" />
    </filter>
(2) <metadata inherited="true">
<serviceName>ncdods</serviceName>
<dataType>Grid</dataType>
</metadata>
(3) <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"> <attribute name="Conventions" value="CF-1.0"/> </netcdf>
</datasetScan>
  1. A datasetScan element is created whose contained datasets start with URL path ocean/sat, and whose contents are all the files in the directory R:/testdata/tds/netcdf/ which end in .nc.
  2. All contained datasets inherit metadata indicating they use the ncdods service and are of type Grid.
  3. All contained datasets are wrapped by this NcML element. In this case, each dataset has the global attribute :Conventions="CF-1.0" added to it. Note that there is no location attribute, which is implicitly supplied by the datasets found by the datasetScan.

DatasetScan vs Aggregation Scan

The scan element in the NcML aggregation is similar in purpose to the datasetScan element, but be careful not to confuse the two. The datasetScan element is more powerful, and has more options for filtering etc. Its job is to create nested dataset elements inside the datasetScan, and so has various options to add information to those nested datasets. It has a generalized framework (CrawlableDataset) for crawling other things besides file directories. The scan element's job is to easily specify what files go into an NcML aggregation, and those individual files are hidden inside the aggregation dataset. It can only scan file directories. In the future, some of the capabilities of datasetScan will migrate into NcML scan.

Exercise: DatasetScan vs Aggregation Scan

Lets look at using a DatasetScan and an Aggregation scan on the same collection of files. Download catalogScan.xml, place it in your TDS content/thredds directory and add a catalogRef to it from your main catalog.

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     name="TDS workshop test 4" version="1.0.2">
     
   <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
1) <dataset name="Example NcML Agg" ID="ExampleNcML-Agg" urlPath="ExampleNcML/Agg.nc">
     <serviceName>ncdods</serviceName>
2)   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
       <aggregation dimName="time" type="joinExisting" recheckEvery="4 sec">
         <scan location="/workshop/test/cg/" dateFormatMark="CG#yyyyDDD_HHmmss" suffix=".nc" subdirs="false"/>
       </aggregation>
     </netcdf>
   </dataset>

3) <datasetScan name="CG Data" ID="cg/files" path="cg/files" location="/workshop/test/cg/">
     <metadata inherited="true">
       <serviceName>ncdods</serviceName>
       <dataType>Grid</dataType>
     </metadata>
     <filter>
4)     <include wildcard="*.nc"/>
     </filter>
5)   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
       <attribute name="Yoyo" value="Ma"/>
     </netcdf>
   </datasetScan>

</catalog>
  1. A virtual dataset is defined with URL ExampleNcML/Agg.nc
  2. The NcML aggregation for this dataset.
  3. A datasetScan element is created whose contained datasets start with URL path cg/files, and which scans the directory /workshop/test/cg/
  4. Only files which end in .nc.
  5. Add a global attribute to each file in the collection.

Start and restart your TDS and look at those datasets through the HTML interface and through ToolsUI.

Using the datasetFmrc element

A Forecast Model Run Collection is a collection of forecast model output. A special kind of aggregation, called an FMRC Aggregation, creates a dataset that has two time coordinates, called the run time and the forecast time. This dataset can then be sliced up in various ways to create 1D time views of the model output. A special kind of dataset, called a datasetFmrc makes these 1D datasets available though the TDS.

Download fmrc.ncml. and gfs.zip, unzip the latter and place in your data directory). In the ToolsUI NcML Tab, open fmrc.ncml and change the scan location to point to your data directory:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
   <aggregation dimName="run" type="forecastModelRunCollection" >
   <scan location="/data/working/fmrc/" suffix=".grib1"
     dateFormatMark="GFS_Puerto_Rico_191km_#yyyyMMdd_HHmm" />
   </aggregation>
</netcdf>

Now download catalogFmrc.xml, place it in your TDS content/thredds directory and add a catalogRef to it from your main catalog:

<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
            xmlns:xlink="http://www.w3.org/1999/xlink"
            name="THREDDS Data Server" version="1.0.1">
1)<service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/" />
2)<datasetFmrc name="GFS_Puerto_Rico_191km" path="fmrc/NCEP/GFS/Puerto_Rico_191km" >
3) <metadata inherited="true">
<serviceName>ncdods</serviceName>
<dataFormat>GRIB-1</dataFormat>
<documentation type="summary">Specially good GFS_Puerto_Rico_191km</documentation>
</metadata>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
4) <aggregation dimName="run" type="forecastModelRunCollection">
5) <scan location="/workshop/test/gfs/" suffix=".grib1" 6) dateFormatMark="GFS_Puerto_Rico_191km_#yyyyMMdd_HHmm"/>
</aggregation>
</netcdf>
</datasetFmrc> </catalog>
  1. An OPENDAP service is defined called ncdods.
  2. A THREDDS datasetFmrc is defined, whose contained datasets will all have a path starting with fmrc/NCEP/GFS/Puerto_Rico_191km.
  3. All the metadata contained here will be inherited by the contained datasets.
  4. A forecastModelRunCollection aggregation is defined, with runTime dimension run.
  5. All the files in the directory /workshop/test/gfs/ that end with .grib1 will be scanned to create the aggregation.
  6. The runTime dates will be created from the filenames by matching on the dateFormatMark.

The contained datasets include the resulting 2D time dimension dataset, as well as the 1D time views of the ucar.nc2.dt.fmrc.ForecastModelRunCollection dataset described in section 4 above, as seen in the resulting HTML page for that dataset:

Fmrc Html

Also see: FMRC poster

Debugging NcML

When things go wrong, its best to first debug the aggregation outside of the TDS:

  1. Go to the TDS catalog and find the problem dataset. Inside the <dataset> element will be a <netcdf> element, that is the NcML aggregation. Extract it out and put it in a file called "test.ncml".
    1. Add the XML header to the top of it: <?xml version="1.0" encoding="UTF-8"?>
    2. Remove the recheckEvery attribute if present on the <scan> element.
    3. Make sure that the <scan> location is available on the machine you are running ToolsUI
    Now start up ToolsUI, and in the viewer tab, navigate to test.ncml and try to open it.
  2. If the dataset is dynamic ( files can be added or deleted), add the recheckEvery attribute on the scan element and open the dataset, then reopen after a new file has arrived (and recheckEvery time has passed). Generally you make recheckEvery very short while testing.
  3. Now add the NcML dataset back to the TDS, without a recheckEvery attribute on the scan element. See if OPeNDAP access works,
  4. Add the recheckEvery attribute (if needed) and test again.

Cant use HTTPServer

Remember that you can't use HTTPServer for NcML datasets. Use only the subsetting services OpenDAP, WCS, WMS, and NetcdfSubset.

 

Related Documents:


This document is maintained by John Caron and was last updated July 2009