[thredds] NcML joinExisting aggregation with multiple folder scans

Dear All

I have a gridded dataset composed of annual netCDF files containing daily data 
for the time period 1890-2012.  These data have been exposed as a THREDDS NcML 
aggregation which has been given a DOI (digital object identifier), so the 
aggregation cannot be modified in any way.  The data 
provider has now provided additional data for the years 2013 and 2014 and I am 
trying to create a second NcML aggregation for 1890-2014 (which will also be 
DOI'd).  The data for 1890-2012 are in one folder whilst the data for 2013 and 
2014 are in a second folder.  I need to keep the files in their respective 
folders as the folders have been check-summed for data auditing purposes.

Before I create the NcML aggregation on one of our THREDDS servers I have 
created an NcML file which I am testing using the Java ToolsUI-4.6 application. 
 The NcML file consists of:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
       <aggregation type="joinExisting" dimName="time">
              <scan location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/"
                    regExp="CEH_GEAR_daily_GB_20[0-9]{2}.nc" />
              <scan location="Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/"
                    regExp="CEH_GEAR_daily_GB_201[0-9].nc" />
       </aggregation>
</netcdf>

where I'm trying to use two folder scans, both with regular expressions, with 
the first folder scan selecting netCDF files from the first folder for 
2000-2012 and the second selecting netCDF files from the second folder for 2013 
and 2014.  Note that I'm only selecting data for 2000-2014 for 
development/speed but ultimately need to create the aggregation for the whole 
time period 1890-2014 hence the reason why I'm keen to use folder scans to save 
having to define the individual files.  When I check the aggregation using the 
NcML | Aggregation tab in ToolsUI I get the following summary:

  Type=joinExisting
  dimName=time
  Datasets (15)
   Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2013.nc range=[0:365) (365)
   Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2014.nc range=[365:730) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2000.nc range=[730:1096) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2001.nc range=[1096:1461) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2002.nc range=[1461:1826) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2003.nc range=[1826:2191) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2004.nc range=[2191:2557) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2005.nc range=[2557:2922) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2006.nc range=[2922:3287) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2007.nc range=[3287:3652) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2008.nc range=[3652:4018) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2009.nc range=[4018:4383) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2010.nc range=[4383:4748) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2011.nc range=[4748:5113) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2012.nc range=[5113:5479) (366)
  timeUnitsChange=true
  totalCoords=5479

Aggregation Variables
   time(time=5479)
   rainfall_amount(time=5479, y=1251, x=701)
   min_dist(time=5479, y=1251, x=701)

Cache Variables
   time (ucar.nc2.ncml.AggregationOuterDimension$CoordValueVar)

Variable Proxies
                    lat proxy ucar.nc2.ncml.Aggregation$DatasetProxyReader
                    lon proxy ucar.nc2.ncml.Aggregation$DatasetProxyReader
                    crs cached
        rainfall_amount proxy ucar.nc2.ncml.AggregationExisting
               min_dist proxy ucar.nc2.ncml.AggregationExisting
                      x proxy ucar.nc2.dataset.CoordinateAxis1D
                      y proxy ucar.nc2.dataset.CoordinateAxis1D
                   time proxy ucar.nc2.dataset.CoordinateAxis1D

Hence the files for 2013-2014 are assigned the time index range [0:730) 
whilst the files for 2000-2012 are assigned [730:5479), which is incorrect; 
the files for 2000-2012 should cover [0:4749) and the files for 2013-2014 
should cover [4749:5479).  My only guess is that the aggregation orders the 
files alphabetically by their full path, so that the "Z:/eidchub/..." folder 
sorts ahead of the "Z:/thredds/..." folder?
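
To illustrate that guess, here is a minimal Python sketch.  It does not 
reproduce what the netCDF-Java library actually does internally (I have not 
confirmed that); it only shows that a plain alphabetical sort of the full 
paths gives the same dataset order as the ToolsUI summary above, whereas 
sorting on the file name alone would give chronological order.

```python
# Hypothetical check: compare two ways of ordering the 15 files.
# The folder paths are the ones used in the NcML file above.
first_dir = "Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/"
second_dir = "Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/"

paths = [f"{first_dir}CEH_GEAR_daily_GB_{y}.nc" for y in range(2000, 2013)]
paths += [f"{second_dir}CEH_GEAR_daily_GB_{y}.nc" for y in range(2013, 2015)]

# Ordering by the full path: "Z:/e..." sorts before "Z:/t...".
by_path = sorted(paths)

# Ordering by the file name only (the part after the last "/").
by_name = sorted(paths, key=lambda p: p.rsplit("/", 1)[-1])

print("first file when sorted by full path:", by_path[0])
print("first file when sorted by file name:", by_name[0])
```

With the full-path ordering the 2013 and 2014 files come first, followed by 
2000 to 2012, which matches the order reported by ToolsUI.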

I have checked the netCDF files and the time coordinates in all files are 
defined relative to 1800-01-01, for example:
    double time(time=366);
      :units = "days since 1800-1-1";
      :calendar = "gregorian";
      :long_name = "Time in days since 1800-1-1 (on 2012-1-1: day 77431)";
and have the expected values:

2000-01-01 to 2012-12-31: 73048 to 77796 (days since 1800-01-01)

and

2013-01-01 to 2014-12-31: 77797 to 78526 (days since 1800-01-01).
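
These expected values can be cross-checked independently of the files.  The 
sketch below is only an illustration in Python (the function name is my own); 
it uses the standard library's proleptic Gregorian calendar, which agrees 
with the "gregorian" calendar for all dates after 1800.

```python
from datetime import date

# Reference date taken from the units attribute: "days since 1800-1-1".
EPOCH = date(1800, 1, 1)


def days_since_epoch(d):
    """Return the number of whole days between EPOCH and the date d."""
    return (d - EPOCH).days


for label, d in [
    ("2000-01-01", date(2000, 1, 1)),
    ("2012-01-01", date(2012, 1, 1)),
    ("2012-12-31", date(2012, 12, 31)),
    ("2013-01-01", date(2013, 1, 1)),
    ("2014-12-31", date(2014, 12, 31)),
]:
    print(label, days_since_epoch(d))
```

The value for 2012-01-01 agrees with the day number quoted in the long_name 
attribute (77431), and the two folders hold consecutive, non-overlapping time 
values.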

I also attempted to use the timeUnitsChange="true" flag when defining the 
aggregation but this didn't appear to have any effect.

I then created an NcML file creating a joinExisting aggregation specifying the 
individual netCDF files for the years 2000-2014:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
       <aggregation type="joinExisting" dimName="time">
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2000.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2001.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2002.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2003.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2004.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2005.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2006.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2007.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2008.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2009.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2010.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2011.nc"/>
              <netcdf location="Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2012.nc"/>
              <netcdf location="Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2013.nc"/>
              <netcdf location="Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2014.nc"/>
       </aggregation>
</netcdf>

and I get the following summary when I check the NcML file in ToolsUI:

  Type=joinExisting
  dimName=time
  Datasets (15)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2000.nc range=[0:366) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2001.nc range=[366:731) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2002.nc range=[731:1096) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2003.nc range=[1096:1461) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2004.nc range=[1461:1827) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2005.nc range=[1827:2192) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2006.nc range=[2192:2557) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2007.nc range=[2557:2922) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2008.nc range=[2922:3288) (366)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2009.nc range=[3288:3653) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2010.nc range=[3653:4018) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2011.nc range=[4018:4383) (365)
   Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/CEH_GEAR_daily_GB_2012.nc range=[4383:4749) (366)
   Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2013.nc range=[4749:5114) (365)
   Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/CEH_GEAR_daily_GB_2014.nc range=[5114:5479) (365)
  timeUnitsChange=false
  totalCoords=5479

Aggregation Variables
   time(time=5479)
   rainfall_amount(time=5479, y=1251, x=701)
   min_dist(time=5479, y=1251, x=701)

Cache Variables
   time (ucar.nc2.ncml.AggregationOuterDimension$CoordValueVar)

Variable Proxies
                    lat proxy ucar.nc2.ncml.Aggregation$DatasetProxyReader
                    lon proxy ucar.nc2.ncml.Aggregation$DatasetProxyReader
                    crs cached
        rainfall_amount proxy ucar.nc2.ncml.AggregationExisting
               min_dist proxy ucar.nc2.ncml.AggregationExisting
                      x proxy ucar.nc2.dataset.CoordinateAxis1D
                      y proxy ucar.nc2.dataset.CoordinateAxis1D
                   time proxy ucar.nc2.dataset.CoordinateAxis1D

which correctly aggregates the time dimension across the files.
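
For reference, the index ranges that a chronological aggregation should 
produce can be derived from the number of days in each year.  The following 
is a small Python sketch with a helper name of my own choosing; it assumes 
each annual file holds exactly one time step per calendar day with no gaps.

```python
import calendar


def expected_ranges(first_year, last_year):
    """Map each year to its half-open index range [start:stop) on the
    aggregated time dimension, assuming one time step per calendar day."""
    ranges = {}
    start = 0
    for year in range(first_year, last_year + 1):
        n_days = 366 if calendar.isleap(year) else 365
        ranges[year] = (start, start + n_days)
        start += n_days
    return ranges


ranges = expected_ranges(2000, 2014)
for year, (lo, hi) in ranges.items():
    # Same layout as the ToolsUI summary: range=[lo:hi) (count)
    print(f"CEH_GEAR_daily_GB_{year}.nc range=[{lo}:{hi}) ({hi - lo})")
```

The ranges computed this way are the same as those in the summary for the 
explicit file list, ending at a total of 5479 time steps.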

Therefore, is it possible to use multiple folder scans when creating an NcML 
joinExisting aggregation?  And if it is, can anybody see what I'm doing wrong 
in my NcML file?
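
If multiple scans cannot be made to aggregate in chronological order, a 
fallback would be to generate the explicit file list rather than type it by 
hand.  The Python sketch below is only an illustration: the function name and 
the split_year parameter are my own, and it assumes that every annual file 
from 1890 onwards follows the CEH_GEAR_daily_GB_<year>.nc naming shown above, 
with 1890-2012 in the first folder and 2013 onwards in the second.

```python
from xml.sax.saxutils import quoteattr

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
FIRST_DIR = "Z:/thredds/uk_rainfall/GB/daily/netCDF/uncompressed/detail/"
SECOND_DIR = "Z:/eidchub/f2856ee8-da6e-4b67-bedb-590520c77b3c/"


def build_ncml(first_year, last_year, split_year=2013):
    """Return NcML text listing one <netcdf> element per year, in
    chronological order. Years before split_year are assumed to live in
    FIRST_DIR, the remaining years in SECOND_DIR."""
    lines = [
        f'<netcdf xmlns="{NCML_NS}">',
        '  <aggregation type="joinExisting" dimName="time">',
    ]
    for year in range(first_year, last_year + 1):
        folder = FIRST_DIR if year < split_year else SECOND_DIR
        location = f"{folder}CEH_GEAR_daily_GB_{year}.nc"
        # quoteattr adds the surrounding quotes and escapes XML specials.
        lines.append(f"    <netcdf location={quoteattr(location)}/>")
    lines.append("  </aggregation>")
    lines.append("</netcdf>")
    return "\n".join(lines)


print(build_ncml(1890, 2014))
```

The output can be saved as the NcML file and checked in ToolsUI in the same 
way as the hand-written version for 2000-2014.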



Many thanks for any help that anyone can provide.  Best wishes, Simon.

