
[THREDDS #LKY-813484]: Missing variable in aggregation



Hi Jonathan,

Nested aggregations should be fairly robust and reliable, though there may be 
some edge cases we've missed. So the workaround you are using should be OK 
for small collections of datasets, but it may not scale well to larger 
collections.
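
To illustrate the scaling concern, here is a rough sketch of how the workaround 
might grow when a second old file lacks var2. It assumes each old file needs 
its own union wrapper around the shared patch file and that every old file has 
the same dimension sizes as the patch file. The file name 
test_missing_variable_0.nc is hypothetical and not part of the original 
dataset.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- Hypothetical second old file without var2: it needs its own union wrapper -->
    <netcdf>
      <aggregation type="union">
        <netcdf location="file:///data/test_missing_variable_0.nc"/>
        <netcdf location="file:///data/test_missing_variable_patch.nc"/>
      </aggregation>
    </netcdf>
    <!-- Old file from the original example, wrapped the same way -->
    <netcdf>
      <aggregation type="union">
        <netcdf location="file:///data/test_missing_variable_1.nc"/>
        <netcdf location="file:///data/test_missing_variable_patch.nc"/>
      </aggregation>
    </netcdf>
    <!-- New files already contain var1 and var2, so they are listed directly -->
    <netcdf location="file:///data/test_missing_variable_2.nc"/>
    <netcdf location="file:///data/test_missing_variable_3.nc"/>
  </aggregation>
</netcdf>
```

Under these assumptions, every old file adds another nested block that has to 
be written out explicitly, which is where larger collections become awkward.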

In TDS 4.2, we have reworked much of the code behind the FMRC aggregation. It 
will now (when given a complete "prototype" dataset) recognize that data is 
missing and automatically return missing values as appropriate. This is not 
currently implemented in the other aggregation types but we hope to add this 
feature for other aggregations in TDS 4.3 or 4.4.

Here's a link to the "prototype" dataset section in the FeatureCollection 
(FMRC) documentation:

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/collections/FeatureCollections.html#elements
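
As a minimal sketch of what such a configuration might look like, the fragment 
below shows a featureCollection with a protoDataset element. The collection 
name, path, directory and file naming pattern (a date embedded in the file 
name) are assumptions made for illustration, not taken from your setup. 
choice="Latest" is used on the assumption that the newest file contains every 
variable (var1 and var2) and can therefore serve as the complete prototype.

```xml
<featureCollection name="test_missing_variable_fmrc" featureType="FMRC"
                   path="test_missing_variable_fmrc">
  <!-- Hypothetical file pattern: the #yyyyMMdd# part extracts the date from the name -->
  <collection spec="/data/test_missing_variable_#yyyyMMdd#.nc$"/>
  <!-- Use the latest file as the prototype, assuming it holds all variables -->
  <protoDataset choice="Latest"/>
</featureCollection>
```

Please check the linked documentation for the exact attributes supported by 
your TDS version before relying on this.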

Ethan

Jonathan Wilkins wrote:
> Hi all,
> 
> Is there any way to make aggregations work when a variable is missing in
> one or more files?
> 
> For example, consider a joinExisting aggregation along the time
> dimension, where the netCDF files have variables named var1 and var2
> but var2 is missing from some of the files.
> Is it possible to make THREDDS detect this and return missing/fill
> values when data is requested for var2 over a selection where var2 is not
> in the corresponding files?
> 
> The goal is to transparently support files that contain new
> variables without having to re-process the old files to add the new
> variable filled with missing values.  This would also save disk space.
> 
> The only way I have found is to insert a "patch" file in a union aggregation
> (see below).
> 
> With this method, the patch file must not be declared as the first netcdf
> element of the union aggregation.  Its time, latitude and longitude
> coordinates can then hold any values because the coordinate values are
> taken from the first netcdf element.  The var2 variable in the patch file
> is filled with a "missing value" (the same as in the new files).  So only
> one patch file has to be produced and it applies to all files missing var2.
> 
> This seems to work as I expect, but is it correct and reliable?
> Is there a better way to achieve this and, if not, could this feature be
> considered for a later version?
> 
> Regards
> 
> Jonathan Wilkins
> Actimar, France
> 
> 
> #####
> 
> The dataset:
> 
> <dataset name="test_missing_variable_agg"
>          ID="test_missing_variable_agg"
>          urlPath="test_missing_variable_agg">
>   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>     <aggregation dimName="time" type="joinExisting">
>       <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>         <aggregation dimName="time" type="union">
>           <netcdf location="file:///data/test_missing_variable_1.nc"/>
>           <netcdf location="file:///data/test_missing_variable_patch.nc"/>
>         </aggregation>
>       </netcdf>
>       <netcdf location="file:///data/test_missing_variable_2.nc"/>
>       <netcdf location="file:///data/test_missing_variable_3.nc"/>
>     </aggregation>
>   </netcdf>
> </dataset>
> 
> #####
> 
> The old file without var2:
> 
> netcdf test_missing_variable_1.nc {
> dimensions:
> time = 24 ;
> latitude = 48 ;
> longitude = 44 ;
> variables:
> double time(time) ;
> ...
> float latitude(latitude) ;
> ...
> float longitude(longitude) ;
> ...
> float var1(time, latitude, longitude) ;
> ...
> 
> #####
> 
> The new files with var1 and var2:
> 
> netcdf test_missing_variable_2.nc {
> dimensions:
> time = 24 ;
> latitude = 48 ;
> longitude = 44 ;
> variables:
> double time(time) ;
> ...
> float latitude(latitude) ;
> ...
> float longitude(longitude) ;
> ...
> float var1(time, latitude, longitude) ;
> ...
> float var2(time, latitude, longitude) ;
> ...
> 
> test_missing_variable_2.nc contains data for the day after
> test_missing_variable_1.nc, plus var2.
> test_missing_variable_3.nc looks like test_missing_variable_2.nc and
> contains data for the day after that.
> 
> #####
> 
> The patch file with var2 (set to missing_value):
> 
> netcdf test_missing_variable_patch.nc {
> dimensions:
> time = 24 ;
> latitude = 48 ;
> longitude = 44 ;
> variables:
> double time(time) ;
> ...
> float latitude(latitude) ;
> ...
> float longitude(longitude) ;
> ...
> float var2(time, latitude, longitude) ;
> 
> #####


Ticket Details
===================
Ticket ID: LKY-813484
Department: Support THREDDS
Priority: High
Status: Closed