
[THREDDS #ARH-205811]: Aggregations, dateFormatMark, and filtering/grouping



Hi Greg,

Glad you found a solution. Sorry it wasn't clear in the documentation. There 
are a number of methods for constructing aggregations and a few different 
documents describing those methods. The best one for getting all the detail is 
the "NcML Annotated Schema" document and especially the "aggregation Element" 
section.

http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/AnnotatedSchema4.html#aggregation

We'll take a look at the FMRC and general aggregation documents and see if we 
can clear things up.

Thanks,

Ethan

On 8/19/2010 8:02 AM, Williams, Greg wrote:
> New Client Reply: Aggregations, dateFormatMark, and filtering/grouping
> 
>  
> Hmm... I seem to have found a solution in the form of the 'regExp'
> element.
> (described in the DTD, but not in clear sight on the aggregations page).
> Thanks.
> 
> ________________________________
> 
> From: Williams, Greg 
> Sent: 19 August 2010 12:22
> To: 'address@hidden'
> Subject: Aggregations, dateFormatMark, and filtering/grouping
> 
> 
>  
> Hi,
>  
> Running the latest stable TDS 4.1.7, I have a problem correctly
> aggregating model runs based on dateFormatMark.
> I've searched the online docs/lists and can't see an answer to this, so
> I'm hoping you can help...
>  
> My setup is as follows:
> 1.  An ftp site exists, where model runs are uploaded to 'dated'
> directories under a top-level 'grib1' directory.
>     (eg. ./grib1/upd20100817, ./grib1/upd20100818, ./grib1/upd20100818,
> etc)
>  
> 2.  Each dated directory contains several model datasets, each with a
> prefix (per model area) and a reference time.
>     (eg. sca.2010081800.000.grb, with yyyyMMddHH as the model
> run/reference time and multiple timesteps per file)
>  
> 3.  Examples of prefixes for model areas are 'sca', 'gof', 'grand',
> 'global', 'fint' (there are 20 or so at the moment).
>  
>  
> What I want is to aggregate all the 'sca' runs into a FMRC, all the
> 'gof' runs into a separate FMRC, etc, etc.  Those sections of my catalog
> are included below for sca, gof, and grand data.
>  
> The problem seems to be that the 'dateFormatMark' option just counts
> characters before the # mark and does not perform a character match.
> Could that be true?
>  
> One effect (in this example) is that I think it's trying to aggregate
> all the 3-character-prefixed sets together (ie. 'sca' and 'gof') and that
> doesn't work due to a clash of reference-times.
>  
> Another effect is that the TDS logs show errors from date-matching
> against other prefixes.
> Eg. Attempts to aggregate the 'sca' set encounter the 'grand.*' files
> and cause:
>  
>     java.lang.RuntimeException: SimpleDateFormat bad = yyyyMMddHH
> Unparseable date: "nd.201008"
>  
>  
> I have no control over the directory structures or model prefixes, and
> cannot partition the files into separate directories by prefix (for
> example).
> I've tried using a 'filter' section in the catalog (straight after the
> end of the 'metadata' section), but the aggregation 'scan' seems
> unaffected and still encounters/includes files with the same
> prefix-length or different prefix lengths (and fails).  
>  
> Is there a way to make dateFormatMark do proper matching, or another
> solution to this?
>  
> Thanks.
> Greg.
>  
>  
>  
> ---
>  
>   <datasetFmrc name="sca" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/sca">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
> enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="sca.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
> 
>   <datasetFmrc name="gof" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/gof">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
> enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="gof.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
>  
>   <datasetFmrc name="grand" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/grand">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
> enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="grand.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
>  
> ---
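
For readers landing on this thread: the quoted message reports that 'regExp' solved the problem but does not show the resulting configuration. The fragment below is only a sketch of what the 'sca' scan might look like, not Greg's actual catalog. It assumes that regExp is given as an attribute of the scan element in place of suffix, and that the pattern is tested against the full pathname of each file, as the "aggregation Element" section of the NcML Annotated Schema describes. The pattern itself is hypothetical and should be checked against real file names before use.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        enhance="true">
  <aggregation dimName="run" type="forecastModelRunCollection"
               timeUnitsChange="true" recheckEvery="15 min">
    <!-- Assumption: regExp is used here instead of suffix and is matched
         against the full pathname. The hypothetical pattern accepts only
         files whose name starts with "sca." followed by a 10-digit
         yyyyMMddHH run time and ends in ".grb", so "gof.*" and "grand.*"
         files are never handed to dateFormatMark. -->
    <scan location="/export/ftp/pub/model/grib1/"
          regExp=".*/sca\.[0-9]{10}\..*\.grb$"
          dateFormatMark="sca.#yyyyMMddHH" olderThan="1 min" />
  </aggregation>
</netcdf>
```

The same pattern with a different prefix (for example "gof" or "grand") would go in each of the other datasetFmrc entries. Because dateFormatMark appears to locate the date by position rather than by matching the literal prefix, the filtering has to be done by regExp before the date is parsed.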

Ticket Details
===================
Ticket ID: ARH-205811
Department: Support THREDDS
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.