[thredds] GRIB collection keeps re-scanning, data extraction extremely slow

Dear all,

Coming back to my question about NCEP GRIB aggregation a while back,
John Caron wrote:

> What you are seeing is the limitations of aggregations. In this
> case, there are 3 different time coordinates in the collection, but
> NcML aggregation can only aggregate on one of them.  You want to use
> feature collections instead. Replace your entire <dataset> element
> with something like: 
>
> <featureCollection name="myCollectionName" featureType="GRIB"
>                    path="grib/NCEP/GFS/etc">
>   <collection spec="/pub/data/nccf/com/gfs/prod/gfs.2013060600/gfs\.t00z\.pgrb2f..$"
>               dateFormatMark="#prod/gfs.#yyyyMMddHH" />
> </featureCollection>

I've finally implemented the GRIB feature collection, and it seems to
be working.  I can access my GRIB files (although indexing the full
NCEP ensemble takes a while!) and the data comes out OK (time series
for a single location, for each ensemble member).

I'm experiencing a problem though: data extraction is extremely slow.
I'm comparing to my old situation, where I listed all ensemble member
GRIB files in a single NcML file.  With that setup, a data extraction
for one location (all members, for a single 15-day forecast) took
5 minutes in thredds 4.1.

In the new situation (thredds 4.3), I use a GRIB feature collection
partitioned by directory (one directory per forecast cycle).  After
half an hour, the extraction is still not done.  In
featureCollectionScan.log I can see that thredds keeps scanning all
folders:

> [2013-08-29T12:40:28.786+0000] INFO  
> thredds.inventory.MFileCollectionManager: 2013082618 : was scanned 
> MCollection{name='2013082618', 
> dirName='/output/operational/atmosphere/ncep/gefs/1.0deg/2013082618', 
> wantSubdirs=true, ff=WildcardMatchOnPath{wildcard=null 
> regexp=gefs\..*\.f.*\.grib2$}}

It does this for ALL forecast cycle folders (I have about 20), even
though I am accessing only the 2013082900 directory.  Could anyone
give me tips on how to prevent thredds from continuously re-scanning
the whole directory structure with grib files?
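
To quantify the re-scanning, a rough sketch like the one below could
tally the "was scanned" entries per collection in
featureCollectionScan.log.  It assumes every entry has the same shape
as the one quoted above; the regular expression and the function name
count_scans are my own for this sketch and are not part of thredds.

```python
import re
from collections import Counter

# Matches e.g.
#   "thredds.inventory.MFileCollectionManager: 2013082618 : was scanned"
# \s+ is used between tokens so that entries wrapped over several lines
# (as in this mail) still match.
SCAN_RE = re.compile(r"MFileCollectionManager:\s+(\S+)\s+:\s+was\s+scanned")


def count_scans(log_text):
    """Return a Counter mapping collection name -> number of scan entries."""
    return Counter(SCAN_RE.findall(log_text))


if __name__ == "__main__":
    # Sample text copied from the log excerpt above; in practice, read the
    # contents of featureCollectionScan.log instead.
    sample = (
        "[2013-08-29T12:40:28.786+0000] INFO  \n"
        "thredds.inventory.MFileCollectionManager: 2013082618 : was scanned \n"
        "MCollection{name='2013082618', ...}\n"
    )
    for name, n in sorted(count_scans(sample).items()):
        print(name, n)
```

If the counts for cycles I am not accessing grow during a single
extraction, that would support the suspicion that the whole tree is
being re-scanned.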

Current setup:

    <featureCollection name="gefs_col" featureType="GRIB"
                       path="ncep/gefs/1.0deg">
      <!-- be specific here with the file selector, other grib2 files may be
           hanging around in the tree -->
      <collection spec="/output/operational/atmosphere/ncep/gefs/1.0deg/**/gefs\..*\.f.*\.grib2$"
                  dateFormatMark="#0deg/#yyyyMMddHH"
                  timePartition="directory"
                  name="gefs_col_unique" />
      <update startup="true" trigger="allow"/>
    </featureCollection>


This organizes the data the way I want.  I get a single URL per cycle:
     .../thredds/dodsC/ncep/gefs/1.0deg/2013082900/best
                                    .../2013082800/best
                                    .../2013082700/best
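
For illustration, the mapping from a file path to its cycle URL can be
sketched as below.  This reflects only my understanding of how
dateFormatMark="#0deg/#yyyyMMddHH" behaves (find the marker text
"0deg/" in the path, then read the date that follows it); it is not
the thredds implementation.  The names cycle_from_path and best_url,
the base URL and the example file name are hypothetical.

```python
from datetime import datetime

MARKER = "0deg/"               # text between the two '#' in dateFormatMark
DATE_LEN = len("yyyyMMddHH")   # the date stamp is 10 characters long


def cycle_from_path(path):
    """Return the forecast cycle encoded directly after MARKER in path."""
    # str.index raises ValueError if the marker is not present.
    start = path.index(MARKER) + len(MARKER)
    stamp = path[start:start + DATE_LEN]
    return datetime.strptime(stamp, "%Y%m%d%H")


def best_url(base, cycle):
    """Build the per-cycle 'best' dataset URL below a given base URL."""
    return f"{base}/ncep/gefs/1.0deg/{cycle:%Y%m%d%H}/best"


if __name__ == "__main__":
    # Hypothetical file name that matches the spec pattern above.
    path = ("/output/operational/atmosphere/ncep/gefs/1.0deg/"
            "2013082900/gefs.example.f006.grib2")
    cycle = cycle_from_path(path)
    # Hypothetical server address; substitute the real thredds host.
    print(best_url("http://localhost:8080/thredds/dodsC", cycle))
```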

The data comes out the way I want, but as mentioned above it's
_extremely_ slow, likely due to re-scanning of the disk structure.

I don't really need automatic updating, a manual trigger when a new
forecast is downloaded would be ok too.  I would prefer thredds to
scan and index the grib files only once upon a manual trigger.

Any hints on how to improve this?


Kind regards,
     Hein Zelle


> Message: 1
> Date: Thu, 6 Jun 2013 11:30:36 +0200
> From: Hein Zelle <hein.zelle@xxxxxxxxxxxxx>
> To: thredds@xxxxxxxxxxxxxxxx
> Subject: Re: [thredds] aggregating GFS data, problem with accumulated
> Message-ID: <20130606093035.GA17727@xxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="us-ascii"
> 
> Dear John,
> 
> attached to this email is a complete ncml file that we place next to
> the data files.  The data files themselves are too big to upload, but
> they are standard gfs grib2 files, you can find them at
> 
> ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.2013060600
> 
> (that's for this morning, modify the date as needed)
> The files are the grib2 files at 0.5 degree, e.g.  gfs.t00z.pgrb2bf30
> (50 mb each).
> 
> The previous snippet of ncml I sent should also work, you'll have to modify
> the paths to the correct folder of course.  A variable to check is for
> example 
> 
> Total_precipitation_surface_3_Hour_Accumulation
> 
> These should have multiple time steps, but I get only 1 time step (the
> first, for the +03 forecast). The +00 analysis doesn't contain the
> precipitation fields.  Any variable with an accumulation or averaging
> interval exhibits the problem.
> 
> 
> Kind regards,
>      Hein Zelle
> 
> 


-- 

Dr. Hein Zelle
Senior consultant meteorology & oceanography
BMT ARGOSS

Tel:        +31 (0)527-242299
Fax:        +31 (0)527-242016
E-mail:     hein.zelle@xxxxxxxxxxxxx
Website:    www.bmtargoss.com
            
BMT ARGOSS b.v.
Voorsterweg 28, 8316 PT Marknesse, the Netherlands
Postal address: P.O. Box 61, 8325 ZH Vollenhove, the Netherlands

Registered in The Netherlands, Registered no. 39060160.



