
Re: Thredds - Grib2 Collection Indexing as Independent Task



Hi Tim:

1. The CFSR dataset has some known encoding defects; I'm not sure whether your 
files have those problems.

2. change

 <collection 
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2" 
recheckAfter="5 min" olderThan="5 min"/>

to

 <collection 
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2" />

because you don't want to rescan this collection every 5 minutes!
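
If you want to sanity-check the file name part of that spec outside the TDS, a minimal Python sketch follows. It assumes the text after "**/" is applied as a regular expression to the whole file name; the sample file names are made up for illustration and are not taken from your dataset.

```python
import re

# File name part of the collection spec (the text after "**/").
FILENAME_PATTERN = re.compile(r"ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2")


def matches_spec(filename: str) -> bool:
    """Return True if the whole file name matches the spec's regex."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


# Hypothetical file names, for illustration only.
samples = [
    "ocnh01.gdas.1979010100.grb2",       # two digits, then ten digits
    "ocnh1.gdas.1979010100.grb2",        # only one digit after "ocnh"
    "ocnh01.gdas.19790101.grb2",         # only eight digits in the date field
    "ocnh01.gdas.1979010100.grb2.gbx9",  # extra suffix after ".grb2"
]

for name in samples:
    print(name, matches_spec(name))
```

Only the first sample satisfies the pattern under the whole-name assumption; the others fail on the digit counts or the trailing suffix.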

3. Stop the TDS, delete or archive content/thredds/logs, restart the TDS, 
run for an hour, then zip up the log files and send them to me. Optionally, 
stop the TDS after that until we can check whether there is a problem that 
would force you to redo the indexing anyway.
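
One way to do the zip step is sketched below in Python. The function name archive_logs and the archive naming are my own choices for illustration, not part of the TDS; run it only while the TDS is stopped, and pick a destination outside the log directory so the archive does not end up inside itself.

```python
import shutil
import time
from pathlib import Path


def archive_logs(log_dir: str, dest_dir: str) -> Path:
    """Zip the contents of log_dir into dest_dir and return the zip path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(dest_dir) / f"thredds-logs-{stamp}"
    # make_archive appends ".zip" to the base name and returns the full path.
    return Path(shutil.make_archive(str(base), "zip", root_dir=log_dir))


# Example call (both paths are placeholders for your installation):
#   archive_logs("content/thredds/logs", "/tmp/tds-log-archives")
```

The returned path points at the zip file to send along.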


4. We do have a background indexer, but it's still beta. You can look at the 
docs we have so far, but I wouldn't try to run it yet:

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/TDM.html



John


On 8/12/2013 3:07 PM, Timothy Lewis - NOAA Affiliate wrote:
> John,
> 
> I'm sorry for the blank message.  I accidentally discovered a new 
> keyboard shortcut for sending a draft email.
> 
> The dataset in question has about 300,000 files (28 files per day for 30 
> years).  I've attached the catalog file for the aggregation, as well as 
> the threddsConfig.xml.  These are the only relevant configuration files 
> I know of.  If you would like others, please let me know.  Thank you for 
> your help, and sorry again for the mistakenly sent email.
> 
> Thanks,
> 
> Tim
> 
> On Mon, Aug 12, 2013 at 3:45 PM, John Caron <address@hidden> wrote:
> 
>     Hi Tim:
> 
>     1) can you send me your configuration files so i can be sure what
>     you are doing.
> 
>     2) how many files are there in the aggregation?
> 
>     John
> 
> 
>     On 8/12/2013 1:49 PM, Timothy Lewis - NOAA Affiliate wrote:
> 
>         John,
> 
>         My name is Tim Lewis, and I manage the OceanNOMADS Thredds server at
>         NCDDC.  We are attempting to aggregate 30 years' worth of Climate
>         Forecast System Reanalysis.  We've added the aggregation to our
>         Thredds server, but indexing the grib2 files seems to slow the
>         server down by hogging all resources.  Performance gets
>         progressively worse until the server becomes unusable and must be
>         restarted.
> 
>         Our current aggregation has been indexing for about 7 days,
>         interrupted twice for restarts due to performance.  We have tested
>         an aggregation of 10% of this dataset before, and it took about 3
>         days to build the aggregation.  Assuming a linear scaling, we're
>         looking at a month of indexing and therefore a month of poor
>         performance.  The aggregation can be reached at the following URL:
> 
>         http://ecowatch.ncddc.noaa.gov/thredds/oceanNomads/aggs/catalog_cfsr_aggs.html
> 
>         Is there any way to separate the indexing of the feature collection
>         from the serving of data requests?  Ideally, we would be able to
>         background an interruptible indexing task and continue to serve
>         data through the web interface.  This morning, we attempted
>         pointing a separate Thredds installation at a pre-indexed
>         aggregation, thinking that we could index on one machine and then
>         serve from another.  This was unsuccessful, though I'm not sure
>         why, since the ncx files were already present.
> 
>         Do you have any suggestions on how we might have this aggregation
>         indexed while still serving regular requests without the performance
>         hit?  We appreciate any advice you can give.  Thank you for your help.
> 
>         Sincerely,
> 
>         Tim Lewis
> 
> 
> 
>         --
>         Tim Lewis, Associate Software Engineer
>         General Dynamics Information Technology
>         NOAA Coastal Data Development Center
>         1021 Balch Boulevard, Suite 1003
>         Stennis Space Center, Mississippi 39529 USA
> 
>         228.688.2126
>         address@hidden
>         address@hidden
> 
> 
> 
> 
> -- 
> Tim Lewis, Associate Software Engineer
> General Dynamics Information Technology
> NOAA Coastal Data Development Center
> 1021 Balch Boulevard, Suite 1003
> Stennis Space Center, Mississippi 39529 USA
> 
> 228.688.2126
> address@hidden
> address@hidden