
Re: Thredds - Grib2 Collection Indexing as Independent Task



Hi Tim:

1. The CFSR dataset has some known encoding defects; I'm not sure whether your 
files have those problems.

2. change

 <collection 
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2" 
recheckAfter="5 min" olderThan="5 min"/>

to

 <collection 
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2" />

because you don't want to rescan this collection every 5 minutes!
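
As a quick sanity check on the spec itself, the filename part (everything after the last "/") can be tried against a few names outside the TDS. This is only a rough sketch: it assumes the filename part is treated as a regular expression that must match the whole name, and the sample filenames are hypothetical, not taken from your server.

```python
import re

# Filename part of the collection spec (everything after the last "/").
# Assumption: the TDS matches this regular expression against the whole filename.
FILENAME_PATTERN = re.compile(r"ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2")


def matches_spec(filename: str) -> bool:
    """Return True if the entire filename matches the pattern."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


# Hypothetical filenames, for illustration only.
samples = [
    "ocnh01.gdas.1979010100.grb2",       # two-digit hour, ten-digit date
    "ocnh01.gdas.19790101.grb2",         # date field too short
    "ocnh01.gdas.1979010100.grb2.gbx9",  # extra suffix after .grb2
]

for name in samples:
    print(name, matches_spec(name))
```

Under the whole-name assumption, only the first sample is accepted; files with an extra suffix after .grb2 are not picked up as data files.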

3. Stop the TDS, delete or archive content/thredds/logs, restart the TDS, 
run it for an hour, then zip up the log files and send them to me. Optionally, 
stop the TDS after that until we can check whether there is a problem, since 
you may have to redo the indexing anyway.
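
If it helps, the zip step can be scripted. The following is a minimal sketch, not a Unidata tool: the function name `zip_logs` and both paths are placeholders, and it assumes the TDS is stopped (or at least that the archive is written outside the log directory) while it runs.

```python
import zipfile
from pathlib import Path


def zip_logs(log_dir: str, archive_path: str) -> int:
    """Zip every file under log_dir into archive_path; return the file count.

    Write archive_path outside log_dir so the archive does not include itself.
    """
    log_root = Path(log_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(log_root.rglob("*")):
            if path.is_file():
                # Store paths relative to the log directory.
                zf.write(path, path.relative_to(log_root).as_posix())
                count += 1
    return count


# Example call (both paths are assumptions; adjust to your installation):
# zip_logs("/path/to/content/thredds/logs", "/tmp/tds-logs.zip")
```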


4. We do have a background indexer, but it's still beta. You can look at the 
docs we have so far, but I wouldn't try to run it yet:

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/TDM.html



John


On 8/12/2013 3:07 PM, Timothy Lewis - NOAA Affiliate wrote:
> John,
> 
> I'm sorry for the blank message.  I accidentally discovered a new 
> keyboard shortcut for sending a draft email.
> 
> The dataset in question has about 300,000 files (28 files per day for 30 
> years).  I've attached the catalog file for the aggregation, as well as 
> the threddsConfig.xml.  These are the only relevant configuration files 
> I know of.  If you would like others, please let me know.  Thank you for 
> your help, and sorry again for the mistakenly sent email.
> 
> Thanks,
> 
> Tim
> 
> On Mon, Aug 12, 2013 at 3:45 PM, John Caron <address@hidden> wrote:
> 
>     Hi Tim:
> 
>     1) can you send me your configuration files so i can be sure what
>     you are doing.
> 
>     2) how many files are there in the aggregation?
> 
>     John
> 
> 
>     On 8/12/2013 1:49 PM, Timothy Lewis - NOAA Affiliate wrote:
> 
>         John,
> 
>         My name is Tim Lewis, and I manage the OceanNOMADS Thredds server at
>         NCDDC.  We are attempting to aggregate 30 years' worth of Climate
>         Forecast System Reanalysis.  We've added the aggregation to our Thredds
>         server, but indexing the grib2 files seems to slow the server down by
>         hogging all resources.  Performance gets progressively worse until the
>         server becomes unusable and must be restarted.
> 
>         Our current aggregation has been indexing for about 7 days, interrupted
>         twice for restarts due to performance.  We have tested an aggregation of
>         10% of this dataset before, and it took about 3 days to build the
>         aggregation.  Assuming a linear scaling, we're looking at a month of
>         indexing and therefore a month of poor performance.  The aggregation can
>         be reached at the following URL:
> 
>         http://ecowatch.ncddc.noaa.gov/thredds/oceanNomads/aggs/catalog_cfsr_aggs.html
> 
>         Is there any way to separate the indexing of the feature collection from
>         the serving of data requests?  Ideally, we would be able to background
>         an interruptible indexing task and continue to serve data through the
>         web interface.  This morning, we attempted pointing a separate Thredds
>         installation at a pre-indexed aggregation, thinking that we could index
>         on one machine and then serve from another.  This was unsuccessful,
>         though I'm not sure why, given that the ncx files were already present.
> 
>         Do you have any suggestions on how we might have this aggregation
>         indexed while still serving regular requests without the performance
>         hit?  We appreciate any advice you can give.  Thank you for your help.
> 
>         Sincerely,
> 
>         Tim Lewis
> 
> 
> 
>         --
>         Tim Lewis, Associate Software Engineer
>         General Dynamics Information Technology
>         NOAA Coastal Data Development Center
>         1021 Balch Boulevard, Suite 1003
>         Stennis Space Center, Mississippi 39529 USA
> 
>         228.688.2126
>         address@hidden
> 
> 
> 
> 
> -- 
> Tim Lewis, Associate Software Engineer
> General Dynamics Information Technology
> NOAA Coastal Data Development Center
> 1021 Balch Boulevard, Suite 1003
> Stennis Space Center, Mississippi 39529 USA
> 
> 228.688.2126
> address@hidden


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.