Re: [thredds] Static aggregations

Hi John,

I'm guessing that you have caching disabled, or that it is ineffective for some reason. Can you send me your threddsConfig.xml file to verify that? If that's true, is that deliberate?

NetcdfFileCache is deliberately disabled. We have several datasets that get replaced with identical file names on a regular basis (hourly, daily, etc.). This setup dates back to TDS 3.16, so perhaps this is better handled in 4.1 and I can turn the cache back on.

I have a couple more questions about NetcdfFileCache. (1) What format will the NetCDF cache files have? The dataset in question is a collection of NetCDF-4 files because we needed to compress them to save space. (2) Will TDS 4.1 cache the entire dataset in question? Each yearly aggregation is ~3000 files, and if the files are cached in NetCDF-3 format our scratch space will fill up quite quickly.


Under those circumstances, any access to an aggregation has to rebuild the aggregation, no matter what the recheckEvery setting is. TDS 4.1 now has a file system cache using ehcache, which will only do an OS file scan when the directory changes.

This could be what's causing the shorter delays (10 seconds) as that is about how long it takes to list the directory in question (~29,000 files) when it hasn't been accessed in a while.

So the bottom line is: re-enable the NetcdfFile object cache and things should work as expected. If that's not the case, then we have more investigating to do.
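
For reference, the NetcdfFile object cache is configured in threddsConfig.xml. The fragment below is only a minimal sketch, assuming the TDS 4.x NetcdfFileCache element; the numeric values are illustrative placeholders, not settings taken from this thread or tuned for this server:

```xml
<!-- threddsConfig.xml fragment (illustrative values, not from this thread) -->
<NetcdfFileCache>
  <!-- cleanup trims the cache back down to roughly this many files -->
  <minFiles>50</minFiles>
  <!-- exceeding this many cached files triggers a cleanup -->
  <maxFiles>100</maxFiles>
  <!-- how often the background cleanup runs -->
  <scour>11 min</scour>
</NetcdfFileCache>
```

The right sizes depend on how many files each aggregation touches, so treat these numbers as a starting point to adjust.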

Dave

David Robertson wrote:
Hi,

Richard Signell wrote:
I'm pretty sure the caching behavior has changed a lot with different
versions of the THREDDS Data Server -- and I'm pretty sure the latest
4.1 server does not rescan the entire aggregation.
What version are you using?

TDS 4.1 built on October 10th. I'm not positive that it's rescanning the entire directory but it definitely takes longer (10-60 seconds versus ~1 second) just after I touch a file in the directory. This current test was using recheckEvery="-1" but the results are the same without recheckEvery and with it set to a normal value like "15 min".

As long as nothing in the directory has a new date, I can access the dataset in ~1 second even hours later.
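
For reference, recheckEvery is an attribute of the NcML aggregation element. A sketch of the 2006 aggregation with the "15 min" value mentioned above, reusing the scan location and regExp quoted elsewhere in this thread:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- recheckEvery is set on the aggregation element -->
  <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
    <scan location="/home/om/dods-data/thredds/cool/avhrr/nc4/"
          regExp="^2006.*\.nc" />
  </aggregation>
</netcdf>
```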

Dave



-Rich

On Fri, Oct 23, 2009 at 12:27 PM, David Robertson
<robertson@xxxxxxxxxxxxxxxxxx> wrote:
Hi all,

It seems that THREDDS forces a rescan if the time stamp on the directory has changed. Even if I set recheckEvery to -1 or 90 days it still appears to rescan when the modified date on the folder changes. I tested this using a
simple "touch junk" command in the directory I'm aggregating.

This makes sense so that files added to the directory can be added to the aggregation. However, is there a way to tell TDS to skip this step for a given dataset or will I need to put my non-changing datasets in subfolders? If I put them in subfolders will I be able to aggregate the entire dataset
together anymore? Perhaps something like:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
     <scan location="/home/om/dods-data/thredds/cool/avhrr/nc4/2006/"
           regExp="^2006.*\.nc" />
  </aggregation>
</netcdf>

for each year and

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
     <scan location="/home/om/dods-data/thredds/cool/avhrr/nc4/"
           regExp=".*/*\.nc" />
  </aggregation>
</netcdf>

to get the aggregation to go through the subfolders and put the years
together?
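
One possibility for the multi-year case, sketched under the assumption that the files have been moved into per-year subfolders: the NcML scan element has a subdirs attribute that controls whether subdirectories are descended into. The suffix filter below is a simplification that replaces the regExp above, and whether this layout actually avoids the rescan delay is untested here:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- subdirs="true" asks the scan to descend into the per-year folders -->
    <scan location="/home/om/dods-data/thredds/cool/avhrr/nc4/"
          suffix=".nc" subdirs="true" />
  </aggregation>
</netcdf>
```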

The solution of getting TDS to skip rescan on specific datasets would be
preferable to simplify scripts and avoid having to change them each new
year.

Thanks,
Dave

Roy Mendelssohn wrote:
I believe if you don't set rescan it uses the default value - but would have to check on that. I know there is a way to tell it to not rescan.

-Roy

On Oct 22, 2009, at 9:42 AM, David Robertson wrote:

Hi,

Roy Mendelssohn wrote:
What is your rescan set to for that dataset? That is probably what is
causing it.
I am not using rescan or recheckEvery, so that's probably the problem. The
dataset element I'm using is pasted below:

<dataset name="2006"
       ID="cool-avhrr-bigbight-2006"
       urlPath="cool/avhrr/bigbight/2006" >

 <metadata inherited="true">
    <timeCoverage>
       <start>2006-01-01 03:10:00 UTC</start>
       <end>2006-12-31 22:53:00 UTC</end>
    </timeCoverage>
    <geospatialCoverage>
       <northsouth>
          <start>34.9950981140137</start>
          <size> 11.0090637207031</size>
          <units>degrees_north</units>
       </northsouth>
       <eastwest>
          <start>-77.0059967041016</start>
          <size>  14.0119972229004</size>
          <units>degrees_east</units>
       </eastwest>
    </geospatialCoverage>
 </metadata>

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
       <scan location="/home/om/dods-data/thredds/cool/avhrr/nc4/"
             regExp="^2006.*\.nc" />
    </aggregation>
 </netcdf>
</dataset><!--2006-->


On Oct 22, 2009, at 8:47 AM, David Robertson wrote:
Hi,

Is there a way to tell the TDS NOT to look for new files to add to an aggregated dataset? I have several aggregations set up that do not change (no added, removed, or modified files). Yesterday, after generating the aggregation cache, accessing the dataset was quite quick: ~1 second to load the Data Access Form. However, when I try to access those same datasets today, it takes just as long as it did to generate the aggregation cache in the first place (5 minutes).

It should be noted that these aggregations are subsets of files in a directory that IS being updated. What I have done is use a regExp to separate a very large dataset into years. The 2008 and prior aggregations will not have files added, so I'm looking for a way to stop the TDS from searching for new files to add to the aggregation cache.

Thanks,
Dave

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit:
http://www.unidata.ucar.edu/mailing_lists/
**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"