
Re: [Fwd: Re: Simple demonstration]

Hi Jerome, et al:

I have discovered a number of problems with caching on joinExisting 
aggregations. I hope to have a fix by the end of the week.

Meanwhile, you will want to add a "recheckEvery" attribute to all your dynamic 
aggregations, e.g.:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
    <variableAgg name="CGusfc" />
    <scan dateFormatMark="CG#yyyyDDD_HHmmss"
          location="/u00/satellite/CG/usfc/hday/" suffix=".nc" />
  </aggregation>
</netcdf>



This tells the system how often to rescan the directories. It will, as you 
suggest, only have to open new files that have appeared since the last time the 
dataset was opened.
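The rescan decision itself is just a clock comparison against the recheckEvery interval. A minimal sketch of that idea in Java (hypothetical class and method names — this is an illustration, not actual TDS code):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a "recheckEvery" policy: the directory scan is
// repeated only when the configured interval has elapsed since the last scan.
class RecheckPolicy {
    private final long recheckMillis;
    private long lastScanMillis;

    RecheckPolicy(long recheckAmount, TimeUnit unit, long lastScanMillis) {
        this.recheckMillis = unit.toMillis(recheckAmount);
        this.lastScanMillis = lastScanMillis;
    }

    // True when at least recheckEvery has elapsed since the last scan.
    boolean needsRescan(long nowMillis) {
        return nowMillis - lastScanMillis >= recheckMillis;
    }

    // Record that a scan just happened, resetting the clock.
    void markScanned(long nowMillis) {
        this.lastScanMillis = nowMillis;
    }
}
```

With recheckEvery="15 min", requests arriving within 15 minutes of the last scan reuse the cached file list; only a later request triggers a new directory scan.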

Jerome King wrote:
Hi John,

Thanks for your response.
I think Bob can tell you more about the stability of the datasets. The
directories with the most files are hourly datasets, so files are added
every hour, which probably keeps the dataset from being very stable,
i.e. the cache probably has to change often. But I am not sure how and
when the cache file is built. It seems to me that the cache file is
rebuilt every time there is a new file in the dataset, making it not
very useful since it changes every hour.

I don't know if this is possible, but what if a new cache file were
created from the old cache file plus the info for the new file every
time a file is added?
Then at the end of the day or the week, a new cache file would be
created from all the files in the dataset.

Just throwing out ideas in the air...
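That incremental idea could be sketched roughly like this (hypothetical names and file labels throughout — this is not how the TDS cache is actually implemented, just an illustration of merging new-file info into an existing cache instead of re-reading every file):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep the per-file coordinate values already in the
// cache, and when a new file appears, extract and append only its values.
class CoordCache {
    // filename -> time coordinate values extracted from that file
    private final Map<String, List<Double>> cache = new LinkedHashMap<>();

    // Cheap incremental update: only the newly appeared file is cracked open.
    void addFile(String filename, List<Double> timeValues) {
        cache.put(filename, timeValues);
    }

    // The aggregated time coordinate, in file order.
    List<Double> aggregatedTimes() {
        List<Double> all = new ArrayList<>();
        for (List<Double> v : cache.values()) all.addAll(v);
        return all;
    }

    int fileCount() {
        return cache.size();
    }
}
```

A periodic full rebuild (end of day or week, as suggested) would then reconcile the cache against the directory, catching deletions or replaced files that the incremental path misses.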

On Mon, 2006-07-24 at 13:35, John Caron wrote:

Hi Jerome, Bob:

I'm back from vacation, so I'm looking at this issue again.

What I think is happening is that the first time a dataset is accessed, 
all the files have to be cracked open and the coordinate values extracted, so that 
time will be proportional to the number of files. Subsequently, the info is read 
from the cached XML files (like the one you sent me), and it should be fast. 
Bob, is that what you are seeing? I don't think there is a difference between 
using the opendap library or the nj22 library.

I'm unclear on what the system is doing if/when files are added or deleted. Jerome, 
are these datasets stable, or do they change?

There are a few other details I need to investigate, so consider this a best 
guess for the moment, and let me know if you see any contrary evidence.

Thanks for checking it out...

Jerome King wrote:

Hi John,

I have attached the files requested.
Thanks for looking into this.

----- Original Message -----
From: John Caron <address@hidden>
Date: Friday, July 14, 2006 4:08 pm
Subject: Re: [Fwd: Re: Simple demonstration]

hi jerome:

can you send me 3 or 4 of the files in the /u00/satellite/CG/usfc/hday/ directory?

and also send me the satellite-CG-usfc-hday file in cacheAged?


Jerome King wrote:

hi John,

Bob asked me to respond to your questions:

1) I am guessing you're asking for the NcML for the
"satellite/CM/usfc/hday" dataset:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <variableAgg name="CGusfc" />
    <scan dateFormatMark="CG#yyyyDDD_HHmmss"
          location="/u00/satellite/CG/usfc/hday/" suffix=".nc" />
  </aggregation>
</netcdf>

2) There are plenty of files in
$tomcat_home/content/thredds/cacheAged/, and there is a
satellite-CG-usfc-hday file of 37K.

Let me know if I can check something else,

On Fri, 2006-07-14 at 15:43, Bob Simons wrote:

Can you please answer this and reply to John and me?

-------- Original Message --------
Subject: Re: Simple demonstration
Date: Fri, 14 Jul 2006 16:40:53 -0600
From: John Caron <address@hidden>
Organization: UCAR/Unidata
To: Bob Simons <address@hidden>
References: <address@hidden>

I'm guessing that there's something wrong with the TDS caching, so that it 
has to recreate the dataset each time by reading all of the files.

What does the aggregation look like for, say, the "satellite/CM/usfc/hday" dataset?

Can you look and see if there is a directory ${tomcat_home}/content/thredds/cacheAged, and if anything is in it?


Bob Simons wrote:

I have reduced the tests to their core:

/**
 * This connects to the opendapUrl and gets the dataDds from the query.
 * @param url e.g., "http://oceanwatch.pfeg.noaa.gov:8081/thredds/dodsC/satellit
 * @param query e.g., "?CMusfc.CMusfc[0:1:0][0:1:0][0:1:20][0:1:20]"
 * @throws Exception if trouble
 */
public static void simpleSpeedTest(String url, String query) throws Exception {
    boolean acceptDeflate = true;
    dods.dap.DConnect dConnect = new dods.dap.DConnect(url, acceptDeflate);
    long time = System.currentTimeMillis();
    dods.dap.DataDDS dataDds = dConnect.getData(query, null);
    System.out.println("Opendap.simpleSpeedTest(\n" +
        "url=" + url + "\n" +
        "query=" + query + "\n" +
        "time=" + (System.currentTimeMillis() - time));
}

/**
 * This performs a series of simple speed tests.
 * @throws Exception if trouble
 */
public static void doSimpleSpeedTests() throws Exception {

The results from running doSimpleSpeedTests are:

These times are proportional to the times I mentioned earlier and are 
strongly correlated to the number of files in each directory.

On the other hand, I see that if I go to the CMusfc opendap web page and 
generate an ascii request for the same data, I get the response very quickly.

That seems to point to the problem being a quirk of dConnect.getData. Or 
am I misusing it?

Any suggestions?

Thank you.


Bob Simons
Satellite Data Product Manager
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
<>< <>< <>< <>< <>< <>< <>< <>< <><

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.