
Re: THREDDS performance [was Re: THREDDS and grib]



- can you send me the cache file for the test dataset?

- can you give me approximate times for what you see vs. what you would expect?

Kevin O'Brien wrote:
Hi John -

I installed the version 3.16.37 of the server and unfortunately, it
doesn't seem to make the problem go away.  It did kind of seem like
things were a bit quicker at times, but it was hard to accurately
assess.  Of course, the initial access of these large aggregations is
still pretty slow, and then subsequent accesses are faster.  However, it
does seem like a restart of the tomcat server somehow erases the cache
information and so every initial access after a tomcat reboot is slow.
Is there anything else I can do to help further debug the problem?

thanks -
Kevin

John Caron wrote:
Hi Kevin: I made a small fix that looks like it would affect your
case, but I'm not convinced it would really cause a huge slowdown.
Anyway, I wonder if you would give it a try and let me know?

It's TDS release 3.16.37.

thanks for your patience

Kevin O'Brien wrote:

Hi John -

Not to be a pest - but I was wondering if you'd had a chance to look
at these performance issues, or even been able to recreate them?

Thanks -
kevin

John Caron wrote:
These are all good questions - there have been similar reports of
the agg cache not working like it should. I will have to reproduce
this to see what's happening.

Kevin O'Brien wrote:

Hi John -

I tried what you suggested and it didn't seem to have a significant
effect in making the initial access of the aggregated dataset
quicker.  It still took over a minute and a half to open the
dataset.  I've pasted the xml config that I used to define the new
aggregation below.  To be honest, I'm actually kind of glad because
I wasn't looking forward to modifying the guts of the application
which generates the xml config automatically.... :-)

I guess I can understand and probably even accept the fact that for
the first time the dataset is accessed, things will be a little
slow.  After that, I presume the dataset is available in the cache,
and of course subsequent accesses prove that it is because the
response is quite quick.    However, if the tomcat server is
restarted, it seems like whatever is in the cache is ignored and
the cache entries have to be rebuilt.   I have my aggregation cache
set like so:

 <AggregationCache>
   <dir>/home/pmel/DataPortal/apache-tomcat-5.5.25/content/thredds/cacheAged/</dir>
   <scour>24 hours</scour>
   <maxAge>90 days</maxAge>
 </AggregationCache>

Does that seem correct? Also, as an aside, you mention that you
thought this would be quicker because it avoids the OPeNDAP
URLs. Shouldn't there be some client-side caching done with the
OPeNDAP datasets?  For example, if I access a remote dataset with
ncdump (or Ferret), and OPeNDAP caching is turned on in my ~/.dodsrc
file, it will cache the response in the ~/.dods_cache directory.
Does any of that happen when OPeNDAP URLs are accessed through TDS?
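
One way to check whether a Tomcat restart really discards the cached aggregation information is to list the cache directory before and after the restart. The following is a minimal sketch, assuming the cache entries are ordinary files under the <dir> path above and that their modification times reflect when they were written. The function name is illustrative, and the 90-day default simply mirrors the <maxAge> setting.

```python
import os
import time


def list_cache_entries(cache_dir, max_age_days=90.0, now=None):
    """Return (name, age_in_days, expired) for each regular file in cache_dir.

    Assumption: each cached aggregation is a plain file whose mtime is the
    time it was last written. max_age_days mirrors <maxAge> in the config.
    """
    now = time.time() if now is None else now
    entries = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        age_days = (now - os.path.getmtime(path)) / 86400.0
        entries.append((name, age_days, age_days > max_age_days))
    return entries


# Example (path taken from the config above):
# for name, age, expired in list_cache_entries(
#         "/home/pmel/DataPortal/apache-tomcat-5.5.25/content/thredds/cacheAged/"):
#     print(name, round(age, 2), "EXPIRED" if expired else "ok")
```

If the same files are present with unchanged ages after the restart, the cache is being kept on disk and the slowdown is more likely in how it is read back than in it being erased.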

Anyway - here's the xml config I used as per your suggestion:

   <dataset ID="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
            name="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
            urlPath="ipcc_ar4_CM2.1_R1_1to2x-1_daily_atmos_00010101-02201231_2">
       <serviceName>thisDODS3</serviceName>
       <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
         <aggregation type="union">
            <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
              <aggregation dimName="time" type="joinExisting">
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.01010101-02001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.02010101-02201231.nc"
                       ncoords="7300" />
              </aggregation>
            </netcdf>
            <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
              <aggregation dimName="time" type="joinExisting">
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.00010101-01001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.01010101-02001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.02010101-02201231.nc"
                       ncoords="7300" />
              </aggregation>
            </netcdf>
            <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
              <aggregation dimName="time" type="joinExisting">
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.00010101-01001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.01010101-02001231.nc"
                       ncoords="36500" />
               <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.02010101-02201231.nc"
                       ncoords="7300" />
              </aggregation>
            </netcdf>
         </aggregation>
       </netcdf>
   </dataset>
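
The ncoords values in this config can be sanity-checked against the date ranges in the file names. The sketch below assumes daily data on a 365-day (no-leap) calendar and a file name ending in .YYYY0101-YYYY1231.nc; both are inferences from the values above, not something stated in the thread, and the helper name is hypothetical.

```python
import re

DAYS_PER_YEAR = 365  # assumption: no-leap calendar, one time step per day


def expected_ncoords(filename):
    """Number of daily time steps implied by a .YYYY0101-YYYY1231.nc suffix."""
    m = re.search(r"\.(\d{4})0101-(\d{4})1231\.nc$", filename)
    if m is None:
        raise ValueError("unrecognized date range in " + filename)
    first_year, last_year = int(m.group(1)), int(m.group(2))
    return (last_year - first_year + 1) * DAYS_PER_YEAR


# File names and ncoords values as they appear in the config above.
declared = {
    "pr_A2.00010101-01001231.nc": 36500,
    "pr_A2.01010101-02001231.nc": 36500,
    "pr_A2.02010101-02201231.nc": 7300,
}
mismatches = {name: n for name, n in declared.items()
              if expected_ncoords(name) != n}
```

Under those assumptions the three declared values are consistent with 100, 100, and 20 model years, so mismatches is empty; a wrong ncoords would be one thing to rule out when the joined time axis looks off.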


I'm open to any suggestions or ideas!

thanks -
kevin


John Caron wrote:
Hi Kevin:

I haven't had time to reproduce this yet, but I'm guessing one
source of the slowdown is using OPeNDAP URLs in the compound
aggregation. It would be interesting to time 1) the single
aggregations, 2) the compound agg as it exists, and 3) the
compound agg, but with the OPeNDAP URLs replaced by direct netCDF files.

See attached file.
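
The three-way comparison suggested above can be timed from a client with a small script. This is only a sketch: the helper names are made up for illustration, and the URL in the usage comment is a placeholder that assumes the TDS OPeNDAP service is mounted under /thredds/dodsC/ on a local Tomcat. The first timing of each run approximates the cold (uncached) open, and later timings the warm one.

```python
import time
from urllib.request import urlopen


def time_call(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return timings


def fetch(url, timeout=600):
    """Download the response body and return its size in bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


# Example (placeholder host and port; the .dds request forces the server
# to open the aggregation without transferring any data arrays):
# url = ("http://localhost:8080/thredds/dodsC/"
#        "ipcc_ar4_CM2.1_R1_1to2x-1_daily_atmos_00010101-02201231_2.dds")
# print(time_call(fetch, url))
```

Running it once right after a Tomcat restart and once more a few minutes later gives the "what you see" numbers for each of the three configurations.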