Hi John,

I have now also read David Robertson's mail and tried

  ncdump -v time 'http://vis-m8.met.no/thredds/dodsC/lustreMnt/heikok/FAUNA/mbr000_all.ncml'

to cache the coordinate values. Performance after that gets slightly better, from 20s to 17s, for a

  $ time ncdump -h 'http://vis-m8.met.no/thredds/dodsC/lustreMnt/heikok/FAUNA/mbr000_all.ncml'

I have had a closer look at the 'cache' entries as described by David. Even after an 'ncdump -v time ...', only 100 files (of 4575) are cached. The thredds 4.3 cache also had only 100 files cached.

The questions are now:

a) Why does 4.6 need all the coordinate values even if they are not requested? (Or why didn't 4.3 need those values?)

b) Is there a setting to increase the number of cached aggregation files? (I followed more or less blindly
http://www.unidata.ucar.edu/software/thredds/current/tds/reference/ThreddsConfigXMLFile.html,
so the settings with '100' are:

  <NetcdfFileCache>
    <minFiles>100</minFiles>
    ...
  <TimePartition>
    <minFiles>100</minFiles>
    ...
)

I attach my threddsConfig.xml and the mbr000_all.ncml file.

Heiko

On 2015-05-22 04:39, John Caron wrote:
ok, so looking closer, i see that there is a single file to cache the results, so .1 seconds is more likely.

make sure all files have been cached, by requesting the time coordinate values in a dods request (see post earlier today). if that's the case, and subsequent accesses are still slow, send me the mbr000_all.ncml file.

ps, i am gone until monday, so we can resume then...

On Thu, May 21, 2015 at 6:58 PM, John Caron <caron@xxxxxxxx> wrote:

reading 4575 files in .1 seconds seems a bit too fast. I'm guessing that the dataset is actually getting cached in memory, and you are seeing that performance. Then the question might be why isn't that happening as fast in 4.6?

If you have "remote management" enabled, you can see what files are in the cache, and also clear the cache.
http://www.unidata.ucar.edu/software/thredds/v5.0/tds/reference/RemoteManagement.html

so, how long does it take in each version:

1) the first time that the aggregation is called, when there's nothing in the disk cache (e.g. lustre/mnt/heikok/FAUNA/mbr000_all.ncml)
2) the first time that the aggregation is called after the TDS starts up, when the disk cache is populated, but the dataset is not in memory (or it has been cleared from memory)
3) how long it takes after it's in memory.

i believe that 2) could take 7 secs, and 3) takes .1 second, and maybe 1) takes 10-120 secs. it's possible that for 4.6, 2) has slowed down to 20 secs, and maybe it's not getting memory cached, so 3) never happens. I will investigate that possibility.

if you get a chance to experiment with checking/clearing the memory cache with the 2 versions, let me know the results.

John

On Wed, May 20, 2015 at 10:41 AM, John Caron <caron@xxxxxxxx> wrote:

ok, we'll see if we can reproduce the problem.

On Wed, May 20, 2015 at 10:25 AM, Heiko Klein <heiko.klein@xxxxxx> wrote:

Hi John,

in 4.6.1, the request time stays at ~20s each time I try it. Only in 4.3.23 do I see a huge performance gain (from 7s to 0.1s) after the first fetch.

Heiko

----- Original Message -----
> Hi Heiko:
>
> Can you see whether, the second time you access the dataset, the times
> are fast again?
>
> thanks,
> John
>
> On Wed, May 20, 2015 at 2:07 AM, Heiko Klein <Heiko.Klein@xxxxxx> wrote:
> > Hi,
> >
> > I have some performance problems after upgrading to thredds 4.6.1.
> >
> > I'm aggregating a large dataset with a joinExisting aggregation. Reading
> > the metadata from the aggregation took, with thredds 4.3.23, about 0.1s
> > (first time up to 7s). After upgrading to 4.6.1, the same request takes 20s
> > (second time; first time not measured) and is unusably slow. An 'ncview' of
> > the aggregated dataset is no longer possible.
> >
> > Suspecting some caching problems, I followed the guidelines in
> > http://www.unidata.ucar.edu/software/thredds/current/tds/reference/ThreddsConfigXMLFile.html
> > The aggregation cache contains two files:
> >
> > $ ls -l file-lustre-mnt-heikok-FAUNA-mbr000_all.ncml lustre/mnt/heikok/FAUNA/mbr000_all.ncml
> > -rw-r--r-- 1 tomcat7 tomcat7 74339 May 20 09:43 file-lustre-mnt-heikok-FAUNA-mbr000_all.ncml
> > -rw-r--r-- 1 tomcat7 tomcat7 74131 May 20 10:01 lustre/mnt/heikok/FAUNA/mbr000_all.ncml
> >
> > (I guess the first one is from thredds 4.3, while the second one is from thredds 4.6.)
> >
> > The ncml file is:
> >
> > <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
> >   <aggregation dimName="time" type="joinExisting">
> >     <scan location="." regExp=".*snapMet000\.nc$" subdirs="true"/>
> >   </aggregation>
> > </netcdf>
> >
> > It is aggregating 4575 files, each about 1.3G.
> >
> > The requests according to the logs are:
> > 2015-05-20T09:49:17.111 +0200 [ 372240][ 281] INFO - threddsServlet - Remote host: 157.249.113.42 - Request: "GET /thredds/dodsC/lustreMnt/heikok/FAUNA/mbr000_all.ncml.dds HTTP/1.1"
> > 2015-05-20T09:49:17.116 +0200 [ 372245][ 281] INFO - threddsServlet - Request Completed - 200 - -1 - 5
> > 2015-05-20T09:49:17.117 +0200 [ 372246][ 282] INFO - threddsServlet - Remote host: 157.249.113.42 - Request: "GET /thredds/dodsC/lustreMnt/heikok/FAUNA/mbr000_all.ncml.das HTTP/1.1"
> > 2015-05-20T09:49:17.131 +0200 [ 372260][ 282] INFO - threddsServlet - Request Completed - 200 - -1 - 14
> > 2015-05-20T09:49:17.134 +0200 [ 372263][ 283] INFO - threddsServlet - Remote host: 157.249.113.42 - Request: "GET /thredds/dodsC/lustreMnt/heikok/FAUNA/mbr000_all.ncml.dds HTTP/1.1"
> > 2015-05-20T09:49:17.138 +0200 [ 372267][ 283] INFO - threddsServlet - Request Completed - 200 - -1 - 4
> >
> > Best regards,
> >
> > Heiko
> >
> > --
> > Dr. Heiko Klein              Norwegian Meteorological Institute
> > Tel. + 47 22 96 32 58        P.O. Box 43 Blindern
> > http://www.met.no            0313 Oslo NORWAY
> >
> > _______________________________________________
> > thredds mailing list
> > thredds@xxxxxxxxxxxxxxxx
> > For list information or to unsubscribe, visit:
> > http://www.unidata.ucar.edu/mailing_lists/
<?xml version="1.0" encoding="UTF-8"?> <threddsConfig> <!-- all options are commented out in standard install - meaning use default values --> <!-- see http://www.unidata.ucar.edu/projects/THREDDS/tech/reference/ThreddsConfigXMLFile.html --> <serverInformation> <name>Met.no Internal Thredds</name> <logoUrl>http://thredds.met.no/metepos.gif</logoUrl> <logoAltText>met.no Thredds</logoAltText> <abstract>Scientific Data</abstract> <keywords>meteorology, atmosphere, climate, ocean, earth science</keywords> <contact> <name>met.no</name> <organization>Norwegian Meteorological institute</organization> <email>nicob@xxxxxx</email> <!--phone></phone--> </contact> <hostInstitution> <name>met.no</name> <webSite>http://www.met.no/</webSite> <logoUrl>http://thredds.met.no/metepos.gif</logoUrl> <logoAltText>met.no</logoAltText> </hostInstitution> </serverInformation> <!-- The <catalogRoot> element: For catalogs you don't want visible from the /thredds/catalog.xml chain of catalogs, you can use catalogRoot elements. Each catalog root config catalog is crawled and used in configuring the TDS. <catalogRoot>myExtraCatalog.xml</catalogRoot> <catalogRoot>myOtherExtraCatalog.xml</catalogRoot> --> <!-- * Setup for generated HTML pages. * * NOTE: URLs may be absolute or relative, relative URLs must be relative * to the webapp URL, i.e., http://server:port/thredds/. --> <htmlSetup> <!-- * CSS documents used in generated HTML pages. * The CSS document given in the "catalogCssUrl" element is used for all pages * that are HTML catalog views. The CSS document given in the "standardCssUrl" * element is used in all other generated HTML pages. * --> <standardCssUrl>tds.css</standardCssUrl> <catalogCssUrl>tdsCat.css</catalogCssUrl> <openDapCssUrl>tdsDap.css</openDapCssUrl> <!-- * The Google Analytics Tracking code you would like to use for the * webpages associated with THREDDS. This will not track WMS or DAP * requests for data, only browsing the catalog. --> <googleTrackingCode></googleTrackingCode> </htmlSetup> <!-- The <CatalogServices> element: - Services on local TDS served catalogs are always on. - Services on remote catalogs are set with the allowRemote element below. They are off by default (recommended). --> <CatalogServices> <allowRemote>false</allowRemote> </CatalogServices> <!-- Configuring the CDM (netcdf-java library) see http://www.unidata.ucar.edu/software/netcdf-java/reference/RuntimeLoading.html <nj22Config> <ioServiceProvider class="edu.univ.ny.stuff.FooFiles"/> <coordSysBuilder convention="foo" class="test.Foo"/> <coordTransBuilder name="atmos_ln_sigma_coordinates" type="vertical" class="my.stuff.atmosSigmaLog"/> <typedDatasetFactory datatype="Point" class="gov.noaa.obscure.file.Flabulate"/> </nj22Config> --> <!-- CDM uses the DiskCache directory to store temporary files, like uncompressed files. --> <DiskCache> <alwaysUse>false</alwaysUse> <scour>1 hour</scour> <maxSize>50 Gb</maxSize> </DiskCache> <!-- Caching open NetcdfFile objects. default is to allow 50 - 100 open files, cleanup every 11 minutes --> <NetcdfFileCache> <minFiles>100</minFiles> <maxFiles>150</maxFiles> <scour>12 min</scour> </NetcdfFileCache> <TimePartition> <minFiles>100</minFiles> <maxFiles>150</maxFiles> <scour>13 min</scour> </TimePartition> <!-- The <HTTPFileCache> element: allow 10 - 20 open datasets, cleanup every 17 minutes used by HTTP Range requests. 
--> <HTTPFileCache> <minFiles>10</minFiles> <maxFiles>20</maxFiles> <scour>17 min</scour> </HTTPFileCache> <RandomAccessFile> <minFiles>400</minFiles> <maxFiles>500</maxFiles> <scour>11 min</scour> </RandomAccessFile> <!-- Writing GRIB indexes. --> <GribIndexing> <setExtendIndex>false</setExtendIndex> <alwaysUseCache>false</alwaysUseCache> </GribIndexing> <!-- Persist joinNew aggregations to named directory. scour every 24 hours, delete stuff older than 90 days --> <AggregationCache> <scour>24 hours</scour> <maxAge>90 days</maxAge> </AggregationCache> <!-- How to choose the template dataset for an aggregation. latest, random, or penultimate --> <Aggregation> <typicalDataset>penultimate</typicalDataset> </Aggregation> <!-- The Netcdf Subset Service is off by default. --> <NetcdfSubsetService> <allow>true</allow> <scour>10 min</scour> <maxAge>-1 min</maxAge> </NetcdfSubsetService> <Opendap> <ascLimit>50</ascLimit> <binLimit>5000</binLimit> <serverVersion>opendap/3.7</serverVersion> </Opendap> <!-- The WCS Service is off by default. Also, off by default (and encouraged) is operating on a remote dataset. <WCS> <allow>false</allow> <allowRemote>false</allowRemote> <scour>15 min</scour> <maxAge>30 min</maxAge> </WCS> --> <WMS> <allow>true</allow> <allowRemote>false</allowRemote> <maxImageWidth>2048</maxImageWidth> <maxImageHeight>2048</maxImageHeight> <paletteLocationDir>palettes</paletteLocationDir> <ogcMetaXML>OGCMeta.xml</ogcMetaXML> <scour>5 min</scour> <maxAge>60 min</maxAge> </WMS> <!-- <NCISO> <ncmlAllow>false</ncmlAllow> <uddcAllow>false</uddcAllow> <isoAllow>false</isoAllow> </NCISO> --> <!-- CatalogGen service is off by default. <CatalogGen> <allow>false</allow> </CatalogGen> --> <!-- DLwriter service is off by default. As is support for operating on remote catalogs. <DLwriter> <allow>false</allow> <allowRemote>false</allowRemote> </DLwriter> --> <!-- DqcService is off by default. <DqcService> <allow>false</allow> </DqcService> --> <!-- Link to a Viewer application on the HTML page: <Viewer>my.package.MyViewer</Viewer> --> <!-- Add a DataSource - essentially an IOSP with access to Servlet request parameters <datasetSource>my.package.DatsetSourceImpl</datasetSource> --> </threddsConfig>
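Regarding question b): the "only 100 files cached" observation appears to match the <minFiles>100</minFiles> / <maxFiles>150</maxFiles> settings above. A minimal sketch of how those limits could be raised toward the 4575 aggregated files; the exact values here are illustrative only and not from the thread, and holding that many entries also depends on the Java heap and the OS open-file limit (ulimit -n) on the server:

  <!-- sketch only: values are illustrative, not from the thread -->
  <NetcdfFileCache>
    <minFiles>4600</minFiles>  <!-- keep at least as many entries as files in the aggregation -->
    <maxFiles>5000</maxFiles>
    <scour>12 min</scour>
  </NetcdfFileCache>
  <RandomAccessFile>
    <minFiles>4600</minFiles>  <!-- raw file handles; requires a correspondingly high ulimit -n -->
    <maxFiles>5000</maxFiles>
    <scour>11 min</scour>
  </RandomAccessFile>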
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <scan location="." regExp=".*snapMet000\.nc$" subdirs="true"/>
  </aggregation>
</netcdf>
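An alternative not discussed in the thread, but described in the NcML aggregation documentation, is to avoid the coordinate-reading cost entirely: instead of a <scan> element, list the files explicitly and supply their time values with the joinExisting coordValue attribute, so the server does not need to open each file to determine its coordinates. A sketch, where the file paths and coordinate values are hypothetical:

  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <!-- sketch: explicit file list with coordValue; paths and values below are made up -->
      <netcdf location="member000/file_0000_snapMet000.nc" coordValue="0"/>
      <netcdf location="member000/file_0001_snapMet000.nc" coordValue="1"/>
      <!-- ... one entry per aggregated file, typically generated by a script ... -->
    </aggregation>
  </netcdf>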