Re: [thredds] TDS performance issues on a production server

To: Jay Alder <alderj@xxxxxxxxxxxxxxxxxxxx>, thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] TDS performance issues on a production server
From: Heiko Klein <Heiko.Klein@xxxxxx>
Date: Fri, 13 Dec 2013 09:12:44 +0100

Hi Jay,

not sure if this is connected, but we had similar problems with
ncWMS/thredds some years ago:
http://www.unidata.ucar.edu/mailing_lists/archives/thredds/2010/msg00069.html

WMS clients request many maps/tiles at once, which puts a high load on
the server if the responses are not cached. This can lead to cancelled
client requests. In addition, tomcat6 with ncWMS had a strange bug with
requests cancelled by the client, which eventually led to a server
crash.

We solved this by putting Apache 'mod_cache' in front of tomcat and
making tomcat deliver all pages with cache headers that keep pictures
in the cache for 7 days. Server load dropped nicely due to the cache,
and tomcat no longer gets any 'client abort exceptions', since those
are swallowed by the cache.
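
As a rough illustration of why this helps, here is a minimal Python sketch. It is not the actual Apache or tomcat configuration from this setup. The names cache_headers and TileCache are hypothetical, and the in-memory dictionary only stands in for what a caching proxy does: the first request for a tile reaches the backend, and repeated requests within the 7-day lifetime are answered from the cache.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

CACHE_DAYS = 7  # retention period mentioned in the message


def cache_headers(now=None, days=CACHE_DAYS):
    """Build response headers that allow a front-end cache to keep an image for `days` days.

    `now` must be a timezone-aware UTC datetime (defaults to the current time).
    """
    now = now or datetime.now(timezone.utc)
    lifetime = timedelta(days=days)
    return {
        # 7 days = 604800 seconds
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }


class TileCache:
    """Toy stand-in for a caching proxy placed in front of the map server (hypothetical)."""

    def __init__(self, backend, ttl_seconds):
        self.backend = backend      # callable: url -> response body
        self.ttl = ttl_seconds      # how long a stored response stays fresh
        self.store = {}             # url -> (time stored, body)
        self.backend_calls = 0      # number of requests that reached the backend

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # fresh copy: the backend is not contacted
        self.backend_calls += 1
        body = self.backend(url)
        self.store[url] = (now, body)
        return body
```

With this sketch, eight identical tile requests arriving at the same time increase backend_calls by one only. A client that aborts its request is also dealing with the cache rather than with the backend, which matches the observation above that the abort exceptions no longer reach tomcat.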


Heiko

On 2013-12-11 23:33, Jay Alder wrote:

Hi, we’ve recently released a web application that uses TDS for mapping,
which is getting a lot of traffic. At one point the server stopped
responding altogether, which is a major problem. A quick restart of
tomcat got it going again, so I’m starting to dig into the logs. We
normally get the GET / request complete behavior, but occasionally we’ll
have:

GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…

meanwhile having a 100% CPU spike (with 12 CPUs) for a minute or more

request complete
request complete
request complete
request cancelled by client
request cancelled by client
request complete
request complete

While watching the logs the few times I’ve seen this occur it seems to
pull out of it ok. However the time the server failed, requests were
never returned. From the logs, requests came in for roughly 40 minutes
without being completed. Unfortunately, due to the high visibility we
started to get emails from users and the press about the application no
longer working.

Has anyone experienced this before and/or can you give guidance on how
to diagnose or prevent this?

Here are some config settings:
CentOS 5.7
Java 1.6
TDS 4.3.17
only WMS is enabled
Java -Xmx set to 8 GB (currently taking 5.3 GB, the dataset is 600 GB of
30-arcsecond grids for the continental US, 3.4 GB per file)
For better or worse we are configured to use 2 instances of TDS to keep
the catalogs and configuration isolated. I’m not sure if this matters,
but I didn’t want to omit it. Since it is a live server I can’t easily
change to the preferred proxy configuration.

I am trying not to panic yet. However, if the server goes unresponsive
again, staying calm may no longer be an option.

Jay Alder
US Geological Survey
Oregon State University
104 COAS Admin Building
Office Burt Hall 166
http://ceoas.oregonstate.edu/profile/alder/




--
Dr. Heiko Klein                              Tel. + 47 22 96 32 58
Development Section / IT Department          Fax. + 47 22 69 63 55
Norwegian Meteorological Institute           http://www.met.no
P.O. Box 43 Blindern  0313 Oslo NORWAY
