
20021218: McIDAS on weather.admin.niu.edu (cont.)



>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200212131543.gBDFhG410430 McIDAS

Gilbert,

re: big users on weather

>> rad
>> X
>> nautilus
>> rhn-applet-gui
>> gnome-panel

>Yep, I see that too. However, I was wondering if there was something 
>discreet going on that I couldn't see or didn't know how to see.

What I saw was a mystery since the CPU load could not be accounted for
by adding up the %CPU column in a top listing.
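One quick cross-check (a sketch, assuming a Linux box with a procps-style ps) is to sum the per-process CPU percentages directly, which makes a gap between the summed value and the load average easy to spot:

```shell
# Sum the %CPU column for every process; compare the total against
# the load average shown by top or uptime.
ps -eo pcpu= | awk '{ s += $1 } END { printf "total %%CPU: %.1f\n", s }'
```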

re: installation of McIDAS if it is obviously causing problems

>OK.

re: CPU use went down after a HUP to xinetd

>It's back up now, and something is chewing up memory. I am now using swap 
>space.

Two things:

o I have top running continuously.  As I write this, I see
  that the CPU load is less than 2 and was as low as 1.0.  Also, I
  haven't seen the load average go up past 3.6 since the HUP to xinetd.

o we turn off/don't run nautilus here in Unidata.  We found that it did
  some very strange things (like chew up CPU) and would eventually core dump.
  Plus, it is using lots of memory:

  979 ldm       15   0 14768  14M  9148 S     0.0  1.4   0:06 nautilus

As a comparison, the McIDAS-XCD routines are all using less than 1800 KB
of RAM, and their CPU use is next to nothing.
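To see who the big memory users are in one shot (a sketch, assuming GNU ps, whose --sort option sorts by resident set size), you can list processes by RSS:

```shell
# List the ten biggest memory users by resident set size (RSS, in KB),
# plus the header line.
ps -eo rss,comm --sort=-rss | head -11
```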

>Could it be the X window that is causing this then?

Yes, it could be.  The times I have seen the load average ramp up are
when rad runs, and a big increase in X is usually seen at the same
time.  Are you displaying radar sectors created by rad by any chance?

re: ldm connects/disconnects from weather.cod.edu

>Actually, that's Dave Bukowski at the College of DuPage making sure my LDM 
>is up and running. When it barfs, and if he is up late at night, he will 
>log in and fix it for me if he can. That hasn't happened since he started 
>doing this. He does an ldmping (I think) every few minutes just to make 
>sure it is alive.

Whatever he is doing, he seems to be doing it excessively:

Dec 18 16:00:07 weather weather[1316]: Connection from weather.cod.edu 
Dec 18 16:00:07 weather weather[1316]: Connection reset by peer 
Dec 18 16:00:07 weather weather[1316]: Exiting 
Dec 18 16:00:08 weather weather[1321]: Connection from weather.cod.edu 
Dec 18 16:00:08 weather weather[1321]: Connection reset by peer 
Dec 18 16:00:08 weather weather[1321]: Exiting 
Dec 18 16:00:08 weather weather[1322]: Connection from weather.cod.edu 
Dec 18 16:00:11 weather weather[1322]: Connection reset by peer 
Dec 18 16:00:11 weather weather[1335]: Connection from weather.cod.edu 
Dec 18 16:00:12 weather weather[1322]: Exiting 
Dec 18 16:00:15 weather weather[1335]: Connection reset by peer 
Dec 18 16:00:15 weather weather[1350]: Connection from weather.cod.edu 
Dec 18 16:00:17 weather weather[1335]: Exiting 

The connections/disconnections are happening as often as several times
a second to every few seconds -- this is overkill.  Checking once every
10-15 minutes should be more than adequate.  Depending on how he is
checking on your LDM, the effect on your machine could be dramatic.
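A quick way to quantify the rate is to count the "Connection from" lines per timestamp (a sketch; /var/log/messages is an assumed log location -- adjust it to wherever ldmd logs on your system):

```shell
# Count connection attempts per second.  In the standard syslog
# timestamp "Mon DD HH:MM:SS", field $3 is the HH:MM:SS part.
grep 'Connection from weather.cod.edu' /var/log/messages \
  | awk '{ n[$3]++ } END { for (t in n) print t ": " n[t] }'
```

Run against the excerpt above, this shows multiple connections landing within the same second.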

Each time a downstream LDM process connects to a server for data, the
rpc.ldmd responding to the request on the server has to search through
the queue to find the data that should/may have to be sent.  This
becomes an increasingly expensive operation as the queue grows.  An
ldmping doesn't request data, but it still causes an rpc.ldmd to be
started on the machine being pinged.  Since the connections from COD
arrive every second or so, your machine has to start a new rpc.ldmd
invocation every second or so -- again, overkill.
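For a 10-15 minute check, a cron entry running ldmping would be plenty (a sketch, assuming Vixie cron; the ldmping and log paths are assumptions -- adjust them for the local LDM installation):

```
# Run ldmping every 15 minutes instead of every second
*/15 * * * *  /usr/local/ldm/bin/ldmping weather.admin.niu.edu >> /usr/local/ldm/logs/ldmping.log 2>&1
```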

Tom