
20021213: McIDAS on weather.admin.niu.edu



>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200212131543.gBDFhG410430 McIDAS

Gilbert,

>Is it possible that you could put the McIDAS memory leak patches onto 
>weather.admin? My machine has gotten very slow in the last several 
>months...and I'm wondering if that is the problem.

I logged onto weather today and don't see that the McIDAS-XCD data
monitors are using up excessive amounts of either memory or CPU.  In
fact, the processes that seem to be the big hogs are:

rad
X
nautilus
rhn-applet-gui
gnome-panel

>And yes, go ahead and 
>put the latest version of McIDAS on there to match weather2, if need be, 
>and if you have time.

If I can see a clear indication that McIDAS has something to do with
the problems you are seeing, I will do the upgrade.

>From address@hidden Wed Dec 18 09:37:44 2002

>I throw my hands up. I do not know what is causing this. On 
>weather.admin.niu.edu, I'm running RH 8.0, Gcc 3.2-11, latest version of 
>glibc. The machine has 1 GB of memory, but after an hour or less after 
>rebooting, it is full and then starts using disk swap all the time, 
>causing the machine to bog down severely.

Our system administrator, Mike Schmidt, and I observed this after
logging on today.  A quick look (using top) at the big memory users
shows:

  2:27pm  up  2:31,  3 users,  load average: 8.21, 6.22, 5.10
128 processes: 123 sleeping, 5 running, 0 zombie, 0 stopped
CPU0 states: 64.1% user, 19.4% system,  0.0% nice, 15.3% idle
CPU1 states: 58.0% user, 13.4% system,  0.0% nice, 27.4% idle
Mem:  1030548K av, 1014280K used,   16268K free,       0K shrd,   20896K buff
Swap:  256996K av,     580K used,  256416K free                  799736K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
16247 ldm       15   0 17584  17M   628 S     5.3  1.7   0:41 pqact
  979 ldm       15   0 15528  15M  9908 S     0.0  1.5   0:04 nautilus
  985 ldm       15   0 13748  13M  9364 S     0.0  1.3   0:13 rhn-applet-gui
  977 ldm       16   0 13452  13M  8164 S     3.5  1.3   1:17 gnome-panel
  995 ldm       15   0  8608 8604  6744 S     0.1  0.8   0:18 gnome-terminal
  947 ldm       15   0  8416 8412  6512 S     0.0  0.8   0:00 gnome-session
  960 ldm       15   0  7188 7184  5788 S     0.0  0.6   0:01 gnome-settings-
  958 ldm       15   0  6232 6232  5128 S     0.5  0.6   0:18 metacity
  954 ldm       15   0  4980 4980  1984 S     0.0  0.4   0:00 gconfd-2
  983 ldm       15   0  3996 3996  3440 S     0.0  0.3   0:00 pam-panel-icon
  956 ldm       15   0  2244 2244  1840 S     0.0  0.2   0:00 bonobo-activati
12696 ldm       15   0  1912 1912  1036 S     0.0  0.1   0:00 tcsh
  974 ldm       15   0  1712 1712  1408 S     0.0  0.1   0:02 xscreensaver
  996 ldm       15   0  1632 1632   812 S     0.0  0.1   0:00 tcsh
16263 ldm       15   0  1616 1616   712 D     1.1  0.1   0:51 dmsfc.k
20188 ldm       25   0  1540 1540  1268 R    49.0  0.1   0:02 rad
16244 ldm       15   0  1392 1392  1300 S     7.2  0.1   0:53 rpc.ldmd
  964 ldm       15   0  1364 1364  1064 S     0.0  0.1   0:00 fam
16266 ldm       15   0  1044 1044   744 S     0.0  0.1   0:00 dmmisc.k
18744 ldm       15   0  1012 1012   752 R     1.7  0.0   0:10 top
16264 ldm       15   0   988  988   700 S     0.0  0.0   0:00 dmraob.k
  948 ldm       15   0   988  968   792 S     0.0  0.0   0:00 ssh-agent
20043 ldm       18   0   964  964   816 S     0.7  0.0   0:00 dtnradscript
20145 ldm       17   0   952  952   816 S     0.0  0.0   0:00 doppler.srmv1
  888 ldm       18   0   936  936   644 S     0.0  0.0   0:00 tcsh
16246 ldm       19   0   928  924   860 S     0.0  0.0   0:00 startxcd.k
16280 ldm       16   0   920  916   844 S     0.0  0.0   0:00 ingetext.k
  918 ldm       15   0   868  868   748 S     0.0  0.0   0:00 imwheel
16265 ldm       15   0   828  828   668 S     0.0  0.0   0:00 dmsyn.k
16249 ldm       15   0   732  732   608 S     0.3  0.0   0:04 pqact
16245 ldm       15   0   728  728   644 S     1.9  0.0   0:12 pqbinstats
16248 ldm       15   0   668  668   588 S     0.5  0.0   0:10 pqsurf

As you can see from this list, the McIDAS-XCD processes (dmsfc.k, dmsyn.k,
dmraob.k, dmmisc.k, ingebin.k, ingetext.k, and startxcd.k) are not
using much memory or CPU.  So, the answer to the mystery lies elsewhere.
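
To put a single number on that, the resident memory of the XCD processes
can be totalled up.  The following is only a sketch: it assumes that
every command name ending in ".k" is a McIDAS-XCD process (true for the
listing above, not guaranteed elsewhere), and xcd_rss_kb is a made-up
helper name, not part of McIDAS or the LDM.

```shell
# Sum resident memory (KB) of processes whose command name ends in ".k".
# Assumption: on this machine only McIDAS-XCD processes are named that way.
# Reads "RSS COMMAND" pairs on stdin; xcd_rss_kb is a hypothetical helper.
xcd_rss_kb() {
    awk '$2 ~ /\.k$/ { sum += $1 } END { print sum + 0 }'
}

# Typical use on a Linux system with procps ps:
#   ps -eo rss=,comm= | xcd_rss_kb
```

Fed the RSS column from the top listing above, this should come out to
roughly 6 MB for the six XCD processes shown, on a machine with 1 GB.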

Just by chance, I ran netstat to see what machines were connected
to weather, and Mike noticed that the entries for the LDM connections
were showing the LDM port number, not the mnemonic (i.e., 388 instead
of ldm).  This told us that you did not have the requisite LDM entries
in your /etc/services file.  A quick look verified this.
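
For anyone wanting to make the same check without eyeballing the file,
here is a minimal sketch.  It assumes the standard services format
(name, whitespace, port/protocol); check_ldm_services is a hypothetical
helper name, not something shipped with the LDM.

```shell
# Report which LDM entries (port 388, tcp and udp) are missing from a
# services-format file.  check_ldm_services is a hypothetical helper;
# the file defaults to /etc/services.
check_ldm_services() {
    file="${1:-/etc/services}"
    for proto in tcp udp; do
        # Match an uncommented line: name "ldm", then the port/protocol
        # field, followed by whitespace, a comment, or end of line.
        if ! grep -Eq "^ldm[[:space:]]+388/${proto}([[:space:]]|#|\$)" "$file"; then
            echo "missing: ldm 388/${proto}"
        fi
    done
}

# Typical use:
#   check_ldm_services            # checks /etc/services
```

No output means both entries are present.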

OK, so hold on to your butt...

When I went to add the /etc/services entries for the LDM, the load average
on weather was right around 5.  I added the entries (as per web page
instructions for setting up an LDM):

<all done as 'root'>

# Local services
ldm             388/tcp
ldm             388/udp

and then sent a HUP signal to xinetd:

% ps -eaf | grep -i xinetd
root       554     1  0 11:56 ?        00:00:00 xinetd -stayalive -reuse -pidfil
ldm      28663 12696  0 15:29 pts/2    00:00:00 grep -i xinetd

% kill -HUP 554
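
The PID lookup can also be done without reading the ps output by eye.
This is only a sketch: it assumes the "ps -ef" column layout shown above,
where the command is the 8th field and appears as a bare "xinetd" with no
leading path, and pid_of_cmd is a hypothetical helper name.

```shell
# Print the PID of each process whose command (8th field of "ps -ef"
# output) is exactly the given name.  Matching the field exactly skips
# the "grep ... xinetd" line.  pid_of_cmd is a hypothetical helper;
# it reads ps output on stdin.
pid_of_cmd() {
    awk -v name="$1" '$8 == name { print $2 }'
}

# Typical use, as root, assuming a single xinetd is running:
#   kill -HUP "$(ps -ef | pid_of_cmd xinetd)"
```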

I then exited from 'root' back to 'ldm' and ran top again.  To my
amazement, the load average had dropped to less than 2!  Why the load
average dropped is a bit of a mystery.  Either something in your system
needed the /etc/services entries for the LDM, or the HUP to xinetd
freed up some system resource.
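
To compare the load before and after a change like this without running
top each time, the 1-minute figure can be pulled out of the uptime line.
A small sketch, assuming the "load average: a, b, c" wording seen in the
top header above; load1 is a made-up helper name.

```shell
# Extract the 1-minute load average from an uptime/top header line
# read on stdin.  load1 is a hypothetical helper name.
load1() {
    sed -n 's/.*load average: *\([0-9.]*\).*/\1/p'
}

# Typical use:
#   uptime | load1
```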

After watching things for a while, I see that the CPU use as reported by
top continues to be relatively low (except when rad or X kicks in).

>I think part of it is McIDAS 
>memory leaks which Tom Yoksas has been working on,

I disagree.  The McIDAS-XCD processes stayed reasonably small, and their
CPU use is low.

>but I can't account for 
>ALL of it. Can anyone venture a guess? I am running WXP and McIDAS, as 
>well as the latest version of apache, all patched, on weather. The result: 
>load average sticks between 5 and 10, data is missed all over the place, 
>and I have no clue why.

So, either something in RedHat 8 causes the CPU load to skyrocket when
xinetd gets bogged down for some reason, or the LDM entries in
/etc/services are vital.

By the way, we see a lot of connects/disconnects from weather.cod.edu
in your ~ldm/ldmd.log file.  This should be looked into.
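
A rough way to gauge how often that host shows up is to count the log
lines that mention it.  This sketch makes no assumption about the LDM
log format beyond the hostname appearing on the line, so it counts every
mention, not just connects and disconnects; count_host_lines is a
hypothetical helper name.

```shell
# Count the lines in a log file that mention a given host, treating the
# host as a fixed string, not a regular expression.
# count_host_lines is a hypothetical helper name.
count_host_lines() {
    grep -Fc -- "$1" "$2"
}

# Typical use:
#   count_host_lines weather.cod.edu ~ldm/ldmd.log
```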

Tom