
20050419: some modifications on bigbird



>From: Unidata User Support <address@hidden>
>Organization: Unidata Program Center/UCAR
>Keywords: IDD ingest decode relay

Hi Gerry,

After seeing the persistently high load averages on bigbird this morning,
I decided to see whether I could help reduce the load and monitor the
situation more closely:

monitoring:

- added my id_dsa.pub key to the ~/.ssh/authorized_keys file so I can
  snarf the last line of the ~ldm/logs/bigbird.uptime log file.  I use
  this information to keep tabs on the performance of an increasing
  number of machines around the IDD.  Here is some sample output from
  the script I run to collect it:

node     CCYYMMDD.HHMM   1min  5min 15min feed ing tot    age mfree swpused
--------+-------------+------+-----+-----+----+---+---+------+-----+-------
uni1     20050419.2304   0.36  0.30  0.42    1   2   3   9080    8M   52M 
uni2     20050419.2306   5.39  6.76  6.96   69   2  71   8852    9M   83M 
uni3     20050419.2305   0.74  1.79  2.71   77   2  79   8203   15M   77M 
uni4     20050419.2306   0.29  0.91  0.88   44   2  46   8862   15M   76M 
thelma   20050419.2304   0.07  0.09  0.08    1   2   3   4184   20M   80M 
jackie   20050419.2306   0.23  0.29  0.22    2   4   6   5050   11M   46M 
desi     20050419.2306   0.12  0.17  0.20    2   4   6   2476  346M 1664M
samoon   20050419.2303   0.24  0.18  0.12    1   4   5   1188    1M  327M 
igor     20050419.2306   0.02  0.07  0.07    1   4   5   5049   12M   38M
emo      20050419.2306   0.14  0.26  0.37    6  12  18   2403 1557M 1024M
oliver   20050419.2306   0.08  0.15  0.16    5  10  15   4278   14M   97M 
mother   20050419.2306   4.92  4.96  5.18    4  15  19   2427  543M 1923M
atm      20050419.2306   2.43  2.67  2.73   69  10  79   4103  918M  445M
unidata2 20050419.2306   0.47  0.65  1.12   38   5  43   2049 1294M  245M
papagayo 20050419.2306   1.68  1.73  1.88   37  14  51   3120  740M  694M
bigbird  20050419.2306   7.28 10.70 12.02   32  14  46   1636    5M   24K
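
Here is a minimal Python sketch of how rows like the ones above could be
parsed and flagged. It is not the monitoring script that produced the
table, which isn't shown here. It assumes only what the header shows:
whitespace-separated columns in that order, with memory and swap sizes
carrying K/M/G suffixes. The function names and the thresholds in
flag() are hypothetical, and fetching the last log line over ssh is
not shown.

```python
from dataclasses import dataclass

# Hypothetical record; field names copied from the summary table header.
@dataclass
class UptimeRow:
    node: str
    stamp: str      # CCYYMMDD.HHMM
    load1: float
    load5: float
    load15: float
    feed: int
    ing: int
    tot: int
    age: int
    mfree: str      # e.g. "5M"
    swpused: str    # e.g. "24K"

def parse_row(line: str) -> UptimeRow:
    """Split one whitespace-separated summary line into typed fields."""
    f = line.split()
    return UptimeRow(f[0], f[1], float(f[2]), float(f[3]), float(f[4]),
                     int(f[5]), int(f[6]), int(f[7]), int(f[8]), f[9], f[10])

def to_megabytes(size: str) -> float:
    """Convert sizes like '24K', '5M', '1664M' to megabytes (assumed suffixes)."""
    units = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(size[:-1]) * units[size[-1]]

def last_line(path: str) -> str:
    """Return the last non-empty line of a local log file (reads the whole file)."""
    with open(path) as fh:
        lines = [ln for ln in fh if ln.strip()]
    return lines[-1].rstrip("\n") if lines else ""

def flag(row: UptimeRow, max_load15: float = 5.0,
         min_free_mb: float = 10.0) -> list[str]:
    """Return warnings for one machine; thresholds are arbitrary examples."""
    warnings = []
    if row.load15 > max_load15:
        warnings.append(f"{row.node}: 15-min load {row.load15}")
    if to_megabytes(row.mfree) < min_free_mb:
        warnings.append(f"{row.node}: only {row.mfree} free memory")
    return warnings

if __name__ == "__main__":
    sample = [
        "uni2     20050419.2306   5.39  6.76  6.96   69   2  71   8852    9M   83M",
        "desi     20050419.2306   0.12  0.17  0.20    2   4   6   2476  346M 1664M",
        "bigbird  20050419.2306   7.28 10.70 12.02   32  14  46   1636    5M   24K",
    ]
    for line in sample:
        for warning in flag(parse_row(line)):
            print(warning)
```

With these sample thresholds, bigbird trips both the load and the
free-memory checks, which matches what the table shows.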


mitigation:

- seeing that bigbird was struggling to keep up with its ingestion,
  decoding, and relaying work, and knowing that you are not actively
  using the McIDAS-format files produced by the McIDAS-XCD decoders, I
  stopped the decoders to see whether that would help

- I then cleaned up orphaned shared memory segments left over from
  McIDAS processes that had exited abnormally; this freed up some, but
  not a lot of, memory

- I killed an invocation of scourByDay.tcl since it had been racking
  up lots of CPU time
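
Here is a rough sketch of the shared memory cleanup step. It assumes
the Linux `ipcs -m` column layout (key, shmid, owner, perms, bytes,
nattch, status) and treats a segment with no attached processes
(nattch of 0) as a removal candidate. That is a heuristic, not proof
that the segment is orphaned, so the sketch defaults to a dry run that
only builds the `ipcrm -m <shmid>` commands. The owner name "mcidas"
and the sample listing are hypothetical.

```python
import subprocess

def orphaned_segments(ipcs_output: str, owner: str) -> list[str]:
    """Return shmids owned by `owner` with zero attached processes.

    Assumes Linux `ipcs -m` columns: key shmid owner perms bytes nattch [status].
    """
    shmids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key (e.g. 0x00000000); skip headers and blanks.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            shmids.append(shmid)
    return shmids

def remove_segments(shmids: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build one `ipcrm -m <shmid>` command per segment; run them only if not a dry run."""
    commands = [["ipcrm", "-m", shmid] for shmid in shmids]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Hypothetical `ipcs -m` listing; on a real host, capture the command's output instead.
    sample = """------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      mcidas     600        393216     0
0x00000000 65537      mcidas     600        393216     2
"""
    for cmd in remove_segments(orphaned_segments(sample, "mcidas")):
        print(" ".join(cmd))
```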

After making these changes, the load average on bigbird dropped from
the 19-25 range back down to the 5-17 range.  It is unclear how much of
the drop is simply due to bigbird having caught up on its CONDUIT,
CRAFT, and NNEXRAD processing, but I have the sneaking suspicion that
killing off the wayward scourByDay.tcl script helped a bunch.  Turning
off the XCD decoders didn't hurt either: they were doing a lot of I/O,
and I/O was the limiting factor in the slowdown.

Mike Schmidt and I were discussing the sluggishness of bigbird, and we
agreed that you might well benefit from an upgrade to Fedora Core 3.

Just wanted to let you know...

Cheers,

Tom