
20030911: LDM question (cont.)



>From: "Mark J. Laufersweiler" <address@hidden>
>Organization: OU
>Keywords: 200309111605.h8BG5kLd002485 LDM rpc.ldmd split feed

Mark,

>No CONDUIT data yet. Wanting to do that is the big reason for the
>upgrades in hardware and part of the reason for determining loads.

OK, makes sense.

>We have two new dual proc AMD Opterons with a ide2scsi raid. One is
>for the standard feeds and the second is for radar. Since we need to
>decode all nids for all radars and want to also bring in the level2
>data, we figured a separate machine would be nice.

Another good idea.

>Right now:
>
> netstat | grep stokes.unidata-ldm | wc
>      28     168    2212
>
>with
>
>ps aux | grep ldm | wc
>      56     715    4571
>
>where 56 ranges between 49-60 usually, depending on the time, etc
>etc.

We instrument our machines with a Tcl script that gathers a number of
pieces of information and saves them in a log file.  Here is an annotated
snippet from the log file on thelma.ucar.edu:


       1    2      3     4     5    6   7   8      9    10   11   12  13
20030911.1736  13.75  8.72  7.40  125  23 148   9610 4560M 461M    1   1
20030911.1737   7.55  8.00  7.23  125  23 148   9670 4560M 461M    1   1
20030911.1738   8.63  8.18  7.35  125  23 148   9732 4560M 461M    1   1
20030911.1739  13.47  9.28  7.77  125  23 148   9572 4559M 462M    1   1
20030911.1740  13.90 10.06  8.14  125  23 148   9633 4560M 461M    1   1
20030911.1741  20.14 12.18  9.01  125  23 148   9692 4560M 461M    0   1

Field   Meaning
1       CCYYMMDD - date
2       HHMM     - time (UTC)
3       ave1     - 1 minute load average
4       ave5     - 5 minute load average
5       ave15    - 15 minute load average
6       nfeed    - number of downstream feed rpc.ldmds
7       nreceive - number of upstream request rpc.ldmds
8       nconnect - total number of rpc.ldmds
9       nsec     - age of oldest product in queue
10      memfree  - amount of free memory
11      swapused - swap space used
12      #wait    - number of processes in WAIT state
13      #rtstats - number of connections to rtstats.unidata.ucar.edu
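
If you want to post-process one of these logs, the field list above maps
directly onto a small record type.  The following is only a sketch in
Python, not part of our Tcl script: the names UptimeSample and
parse_sample are made up for illustration, and it assumes each line has
exactly the 12 whitespace-separated columns shown above, with the date
and time joined by a dot in the first column.

```python
from typing import NamedTuple


class UptimeSample(NamedTuple):
    """One line of the uptime log (hypothetical record type)."""
    date: str       # field 1, CCYYMMDD
    time: str       # field 2, HHMM (UTC)
    ave1: float     # field 3, 1 minute load average
    ave5: float     # field 4, 5 minute load average
    ave15: float    # field 5, 15 minute load average
    nfeed: int      # field 6
    nreceive: int   # field 7
    nconnect: int   # field 8
    nsec: int       # field 9, age of oldest product in queue
    memfree: str    # field 10, kept as text, e.g. "4560M"
    swapused: str   # field 11, kept as text, e.g. "461M"
    nwait: int      # field 12
    nrtstats: int   # field 13


def parse_sample(line: str) -> UptimeSample:
    """Split one log line into named fields."""
    parts = line.split()
    if len(parts) != 12:
        raise ValueError("expected 12 columns, got %d" % len(parts))
    # Fields 1 and 2 share the first column as CCYYMMDD.HHMM.
    date, time = parts[0].split(".", 1)
    return UptimeSample(
        date, time,
        float(parts[1]), float(parts[2]), float(parts[3]),
        int(parts[4]), int(parts[5]), int(parts[6]), int(parts[7]),
        parts[8], parts[9],
        int(parts[10]), int(parts[11]),
    )


sample = parse_sample(
    "20030911.1741  20.14 12.18  9.01  125  23 148   9692 4560M 461M    0   1"
)
```

In the snippet above, nfeed plus nreceive equals nconnect on every line
(125 + 23 = 148), which makes a handy sanity check on parsed records.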

The script is run out of cron every minute, so it provides a nice
history of the performance of the machine.  Would you like to run the
same script on your machine(s)?  The script has to be tweaked for
the OS it runs on, but we have a version that we are running on our
FreeBSD LDM box.  If you are interested, I put the script in the
pub/ldm/scripts/freebsd directory on our anonymous FTP server,
ftp.unidata.ucar.edu.  You will have to find out where tclsh lives on
your system (if it is installed) and alter the first line of the script
to match.  The crontab entry we have for running the script on our
FreeBSD machine is:

#
# Log the system usage minute by minute
* * * * * util/uptime.tcl logs/newshemp.uptime

I would change this to:

#
# Log the system usage minute by minute
* * * * * util/uptime.tcl logs/stokes.uptime

for stokes.
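
If tclsh turns out not to be available, the general idea of a
once-a-minute sampler is easy to approximate.  The sketch below is in
Python and is not the uptime.tcl script itself: it writes only the
date/time column and the three load averages (fields 1 through 5), the
function names are invented for illustration, and it assumes a Unix-like
system where os.getloadavg() is supported.  It takes the log file path
as its one argument, the same way the crontab entries above do.

```python
import os
import sys
import time


def format_sample(now: float, loads: tuple) -> str:
    """Format a CCYYMMDD.HHMM (UTC) stamp followed by three load averages."""
    stamp = time.strftime("%Y%m%d.%H%M", time.gmtime(now))
    return "%s %6.2f %5.2f %5.2f" % (stamp, loads[0], loads[1], loads[2])


def append_sample(path: str) -> None:
    """Append one sample line to the log file at path."""
    line = format_sample(time.time(), os.getloadavg())
    with open(path, "a") as log:
        log.write(line + "\n")


# Run as: <this script> logs/stokes.uptime
if __name__ == "__main__" and len(sys.argv) == 2:
    append_sample(sys.argv[1])
```

The remaining columns (rpc.ldmd counts, queue age, memory and swap) are
the OS-specific part, which is why the real script needs adjusting for
each platform.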

>But with the desire to decode CONDUIT and the load that will come
>from that, we need to decide which machine will handle the
>processing.

Right.

>As an aside, we will build one machine with Redhat9 and the second
>with FreeBSD. SMP does not always work with FreeBSD, but FreeBSD
>seems committed to AMD chipsets. We will see.

We are running FreeBSD 4.8 on a dual Athlon 2400+ machine with an SMP
kernel, and it seems to be performing well while ingesting all
CRAFT and CONDUIT data (no relay operation, however).

Tom