The top and uptime utilities can be used to monitor the system CPU load:
    top
    ...
    uptime
    ...
The top and iostat utilities can be used to monitor the system I/O load:
    top
    ...
    iostat
    ...
If your LDM is a node on the IDD and is a gateway LDM, then the most convenient way to monitor data-product latency is to go to the IDD rtstats webpages, select your computer, find the feedtype in which you're interested, and then select the latency plot.
The ldmadmin utility can be used to monitor the data-product latency of incoming data:
    ldmadmin watch
    (Type ^D or ^C when finished)
    ...
The output is in the form
    MMM DD hh:mm:ss pqutil: nbytes YYYYMMDDhhmmss.sss ft seqno pid
where:
- MMM DD hh:mm:ss is the month, day, hour, minute, and second when the line was printed.
- nbytes is the size of the data-product in bytes.
- YYYYMMDDhhmmss.sss is the creation-time of the data-product.
- ft is the feedtype of the data-product.
- seqno is the sequence number of the data-product (and usually unimportant).
- pid is the data-product identifier of the data-product.
By comparing the two timestamp fields, one can get an idea of the data-product latency.
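As a rough illustration of comparing the two timestamp fields, the following Python sketch computes the difference for one output line. It assumes the line has exactly the form described above, that both timestamps are in the same time zone (UTC is assumed here), and that the year of the print time is supplied by the caller because the line does not include it. The function name `latency_seconds` and the sample line are hypothetical.

```python
from datetime import datetime, timezone


def latency_seconds(line: str, year: int) -> float:
    """Return (time printed - creation time) in seconds for one output line.

    Assumed field layout (from the text above):
    MMM DD hh:mm:ss pqutil: nbytes YYYYMMDDhhmmss.sss ft seqno pid
    """
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("line does not have the expected fields")
    # The print time has no year, so the caller must provide one.
    # "%b" expects English month abbreviations in the default C locale.
    printed = datetime.strptime(
        f"{year} {fields[0]} {fields[1]} {fields[2]}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    # The creation-time carries its own year and fractional seconds.
    created = datetime.strptime(fields[5], "%Y%m%d%H%M%S.%f").replace(
        tzinfo=timezone.utc
    )
    return (printed - created).total_seconds()


# Hypothetical sample line, made up for illustration only.
sample = "Oct 27 16:48:30 pqutil: 4835 20231027164828.500 IDS|DDPLUS 123 SAUS70 KWBC 271600"
print(latency_seconds(sample, 2023))
```

A small positive value indicates that the data-product arrived shortly after it was created; a value that keeps growing suggests the feed is falling behind.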
The pqmon utility can be used to monitor the product-queue. For example:
    pqmon
    Oct 27 16:48:28 pqmon: Starting Up (19969)
    Oct 27 16:48:28 pqmon: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
    Oct 27 16:48:28 pqmon:  70301    12  417968  1999781144    234870     1293    253410     92040 2867
    Oct 27 16:48:28 pqmon: Exiting
The above shows 70301 data-product slots that each refer to a data-product (nprods); 12 slots that refer to gaps (i.e., contiguous empty space) in the product-queue (nfree); and 417968 slots that refer to nothing at all (nempty). The total number of slots is 488281 (nprods + nfree + nempty). The maximum number of slots that have referred to data-products since the product-queue was created is 234870 (maxprods). Similarly, the maximum number of slots that have referenced a gap is 1293 (maxfree) and the minimum number of empty slots is 253410 (minempty). The size of the largest gap currently in the product-queue is 92040 bytes (maxext) and the age of the oldest data-product in the queue is 2867 seconds (age).
Because this product-queue is known to have been active for quite some time (several months), the large number of empty slots means that it was created with an unnecessarily large parameter specifying the maximum number of data-products. The overhead of managing the queue could be slightly reduced by recreating the queue with a smaller number of slots (e.g., 250000).
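The slot arithmetic described above can be reproduced with a short Python sketch. It assumes that the column-name line and the value line both carry the "pqmon:" prefix shown in the example; the helper name `parse_pqmon` is hypothetical.

```python
def parse_pqmon(header_line: str, values_line: str) -> dict:
    """Pair the pqmon column names with the integer values printed below them."""
    names = header_line.split("pqmon:", 1)[1].split()
    values = [int(v) for v in values_line.split("pqmon:", 1)[1].split()]
    if len(names) != len(values):
        raise ValueError("column names and values do not line up")
    return dict(zip(names, values))


header = "Oct 27 16:48:28 pqmon: nprods nfree nempty nbytes maxprods maxfree minempty maxext age"
values = "Oct 27 16:48:28 pqmon: 70301 12 417968 1999781144 234870 1293 253410 92040 2867"

stats = parse_pqmon(header, values)

# Total number of slots: nprods + nfree + nempty
total_slots = stats["nprods"] + stats["nfree"] + stats["nempty"]
print(total_slots)  # 488281

# Largest number of slots ever in use at once (total minus the fewest empty)
print(total_slots - stats["minempty"])  # 234871
```

The second figure is what makes a queue of about 250000 slots look sufficient in this example: the number of slots in use has never exceeded it.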
The ldmping utility can be used to determine the availability of an upstream LDM. For example:
    ldmping -i 0 hostname
    MMM DD hh:mm:ss State Elapsed Port Remote_Host rpc_stat
    MMM DD hh:mm:ss state time port hostname rpcMsg
where:
- MMM DD hh:mm:ss is the current month, day, hour, minute, and second.
- state is the state of the upstream LDM, which indicates one of the following conditions:
  - The hostname couldn't be converted into an IP address.
  - An LDM is not running on the upstream host on the expected port (both port 388 and the upstream host's portmapper will have been tried).
  - An LDM is running on the upstream host on the expected port but we're not allowed to connect to it (i.e., there's no ALLOW entry for our LDM in the configuration-file of the upstream LDM).
  - An LDM is running on the upstream host on the expected port and we're allowed to connect to it (RESPONDING).
- time is the amount of elapsed time.
- port is the port number. This is only valid if the state is RESPONDING.
- hostname is the name of the upstream host.
- rpcMsg is the associated message (if any) from the RPC layer.
If the state of the upstream LDM is anything other than RESPONDING, then an LDM on the computer on which the ldmping utility was executed will not be able to receive any data-products.
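The RESPONDING check can be automated with a sketch like the following, written in Python. It assumes each ldmping output line has the form given above, with the state as the fourth whitespace-separated field; the function names and the sample lines are hypothetical, and actual ldmping output may differ between LDM versions.

```python
def upstream_state(line: str) -> str:
    """Return the state field of one ldmping output line.

    Assumed layout: MMM DD hh:mm:ss state time port hostname rpcMsg
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("line does not have the expected fields")
    return fields[3]


def can_receive(line: str) -> bool:
    """True only when the upstream LDM reports the RESPONDING state."""
    return upstream_state(line) == "RESPONDING"


# Hypothetical lines, made up for illustration only.
header = "Oct 27 16:48:28 State Elapsed Port Remote_Host rpc_stat"
result = "Oct 27 16:48:28 RESPONDING 0.001 388 upstream.example.edu"

print(can_receive(header))  # False (column-header line, not a result)
print(can_receive(result))  # True
```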
If an ldmping to the upstream LDM shows no problems, then the notifyme utility can be used to determine what a downstream LDM connecting to the upstream LDM should receive:
    notifyme -vl- -h hostname
You can monitor a downstream LDM process that is executing on the local system by setting its logging-level to verbose, at which time it will print the data-product metadata of every data-product that it receives to the LDM logfile. The logging-level of a downstream LDM process can be changed by sending it a SIGUSR2 signal via the kill utility, e.g.,
    kill -s USR2 pid
where pid is the process-ID of the downstream LDM process, which can be discovered by searching the LDM logfiles for the relevant "Starting Up" message, e.g.,
    cd $HOME/logs
    grep -Fi 'Starting Up' `ls -rt ldmd.log*`
    ...
You can monitor an upstream LDM process that is executing on the local system by setting its logging-level to debug, at which time it will print the data-product metadata of every data-product that it sends to the LDM logfile (along with other debugging information). The logging-level of an upstream LDM process can be changed by sending it a SIGUSR2 signal via the kill utility, e.g.,
    kill -s USR2 pid
where pid is the process-ID of the upstream LDM process, which can be most easily discovered via the uldbutil utility:
    uldbutil
    ...
or, less conveniently, by searching the LDM logfiles for the relevant "Starting Up" message:
    cd $HOME/logs
    grep -Fi 'Starting Up' `ls -rt ldmd.log*`
    ...
Although it is not a standard utility, netstat(1) is available on many UNIX platforms and can be used to show the state of network connections. For example, here is the output of a netstat(1) command on a computer at the Unidata Program Center that's running FreeBSD:
    netstat -a -f inet -p tcp | awk 'NR<=2 || /ldm/'
    Active Internet connections (including servers)
    Proto Recv-Q Send-Q  Local Address  Foreign Address         (state)
    tcp4       0      0  emo.4392       flip.ldm                TIME_WAIT
    tcp4       0      0  emo.ldm        storm.sjsu.edu.39415    TIME_WAIT
    tcp4       0      0  emo.ldm        storm.sjsu.edu.39413    TIME_WAIT
    tcp4       0  33304  emo.ldm        storm.sjsu.edu.36000    ESTABLISHED
    tcp4       0  33304  emo.ldm        storm.sjsu.edu.35999    ESTABLISHED
    tcp4       0     44  emo.ldm        storm.sjsu.edu.35998    ESTABLISHED
    tcp4       0    928  emo.ldm        storm.sjsu.edu.35997    ESTABLISHED
    tcp4       0  33304  emo.ldm        storm.sjsu.edu.35996    ESTABLISHED
    tcp4       0  33304  emo.ldm        storm.sjsu.edu.35995    ESTABLISHED
    tcp4       0  33304  emo.ldm        storm.sjsu.edu.34569    ESTABLISHED
    tcp4       0   5828  emo.ldm        storm.sjsu.edu.34562    ESTABLISHED
    tcp4       0     44  emo.ldm        storm.sjsu.edu.34560    ESTABLISHED
    tcp4       0  23900  emo.ldm        storm.sjsu.edu.34561    ESTABLISHED
    tcp4       0  22240  emo.ldm        solon.meteoro.uf.2861   ESTABLISHED
    tcp4       0     44  emo.ldm        bigbird.tamhsc.e.50860  ESTABLISHED
    tcp4       0      0  emo.ldm        bigbird.tamhsc.e.50858  ESTABLISHED
    tcp4       0   8640  emo.ldm        bigbird.tamhsc.e.50857  ESTABLISHED
    tcp4       0      0  emo.ldm        bigbird.tamhsc.e.50856  ESTABLISHED
    tcp4       0     44  emo.ldm        bigbird.tamhsc.e.50855  ESTABLISHED
    tcp4       0     44  emo.ldm        bigbird.tamhsc.e.50854  ESTABLISHED
    tcp4       0      0  emo.1066       desi.ldm                ESTABLISHED
    tcp4       0     28  emo.1065       jackie.ldm              ESTABLISHED
    tcp4       0      0  emo.1064       thelma.ucar.edu.ldm     ESTABLISHED
    tcp4       0      0  emo.1063       thelma.ucar.edu.ldm     ESTABLISHED
    tcp4       0     28  emo.1062       shemp.ldm               ESTABLISHED
    tcp4       0      0  emo.1061       desi.ldm                ESTABLISHED
    tcp4       0      0  emo.1060       jackie.ldm              ESTABLISHED
    tcp4       0      0  *.ldm          *.*                     LISTEN
This output assumes that the string "ldm" was associated with port 388 during the LDM Preinstallation Steps. You might have to adjust the above command to suit your operating system.
The last line of the output shows the top-level ldmd listening for TCP connections on port "ldm" (alias 388). The output also shows nineteen regular upstream LDMs (identified by having "emo.ldm" as the local address). The connections of two of these upstream LDMs are in TIME_WAIT and the associated processes should terminate soon. Seven active downstream LDMs are also shown near the end of the output. Lastly, the connection whose local address is "emo.4392" is special. This connection is due to the rtstats(1) process sending statistics to the Unidata Program Center computer "rtstats.unidata.ucar.edu" (alias "flip").
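The classification described above can be sketched in Python as follows. The sketch assumes FreeBSD-style netstat(1) output with six whitespace-separated columns and the service name "ldm" in place of port 388; other operating systems format addresses differently, so the parsing would need adjusting. The function name `summarize` is hypothetical, and the sample is a subset of the output above.

```python
from collections import Counter


def summarize(netstat_lines, service="ldm"):
    """Count connections per TCP state, split by which end uses the LDM port.

    Returns (upstream, outbound):
      upstream: the local address ends in ".ldm" (a remote host connected to us)
      outbound: the foreign address ends in ".ldm" (we connected to a remote LDM)
    """
    upstream, outbound = Counter(), Counter()
    suffix = "." + service
    for line in netstat_lines:
        fields = line.split()
        # Skip the banner, the column header, and anything malformed.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "LISTEN":
            continue  # the top-level ldmd listening socket
        if local.endswith(suffix):
            upstream[state] += 1
        elif foreign.endswith(suffix):
            outbound[state] += 1
    return upstream, outbound


sample = [
    "Active Internet connections (including servers)",
    "Proto Recv-Q Send-Q Local Address Foreign Address (state)",
    "tcp4 0 0 emo.4392 flip.ldm TIME_WAIT",
    "tcp4 0 0 emo.ldm storm.sjsu.edu.39415 TIME_WAIT",
    "tcp4 0 33304 emo.ldm storm.sjsu.edu.36000 ESTABLISHED",
    "tcp4 0 0 emo.1066 desi.ldm ESTABLISHED",
    "tcp4 0 0 *.ldm *.* LISTEN",
]

upstream, outbound = summarize(sample)
print(dict(upstream))
print(dict(outbound))
```

Note that an outbound connection to a remote "ldm" port is not necessarily a downstream LDM: in the output above, the connection to "flip.ldm" belongs to the rtstats(1) process.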
If your computer system has the top, netstat, uptime, and vmstat utilities installed, and you have configured the ldmadmin configuration-file correctly, then you can periodically accumulate LDM performance metrics in a file for subsequent display and analysis by executing the addmetrics command of the ldmadmin utility from a crontab entry. See Edit the LDM user's crontab(1) file.
Additionally, if your computer system has the gnuplot utility installed, then you can plot the LDM performance metrics by executing the plotmetrics command of the ldmadmin utility.
The LDM logfile is your friend. If you encounter a problem, then one of the first things you should do is to look at it. Problems can often be diagnosed by comparing corresponding logfile entries from the upstream LDM and the downstream LDM.