[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[IDD #TMZ-743033]: Help? Our LDM Miami radar latency is too long



Hi Brian and Brian :-)

I just logged on weather.rsmas.miami.edu as user 'ldm' to see what
might be up, and I immediately noticed that the load average was
above 11.  After seeing that the GEMPAK dctaf decoder was using
an excessive amount of CPU, I commented out its running in the
GEMPAK pattern-action file (~ldm/etc/GEMPAK/pqact.gempak_decoders)
and killed the process that appeared to be in a very tight loop.

A few minutes later, the load average has come down to more reasonable
levels:

/usr/local/ldm/etc/GEMPAK% uptime
 13:43:42 up 275 days,  7:55,  6 users,  load average: 7.29, 8.34, 9.39

but it is still rather high.

I then noted that several 'rsync's being run by bmapes are using quite a bit
of CPU and presumably disk I/O.

Following up on Steve's use of 'notifyme' to check on the actual receipt
latency for pieces of KAMX volume scans, I ran the same on weather and
see that the products are being received on weather without big delays:

<ldm on weather.rsmas.miami.edu>

20180301T183651.419642Z notifyme[129548] NOTE notifyme.c:385:main() Starting 
Up: weather.rsmas.miami.edu: 20180301183651.419444 TS_ENDT {{NEXRAD2, "KAMX"}}
20180301T183651.419677Z notifyme[129548] NOTE ldm5_clnt.c:460:forn5() LDM-5 
desired product-class: 20180301183651.419444 TS_ENDT {{NEXRAD2, "KAMX"}}
20180301T183651.443769Z notifyme[129548] INFO error.c:236:err_log() Resolving 
weather.rsmas.miami.edu to 129.171.98.193 took 0.02402 seconds
20180301T183651.631746Z notifyme[129548] NOTE ldm5_clnt.c:294:forn_signon() 
NOTIFYME(weather.rsmas.miami.edu): OK
20180301T183702.135665Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  179183 20180301183701.053869 NEXRAD2 531007  
L2-BZIP2/KAMX/20180301183548/531/7/I/V06/0
20180301T183715.565148Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   52302 20180301183715.098104 NEXRAD2 531008  
L2-BZIP2/KAMX/20180301183548/531/8/I/V06/0
20180301T183729.780038Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   35115 20180301183729.127604 NEXRAD2 531009  
L2-BZIP2/KAMX/20180301183548/531/9/I/V06/0
20180301T183742.951852Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   44697 20180301183742.169540 NEXRAD2 531010  
L2-BZIP2/KAMX/20180301183548/531/10/I/V06/0
20180301T183755.796440Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   77966 20180301183755.208748 NEXRAD2 531011  
L2-BZIP2/KAMX/20180301183548/531/11/I/V06/0
20180301T183808.804764Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   83954 20180301183808.258579 NEXRAD2 531012  
L2-BZIP2/KAMX/20180301183548/531/12/I/V06/0
20180301T183821.815409Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   85828 20180301183821.298629 NEXRAD2 531013  
L2-BZIP2/KAMX/20180301183548/531/13/I/V06/0
20180301T183835.059959Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   74412 20180301183834.348468 NEXRAD2 531014  
L2-BZIP2/KAMX/20180301183548/531/14/I/V06/0
20180301T183846.963704Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   57051 20180301183846.387866 NEXRAD2 531015  
L2-BZIP2/KAMX/20180301183548/531/15/I/V06/0
20180301T183859.729475Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   70790 20180301183858.437669 NEXRAD2 531016  
L2-BZIP2/KAMX/20180301183548/531/16/I/V06/0
20180301T183912.692457Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   91190 20180301183910.482010 NEXRAD2 531017  
L2-BZIP2/KAMX/20180301183548/531/17/I/V06/0
20180301T183923.726880Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  113914 20180301183922.542250 NEXRAD2 531018  
L2-BZIP2/KAMX/20180301183548/531/18/I/V06/0
20180301T183935.257395Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  101012 20180301183934.589158 NEXRAD2 531019  
L2-BZIP2/KAMX/20180301183548/531/19/I/V06/0
20180301T183949.226640Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   33953 20180301183948.629090 NEXRAD2 531020  
L2-BZIP2/KAMX/20180301183548/531/20/I/V06/0
20180301T184002.125703Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   27121 20180301184001.657469 NEXRAD2 531021  
L2-BZIP2/KAMX/20180301183548/531/21/I/V06/0
20180301T184015.073122Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   35025 20180301184014.697406 NEXRAD2 531022  
L2-BZIP2/KAMX/20180301183548/531/22/I/V06/0
20180301T184028.288413Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   45275 20180301184027.729586 NEXRAD2 531023  
L2-BZIP2/KAMX/20180301183548/531/23/I/V06/0
20180301T184046.075206Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   53121 20180301184040.774483 NEXRAD2 531024  
L2-BZIP2/KAMX/20180301183548/531/24/I/V06/0
20180301T184054.267895Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   47580 20180301184053.808199 NEXRAD2 531025  
L2-BZIP2/KAMX/20180301183548/531/25/I/V06/0
20180301T184125.486160Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   46289 20180301184124.876090 NEXRAD2 531026  
L2-BZIP2/KAMX/20180301183548/531/26/I/V06/0
20180301T184155.473372Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   68366 20180301184154.949943 NEXRAD2 531027  
L2-BZIP2/KAMX/20180301183548/531/27/I/V06/0
20180301T184315.473651Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   85246 20180301184224.025199 NEXRAD2 531028  
L2-BZIP2/KAMX/20180301183548/531/28/I/V06/0
20180301T184322.905867Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   37956 20180301184255.087442 NEXRAD2 531029  
L2-BZIP2/KAMX/20180301183548/531/29/I/V06/0
20180301T184329.750938Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   53496 20180301184324.152145 NEXRAD2 531030  
L2-BZIP2/KAMX/20180301183548/531/30/I/V06/0
20180301T184355.318028Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   67203 20180301184354.220782 NEXRAD2 531031  
L2-BZIP2/KAMX/20180301183548/531/31/I/V06/0
20180301T184425.866479Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   33790 20180301184425.197518 NEXRAD2 531032  
L2-BZIP2/KAMX/20180301183548/531/32/I/V06/0
20180301T184455.612248Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   48045 20180301184454.263456 NEXRAD2 531033  
L2-BZIP2/KAMX/20180301183548/531/33/I/V06/0
20180301T184523.874797Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   55627 20180301184523.329427 NEXRAD2 531034  
L2-BZIP2/KAMX/20180301183548/531/34/E/V06/0
20180301T184536.012807Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
    8368 20180301184533.663729 NEXRAD2 532001  
L2-BZIP2/KAMX/20180301184532/532/1/S/V06/0
20180301T184546.337848Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   83737 20180301184544.711855 NEXRAD2 532002  
L2-BZIP2/KAMX/20180301184532/532/2/I/V06/0
20180301T184558.391409Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   94806 20180301184556.764567 NEXRAD2 532003  
L2-BZIP2/KAMX/20180301184532/532/3/I/V06/0
20180301T184609.749610Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  166005 20180301184608.833310 NEXRAD2 532004  
L2-BZIP2/KAMX/20180301184532/532/4/I/V06/0
20180301T184621.821432Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  171884 20180301184620.905194 NEXRAD2 532005  
L2-BZIP2/KAMX/20180301184532/532/5/I/V06/0
20180301T184633.743335Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
  175647 20180301184632.970523 NEXRAD2 532006  
L2-BZIP2/KAMX/20180301184532/532/6/I/V06/0
20180301T184645.680968Z notifyme[129548] INFO notifyme.c:222:notifymeprog_5()   
   96409 20180301184645.025085 NEXRAD2 532007  
L2-BZIP2/KAMX/20180301184532/532/7/I/V06/0
 ...

I am looking at the time difference in fields 1 and 5.  Here is the last one:

20180301T184645.680968Z - 20180301184645.02508 = 0.655888 seconds

The first field is the date/time that the product was received on weather, and
the second one was the time that the same product was put into the LDM queue
at the KAMX radar.  The time difference in this case is on the order of 0.66 
seconds.

A quick check of the other time differences shows that they range up
to a minute.  I do not consider this to be excessive by any means.

Given what I saw when I logged in, and what I am seeing in the LDM's receipt
of KAMX produts, I am left with the hypothesis that the reason for the
large time difference between when the volume scan pieces are received and
when they have been reassembled into full volume scans is disk I/O.

To check the above hypothesis, I ran 'iostat', and here is what I saw:

Linux 2.6.32-696.1.1.el6.x86_64 (weather.rsmas.miami.edu)       03/01/2018      
_x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.33    0.00    2.13   18.06    0.00   70.48

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             490.81      9891.64     38558.55 235314067483 917276511900
sdc               6.83       608.71        55.26 14480782132 1314505410
sdb               8.65       662.86        52.51 15768922108 1249287050
md5               2.51        12.62        17.09  300194058  406558176
md6               0.00         0.00         0.00      45898         24
md1               1.19         4.43         5.11  105476080  121580648

The iowait shown, 18.06%, is high.  In fact, it is much higher than the
iowait on one of our motherlode class data servers that is processing
_all_ of the data available in the IDD including GOES-16 imagery:

~: iostat
Linux 2.6.32-696.13.2.el6.x86_64 (lead.unidata.ucar.edu)        03/01/2018      
_x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.57    0.00    4.96    3.98    0.00   76.50

Note that iowait on our machine is 3.98.

Given the iowait being seen on weather, my hypothesis about the reason for
the delay in making NEXRAD Level 2 volume scans available on 
weather.rsmas.miami.edu
seems even more plausible.

The question now is how the iowait can be reduced on weather.rsmas.miami.edu.
I think that the first place to look is the 'rsync's that bmapes is running.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: TMZ-743033
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.