[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[LDM #CSZ-262645]: errant LDM REQUESTs on sirocco.srcc.lsu.edu



Hi David,

re:
> I noticed one of our email servers was down.  Sirocco's ldm uses that email
> server to send us logs.  I restarted the email server. Let me know if that
> fixed the persistent 'REQUEST's.

We don't think that this would/could have anything to do with the repeated
REQUESTs from sirocco.  One of the primary reasons for this is that we are
now seeing the same behavior from pavan.srcc.lsu.edu -- when I looked a
couple of minutes ago on the node of idd.unidata.ucar.edu that is servicing
REQUESTs from pavan, I saw 43 connections.  When I looked 12 minutes ago, there
were 30 REQUESTs.  From what I see, pavan should only have one REQUEST line
in its ~ldm/etc/ldmd.conf file, and that is for everything (".*") for
DIFAX|FSL2|UNIWISC.

The 'netstat' output for the connections on our end are telling; here is
a dump that I just did:

/opt/ldm% netstat -a | grep ESTAB | grep pavan
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:36027    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38455    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:36010    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38488    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38893    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38450    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38907    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38501    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50282    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38491    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38900    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38968    
ESTABLISHED 
tcp        0    260 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:40691    
ESTABLISHED 
tcp        0 1772352 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50250    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50296    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38453    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38460    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50219    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38486    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50208    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38515    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38494    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38495    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38483    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38465    
ESTABLISHED 
tcp        0 1772352 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50268    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50242    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38485    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50231    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38451    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38482    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38456    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:35995    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38932    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38952    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38470    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38508    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38476    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38447    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38914    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50276    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50226    
ESTABLISHED 
tcp        0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38454    
ESTABLISHED 

The third column shows the number of bytes in the transmit queue for the machine
listed in the 5th column (pavan).  This listing suggests that there is some
"packet shaping" being done on the LSU and/or SRCC side that is limiting/slowing
down the transfer of data from the idd.unidata.ucar.edu cluster node to
SRCC machines.  This limit apparently also exists in feed REQUESTs being
serviced by idd.ssec.wisc.edu:

<on idd.ssec.wisc.edu>

unidata2-new.ldm     sirocco.srcc.lsu.edu.53969 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53970 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53971 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53972 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53974 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53975 27776  17783 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53976 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53977 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53978 22272  15047 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53979 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53980 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53981 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53982 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53983 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53984 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53985 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53986 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53987 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53988 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53989 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     sirocco.srcc.lsu.edu.53990 19584  13679 492480      0 
ESTABLISHED
unidata2-new.ldm     pavan.srcc.lsu.edu.38566 443264      0 492480      0 
ESTABLISHED

So, the question is what, if anything, has been done to the networking
at LSU and/or SRCC in the past week?  I say the past week because there
are indications from the LDM log file (~ldm/logs/ldmd.conf.save) that this
behavior started back in earnest at 14:05:50 on January 22:

Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired 
product-class: 20130123130550.236 TS_ENDT {{IDS|DDPLUS,  ".*"},{NONE,  
"SIG=3d104d7efb0c4abf7b6561887892b21d"}} 
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on 
idd.ssec.wisc.edu is willing to be a primary feeder 
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired 
product-class: 20130123130626.548 TS_ENDT {{IDS|DDPLUS,  ".*"},{NONE,  
"SIG=decdf0a7a61b93a9a0d66997bd1420bf"}} 
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on 
idd.ssec.wisc.edu is willing to be a primary feeder 
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired 
product-class: 20130123130702.849 TS_ENDT {{IDS|DDPLUS,  ".*"},{NONE,  
"SIG=3ea2f1cbdaf12fd9fc60255f536d680e"}} 
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on 
idd.ssec.wisc.edu is willing to be a primary feeder 
Jan 23 14:07:39 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select() 
timeout on socket 4 
 ...

The kinds of log messages related to idd.unidata.ucar.edu started in earnest
at 22:46:36 on January 23:

Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired 
product-class: 20130123214636.808 TS_ENDT {{IDS|DDS,  ".*"},{NONE,  
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}} 
Jan 23 22:46:46 sirocco idd.unidata.ucar.edu[16344] WARN: Couldn't resolve 
"idd.unidata.ucar.edu" to an Internet address in 10.0013 seconds 
Jan 23 22:46:46 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to 
LDM failure; Couldn't get IP address of host idd.unidata.ucar.edu; try again 
later 
Jan 23 22:46:47 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired 
product-class: 20130123214647.811 TS_ENDT {{IDS|DDS,  ".*"},{NONE,  
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}} 
Jan 23 22:47:11 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on 
idd.unidata.ucar.edu is willing to be a primary feeder 
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: readtcp(): EOF on 
socket 4 
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:47:23 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired 
product-class: 20130123214723.495 TS_ENDT {{IDS|DDS,  ".*"},{NONE,  
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}} 
Jan 23 22:47:26 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired 
product-class: 20130123214726.837 TS_ENDT {{IDS|DDPLUS,  ".*"},{NONE,  
"SIG=72da2c1a94bd67d40b89e566fcf00d32"}} 
Jan 23 22:47:27 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on 
idd.unidata.ucar.edu is willing to be a primary feeder 
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: Broken pipe 
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: writetcp(): write() 
error on socket 4 
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired 
product-class: 20130123214730.496 TS_ENDT {{IDS|DDS,  ".*"},{NONE,  
"SIG=13138cb3ac4dfc050edc8b5135a17d2d"}} 
Jan 23 22:47:31 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on 
idd.ssec.wisc.edu is willing to be a primary feeder 
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: Broken pipe 
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: writetcp(): write() 
error on socket 4 
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: readtcp(): select() 
timeout on socket 4 
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: one_svc_run(): RPC 
layer closed connection 
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: Disconnecting due to 
LDM failure; Connection to upstream LDM closed 
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] NOTE: LDM-6 desired 
product-class: 20130123214753.275 TS_ENDT {{WMO,  ".*"},{NONE,  
"SIG=840ee3c0a8c5a9e69d571521dc9a3f39"}} 
Jan 23 22:48:06 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on 
idd.unidata.ucar.edu is willing to be a primary feeder 
Jan 23 22:48:10 sirocco idd.unidata.ucar.edu[16341] NOTE: Upstream LDM-6 on 
idd.unidata.ucar.edu is willing to be a primary feeder 
 ...


Have you heard back from LSU and/or SRCC network admins about anything
that may have been done in this timeframe?



Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: CSZ-262645
Department: Support LDM
Priority: Normal
Status: Open