
[LDM #NQY-760567]: Connection Error Message in ldmd.log



Hiro,

> Full Name: Hiro Gosden
> Email Address: address@hidden
> Organization: CIRA
> Package Version: 6.6.5
> Operating System: RHEL 4
> Hardware: Workstation
> Description of problem: I'm getting a lot of status 6 & 7 error messages that 
> indicate "Broken Pipe" and "Couldn't flush connection," then "connection reset 
> by peer." Sometimes, it seems to tie up the network port and grinds the 
> system to a halt.  The system crash doesn't happen often any more, but a 
> couple of weeks ago, it crashed almost every day.  Would you know what may be 
> causing this?  Thanks,

The log file contained messages like this:

Sep 11 00:00:20 awips rhesrv18.spc.noaa.gov(feed)[11208] NOTE: feed or notify 
failure; Error sending BLKDATA: RPC: Unable to send; errno = Broken pipe 
Sep 11 00:00:20 awips rpc.ldmd[17742] NOTE: child 11208 exited with status 7 
Sep 11 00:00:21 awips rhesrv18.spc.noaa.gov(feed)[11803] NOTE: Starting 
Up(6.6.5/6): 20120910230019.884 TS_ENDT {{EXP,  ".*"}}, 
SIG=2d74e065bf62deb5a2aa439f04d393d6, Primary 
Sep 11 00:00:21 awips rhesrv18.spc.noaa.gov(feed)[11803] NOTE: topo:  
rhesrv18.spc.noaa.gov {{EXP, (.*)}} 
Sep 11 00:05:50 awips rhesrv18.spc.noaa.gov(feed)[13004] NOTE: Starting 
Up(6.6.5/6): 20120910230549.042 TS_ENDT {{EXP,  ".*"}}, 
SIG=e7b7e7b35d9b954bcb2b09f0e5cd9ee3, Alternate 
Sep 11 00:05:50 awips rhesrv18.spc.noaa.gov(feed)[13004] NOTE: topo:  
rhesrv18.spc.noaa.gov {{EXP, (.*)}} 
Sep 11 00:06:19 awips rhesrv18.spc.noaa.gov(feed)[11803] ERROR: Couldn't flush 
connection; nullproc_6() failure to rhesrv18.spc.noaa.gov: RPC: Unable to 
receive; errno = Connection reset by peer 
Sep 11 00:06:19 awips rpc.ldmd[17742] NOTE: child 11803 exited with status 6 
Sep 11 00:15:10 awips rhesrv18.spc.noaa.gov(feed)[8289] NOTE: feed or notify 
failure; HEREIS: RPC: Unable to send; errno = Broken pipe 
Sep 11 00:15:10 awips rpc.ldmd[17742] NOTE: child 8289 exited with status 7 
Sep 11 00:15:43 awips rhesrv18.spc.noaa.gov(feed)[14907] NOTE: Starting 
Up(6.6.5/6): 20120910231542.575 TS_ENDT {{EXP,  ".*"}}, 
SIG=0b19cf4b14f15e3a8814ec1ba8608b22, Primary 
Sep 11 00:15:43 awips rhesrv18.spc.noaa.gov(feed)[14907] NOTE: topo:  
rhesrv18.spc.noaa.gov {{EXP, (.*)}} 
Sep 11 00:16:12 awips rhesrv18.spc.noaa.gov(feed)[13004] ERROR: Couldn't flush 
connection; nullproc_6() failure to rhesrv18.spc.noaa.gov: RPC: Unable to 
receive; errno = Connection reset by peer 
Sep 11 00:16:12 awips rpc.ldmd[17742] NOTE: child 13004 exited with status 6 

The messages are due to the downstream LDM processes switching between PRIMARY 
and ALTERNATE modes and may safely be ignored. Indeed, the ERROR level of the 
"Couldn't flush" message is demoted to NOTICE in the current LDM release.

This is the way the LDM is designed to work. Unfortunately, it results in many 
log messages.
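
If the volume of these messages makes it hard to see anything else, a small script along the following lines could tally the child exits by status. This is only a sketch, not part of the LDM package: the function name is hypothetical, and the regular expression assumes the "child N exited with status M" wording shown in the log excerpt above.

```python
import re
from collections import Counter

# Matches the parent-process lines from the excerpt above, e.g.
#   "... rpc.ldmd[17742] NOTE: child 11208 exited with status 7"
# (assumption: the wording is the same throughout the log file).
EXIT_RE = re.compile(r"child (\d+) exited with status (\d+)")


def count_exit_statuses(lines):
    """Return a Counter mapping exit status -> number of child exits."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

It could be called with an open log file, for example `count_exit_statuses(open("ldmd.log"))`, where the path is whatever your installation uses. A count dominated by statuses 6 and 7 would be consistent with the mode switching described above.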

While individual upstream LDM processes might terminate, the LDM system as a 
whole shouldn't crash, tie up, or lock up. Please send me any evidence of this 
happening.

> Hiro

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: NQY-760567
Department: Support LDM
Priority: Normal
Status: Closed