[LDM #RJV-550417]: question on LDM log message



Heath,

> We have intermittently been getting this error in /var/adm/messages.

First off, if the messages are appearing in "/var/adm/messages", then LDM 
logging isn't set up correctly. The LDM system should log to its own log file, 
not to the system log file.

> Can you please tell me what it means, and if we need to do anything?
> Thanks.
> 
> Jan 23 23:16:28 Titan.HPC.MsState.Edu idd.cise-nsf.gov[2858] ERROR: readtcp(): select() timeout on socket 5
> Jan 23 23:16:28 Titan.HPC.MsState.Edu idd.cise-nsf.gov[2858] ERROR: one_svc_run(): RPC layer closed connection
> Jan 23 23:16:28 Titan.HPC.MsState.Edu idd.cise-nsf.gov[2858] ERROR: Disconnecting due to LDM failure; Connection to upstream LDM closed
> Jan 23 23:26:18 Titan.HPC.MsState.Edu idd.cise-nsf.gov[2857] ERROR: Disconnecting due to LDM failure; Upstream LDM died

The messages mean that a downstream LDM process on Titan couldn't read anything 
from its TCP connection to the upstream LDM on idd.cise-nsf.gov before the 
timeout expired. This was likely due to network congestion. Don't worry: the 
downstream LDM will re-establish the connection and continue from where it left 
off.
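
If it helps to see what a "select() timeout" is, here is a minimal Python 
sketch of the general pattern: wait on a socket with select() and give up if 
nothing arrives within the timeout. This is not LDM code. The function names, 
the timeout values, and the local socket pair standing in for the upstream and 
downstream LDMs are all assumptions made for illustration.

```python
import select
import socket


def read_with_timeout(sock, nbytes, timeout_s):
    """Return data read from sock, or None if nothing arrives in timeout_s seconds."""
    # select() blocks until the socket is readable or the timeout elapses.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        # Nothing arrived in time: the situation a "select() timeout" reports.
        return None
    return sock.recv(nbytes)


def demo():
    """Show one timed-out read and one successful read on a local socket pair."""
    # Hypothetical stand-ins for the two ends of the TCP connection.
    upstream, downstream = socket.socketpair()
    try:
        # The upstream end sends nothing, so this read times out.
        quiet = read_with_timeout(downstream, 1024, 0.1)
        # Once data is sent, the same call returns it.
        upstream.sendall(b"product")
        busy = read_with_timeout(downstream, 1024, 1.0)
        return quiet, busy
    finally:
        upstream.close()
        downstream.close()


if __name__ == "__main__":
    print(demo())
```

A caller that gets None back would typically close the connection and 
reconnect, which corresponds to the disconnect and re-establish sequence in 
your log.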

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: RJV-550417
Department: Support LDM
Priority: Normal
Status: Closed