
[LDM #LXP-916564]: ldmping



Michael,

> We’re still having problems with connections timing out on LDM.  I can bring 
> you up to date on the steps we've taken.
> 
> The problem is we're seeing some data coming into LDM but we're seeing many 
> disconnects and connections denied in the log.
> 
> Have tried both the latest linux Red Hat kernel and a previous kernel under 
> which LDM ran for a couple months without problems.  The problem persists on 
> both kernels.
> 
> Problem appears only on two servers, LDM-11 and LDM-12.  We have other LDM 
> installations that work fine.
> 
> On the servers with the problem, LDMPING LOCALHOST rarely resolves the IP and 
> connects.  Most often we get this:
> 
> [ldm@ldm-12 ldm]$ ldmping localhost
> Apr 17 14:23:13 INFO:      State    Elapsed Port   Remote_Host           rpc_stat
> Apr 17 14:23:13 INFO: Resolving localhost to 127.0.0.1 took 0.000302 seconds
> Apr 17 14:23:23 ERROR:   H_CLNTED  10.000029  388   localhost    select: RPC: Timed out
> 
> If we change the port from 388 to 389 or 532 or 3885, LDMPING LOCALHOST works 
> fine every time, never a failure.  If we change the port back to 388 the 
> problem returns.
> 
> We’ve kick-started the server, rebuilt LDM according to the directions on the 
> "LDM INSTALL" web page.  We used "make install_setuids" to set the proper 
> owner and permissions on all LDM files and directories.
> 
> We created a very basic ldmd.conf file with nothing in it other than this:
> 
> #
> #
> #
> # CRH ldm.crh.noaa.gov
> #
> #
> #exec   "pqexpire"
> exec    "pqbinstats"

If you're not using the output from "pqbinstats" then you should remove this 
entry.  It shouldn't affect your problem, however.

> exec    "rtstats -h rtstats.unidata.ucar.edu"
> #
> ##############################################################################
> # Begin Access control
> ###############################################################################
> #
> ###############################################################################
> # ALLOW: Who we are willing to feed
> allow   ANY     ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> #
> allow   ANY     .noaa.gov

You should change the above to "allow ANY \.noaa\.gov$".
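
The reason for the change is that an unescaped "." matches any character and the original pattern has no end anchor, so it admits more hostnames than intended. Here is a minimal sketch of the difference. It uses Python's "re" module rather than the regular-expression library the LDM itself uses, and it assumes the pattern is applied as an unanchored search; the hostnames are made up for illustration.

```python
import re

# Patterns from the ldmd.conf discussion above.
LOOSE = r".noaa.gov"        # original: each "." matches any character, no end anchor
STRICT = r"\.noaa\.gov$"    # suggested: literal dots, anchored at the end of the name

def allowed(pattern: str, hostname: str) -> bool:
    """Return True if the pattern matches anywhere in the hostname."""
    return re.search(pattern, hostname) is not None

# Hypothetical hostnames, used only for illustration.
for host in ("ldm.crh.noaa.gov",
             "xnoaa-gov.example.com",
             "ldm.noaa.gov.example.com"):
    print(host, "loose:", allowed(LOOSE, host), "strict:", allowed(STRICT, host))
```

Only the first hostname should pass the stricter pattern; the loose pattern accepts all three.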

> ###############################################################################
> # ACCEPT: Who can feed us, currently this action is only needed for WSI data
> #
> # accept <feedset> <pattern> <hostname pattern>
> ###############################################################################
> # accept anything from yourself
> #
> accept  ANY     ".*"    ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> #
> #
> 
> We restarted LDM and tried LDMPING LOCALHOST with the same connection problem 
> resulting.
> 
> We can't find anything else addressing port 388.

I'm not sure what you mean by the last line above.  Please explain.

> Just received your latest note suggesting "netstat -n -a -t | grep 388".  
> Here's the results:
> 
> [root@ldm-12 ~]# netstat -n -a -t | grep 388
> tcp        0      0 204.227.126.195:388         140.90.64.100:57649         TIME_WAIT
> tcp        0      0 204.227.126.195:388         198.200.151.151:48070       TIME_WAIT
> tcp        0      0 204.227.126.195:388         204.228.186.180:35368       TIME_WAIT
> tcp        0      0 204.227.126.195:388         161.55.224.192:50416        TIME_WAIT

It looks like four remote LDM systems connected to the local LDM on port 388 
and recently disconnected (their sockets are in the TIME_WAIT state).

I don't see the LDM server listening on port 388 in the above output (the "-a" 
option should have caused it to be listed, in the LISTEN state).  Where is it?
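
As a rough cross-check that is independent of netstat, something like the following sketch can test whether anything accepts TCP connections on the port. The function name "port_is_listening" is hypothetical, and this only checks that the TCP connection is accepted; it does not exercise the RPC layer the way ldmping does, so a "True" result doesn't prove the LDM server is responding.

```python
import socket

def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or
        # an unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # 388 is the LDM port discussed in this thread.
    print("port 388 accepting connections:", port_is_listening("127.0.0.1", 388))
```

If this prints "False" while the LDM is supposedly running, then nothing is accepting connections on that port at the loopback address, which would be consistent with the missing LISTEN entry.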

> That's all I can think of.  We seem to have eliminated network infrastructure 
> such as routers, switches, etc with the problem showing up on a ping of 
> localhost.  Other LDM installs on the same branch of the network work fine.  
> LDM configuration, permissions, ownership seems to be okay.  All we can see 
> is disconnects and connections denied in the log.

I don't know what's going on either.  May we log onto one of the computers in 
question as the LDM user?  At this point, I'm afraid that will be necessary in 
order to diagnose the problem in a timely manner.

> Confused and scratching my head,
> Michael

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: LXP-916564
Department: Support LDM
Priority: Normal
Status: Closed