
Re: Ldmping does not work but data flows fine



Chirag,

>Date: Wed, 23 Nov 2005 15:03:48 -0600
>From: "Shukla, Chirag" <address@hidden>
>Organization: San Diego State University
>To: "Steve Emmerson" <address@hidden>
>Subject: Ldmping does not work but data flows fine

The above message contained the following:

> We have a machine called 'unidata.jacks.local' that feeds
> 'ae206-06.jacks.local' and 'ae206-03.jacks.local' machine. For a few
> minutes ae206-06 machine did not receive updated data from 'unidata'
> machine. I tried to `ldmping ae206-06` and saw that LDM on ae206-06 was
> not responding!
> 
> `ldmping ae206-06.jacks.local` from 'unidata' resulted in the following:
> unidata /data> ldmping ae206-06.jacks.local
> Nov 23 20:29:02 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:29:12 ERROR: SVC_UNAVAIL  10.002665    0
> ae206-06.jacks.local  h_clnt_create(ae206-06.jacks.local): Timed out
> while creating connection
> Nov 23 20:29:37 ERROR:  ADDRESSED   0.000002    0   ae206-06.jacks.local
> h_clnt_create(ae206-06.jacks.local): Timed out while creating connection
> Nov 23 20:30:12 ERROR:      NAMED   9.998855    0   ae206-06.jacks.local
> can't contact portmapper: RPC: Timed out

The above indicates that a downstream LDM on host unidata couldn't
connect to an upstream LDM on host ae206-06.  The reason is unclear.

Executing this command

    rpcinfo -n 388 -t ae206-06.jacks.local 300029 6

(or something similar) on host unidata should reveal the problem.
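
If rpcinfo(1) isn't at hand, a plain TCP connection attempt gives a rough
first indication of whether something is blocking the path.  The following
is a minimal Python sketch and not part of the LDM package; the function
name tcp_port_open and the 5-second default timeout are assumptions made
for illustration.  Port 388 is the LDM port shown in the ldmping output and
port 111 is the standard portmapper port.  Unlike rpcinfo, this only tests
whether a TCP connection can be opened; it does not send an RPC call to
program 300029.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it is not an LDM utility.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example use from host unidata (hostname taken from the thread):
#   for port in (388, 111):
#       print(port, tcp_port_open("ae206-06.jacks.local", port))
```

A False result for port 388 from unidata, together with a True result from
ae206-03, would point at something between unidata and ae206-06 (for
example, a firewall) rather than at the LDM itself.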

> unidata raws/data> ldmping ae206-03.jacks.local
> Nov 23 20:42:32 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:42:32 INFO: RESPONDING   0.015185  388   ae206-03.jacks.local
> 
> 
> From ae206-03 >>
> address@hidden /]$ ldmping unidata.jacks.local
> Nov 23 20:32:57 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:32:57 INFO: RESPONDING   0.010462  388   unidata.jacks.local
> 
> address@hidden raws]$ ldmping ae206-06.jacks.local
> Nov 23 20:43:10 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:43:10 INFO: RESPONDING   0.002772  388   ae206-06.jacks.local
> 
> 
> From ae206-06 >>
> address@hidden raws]$ ldmping unidata.jacks.local
> Nov 23 20:44:15 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:44:15 INFO: RESPONDING   0.012841  388   unidata.jacks.local
> 
> address@hidden raws]$ ldmping ae206-03.jacks.local
> Nov 23 20:44:07 INFO:      State    Elapsed Port   Remote_Host
> rpc_stat
> Nov 23 20:44:07 INFO: RESPONDING   0.004253  388   ae206-03.jacks.local
> 
> 
> 
> 
> I can `ping` and `host` or `nslookup` one another just fine:
> unidata /home/ldm> ping ae206-06
> PING ae206-06.jacks.local (137.216.177.37) 56(84) bytes of data.
> 64 bytes from ae206-06.jacks.local (137.216.177.37): icmp_seq=1 ttl=63
> time=3.96 ms
> 64 bytes from ae206-06.jacks.local (137.216.177.37): icmp_seq=2 ttl=63
> time=4.33 ms
> 
> address@hidden raws]$ ping unidata.jacks.local
> PING unidata.jacks.local (137.216.132.176) 56(84) bytes of data.
> 64 bytes from unidata.jacks.local (137.216.132.176): icmp_seq=0 ttl=63
> time=4.14 ms
> 64 bytes from unidata.jacks.local (137.216.132.176): icmp_seq=1 ttl=63
> time=4.85 ms
> 
> These are the logs:
> unidata /data> cat ~/logs/ldmd.log | grep ae206-06
> Nov 23 20:15:43 unidata ae206-06[3026] NOTE: Data-product with signature
> df17b19bdbab14359eb205a7c5ec4f8e wasn't found in product-queue
> Nov 23 20:15:43 unidata ae206-06(feed)[3026] NOTE: Starting Up(6.4.2/6):
> 20051123201034.078 TS_ENDT {{ANY,  ".*"}}, Primary
> Nov 23 20:15:43 unidata ae206-06(feed)[3026] NOTE: topo:
> ae206-06.jacks.local {{ANY, (.*)}}
> Nov 23 20:15:44 unidata ae206-06[3027] NOTE: Data-product with signature
> 1e0c309abba55a19832b53bdce52901e wasn't found in product-queue
> Nov 23 20:15:44 unidata ae206-06(feed)[3027] NOTE: Starting Up(6.4.2/6):
> 20051123193310.368 TS_ENDT {{CONDUIT,  "MT.(eta|nam)"}}, Primary
> Nov 23 20:15:44 unidata ae206-06(feed)[3027] NOTE: topo:
> ae206-06.jacks.local {{CONDUIT, (.*)}}

Because a downstream LDM on host ae206-06 requested data-products of
type ANY/".*", it's unnecessary for another downstream LDM on that host
to also request data-products of type CONDUIT/"MT.(eta|nam)".  Doing so will
merely increase your bandwidth utilization without any benefit.
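
To illustrate why the second request is redundant, here is a simplified
Python sketch.  It is not LDM code: the function name matches, the
treatment of ANY as a wildcard that accepts every feedtype, and the product
identifiers are all assumptions made for illustration.  The two requests
are the ones shown in the log above.

```python
import re


def matches(request, feedtype, product_id):
    """Return True if a product satisfies a (feedtype, pattern) request.

    "ANY" is treated as accepting every feedtype, which is a simplification
    of how LDM handles feedtypes.
    """
    req_feed, pattern = request
    feed_ok = req_feed == "ANY" or req_feed == feedtype
    # The pattern may match anywhere in the product identifier.
    return feed_ok and re.search(pattern, product_id) is not None


broad = ("ANY", ".*")
narrow = ("CONDUIT", "MT.(eta|nam)")

# Hypothetical product identifiers, for illustration only.
products = [
    ("CONDUIT", "data/MT.nam/example"),
    ("CONDUIT", "data/MT.gfs/example"),
    ("IDS", "example bulletin"),
]

for feedtype, ident in products:
    print(feedtype, ident,
          "broad:", matches(broad, feedtype, ident),
          "narrow:", matches(narrow, feedtype, ident))
```

Every product accepted by the narrow request is also accepted by the broad
one, so the narrow request can only cause the same products to be sent
twice.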

> But seems to be something going on here on ae206-06
> address@hidden raws]$ cat ~/logs/ldmd.log | grep unidata.jacks.local
> Nov 23 20:05:50 ae206-06 unidata[12500] NOTE: Starting Up(6.4.2):
> unidata.jacks.local:388 20051123190550.479 TS_ENDT {{ANY,  ".*"}}
> Nov 23 20:05:50 ae206-06 unidata[12501] NOTE: Starting Up(6.4.2):
> unidata.jacks.local:388 20051123190550.482 TS_ENDT {{CONDUIT,
> "MT.(eta|nam)"}}
> Nov 23 20:05:50 ae206-06 unidata[12500] NOTE: Upstream LDM-6 on
> unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:05:50 ae206-06 unidata[12501] NOTE: Upstream LDM-6 on
> unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:15:36 ae206-06 unidata[12500] ERROR: Terminating due to LDM
> failure; nullproc_6 failure to unidata.jacks.local; RPC: Unable to
> receive; errno = Connection reset by peer
> Nov 23 20:15:37 ae206-06 unidata[12500] ERROR: Terminating due to LDM
> failure; Couldn't connect to LDM on unidata.jacks.local using either
> port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:37 ae206-06 unidata[12501] ERROR: Terminating due to LDM
> failure; Couldn't connect to LDM on unidata.jacks.local using either
> port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:39 ae206-06 unidata[12500] ERROR: Terminating due to LDM
> failure; Couldn't connect to LDM on unidata.jacks.local using either
> port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:39 ae206-06 unidata[12501] ERROR: Terminating due to LDM
> failure; Couldn't connect to LDM on unidata.jacks.local using either
> port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:42 ae206-06 unidata[12500] NOTE: Upstream LDM-6 on
> unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:15:43 ae206-06 unidata[12501] NOTE: Upstream LDM-6 on
> unidata.jacks.local is willing to be a primary feeder
> 
> Why am I not able to `ldmping ae206-06.jacks.local`? There has been no
> change made to any firewalls, hardware or software...except that FC4 was
> updated. Despite ldmping not working, ae206-06 now gets data just fine.

Execute the rpcinfo(1) command given above on host ae206-06.  What does it output?

> Unidata uses: 
> unidata /home/ldm> uname -a
> Linux unidata 2.4.21-99-smp4G #1 SMP Wed Sep 24 14:13:20 UTC 2003 i686
> i686 i386 GNU/Linux
> 
> Ae206-06 uses:
> address@hidden raws]$ uname -a
> Linux ae206-06.jacks.local 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT
> 2005 i686 i686 i386 GNU/Linux
> 
> 
> From unidata >>
> Traceroute'ing to ae206-06 or ae206-03 does not result in anything.
> Probably this could be a firewall issue at our end.
> 
> Is there a red flag somewhere?
> 
> Thanks.
> 
> Sincerely,
> Chirag Shukla
> South Dakota State University

Regards,
Steve Emmerson


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.