
[LDM #VTQ-127094]: odd ldm question



Michael,

> Hi Steve,
> Sorry for the delayed reply.

No worries.

> I'm able to reproduce the problem on the same
> upstream host and a downstream host that I have here at CRH as well, so I
> can get you information from both upstream and downstream.

Excellent!

> The problem is
> the same. I have an allow on an upstream host and a request on a downstream
> host, but the data doesn't arrive at the downstream host. On the downstream
> host I can run notifyme looking for the data at the upstream host and I can
> see the data upstream. So from your previous message, here's the upstream
> allow.
> 
> allow   NGRID   ^204\.227\.126\.52$

Looks good.

> On the downstream ldmd.conf, here's the corresponding request:
> 
> request NGRID   ".*"    204.227.126.125

Also good.
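For reference, the pattern in an ALLOW entry is an extended regular expression that the upstream LDM matches against the requesting downstream host's identifier. A purely illustrative sketch (in Python, whose `re` module is close enough to ERE for this pattern) of why the anchors and escaped dots matter:

```python
import re

# The upstream ALLOW pattern from above, as a regular expression.
pattern = re.compile(r"^204\.227\.126\.52$")

# The anchors (^, $) and escaped dots make the match exact:
print(bool(pattern.match("204.227.126.52")))   # True: the downstream host
print(bool(pattern.match("204.227.126.520")))  # False: $ rejects extra digits
print(bool(pattern.match("204x227x126x52")))   # False: \. is a literal dot
```

Without the backslashes, each `.` would match any character, and without the anchors the pattern could match a longer address that merely contains this one.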

> On the downstream ldm, I can run notifyme looking upstream and see the
> NGRID data arriving:
> 
> [ldm@monitor-2 etc]$ notifyme -vxl - -h 204.227.126.125 -f NGRID
> May 01 15:13:16 notifyme[18988] NOTE: Starting Up: 204.227.126.125: 20180501151316.807 TS_ENDT {{NGRID,  ".*"}}
> May 01 15:13:16 notifyme[18988] NOTE: LDM-5 desired product-class: 20180501151316.807 TS_ENDT {{NGRID,  ".*"}}
> May 01 15:13:16 DEBUG: NOTIFYME(204.227.126.125) returns OK
> May 01 15:13:16 notifyme[18988] NOTE: NOTIFYME(204.227.126.125): OK
> May 01 15:13:17 notifyme[18988] INFO: 309fefb0596d2242b40024a80dc42087 845424 20180501151317.053 NGRID 13295368  YSCI98 KWBY 011400 !grib2/ncep/HRRR/#255/201805011400F008/SNDM/0 - NONE
> May 01 15:13:18 notifyme[18988] INFO: 0901a01ebc1ef94b3df893f2e84e4c51 513984 20180501151317.675 NGRID 13295371  YSCI98 KWBY 011400 !grib2/ncep/HRRR/#255/201805011400F008/SWEM01/0 - NONE
> ^CMay 01 15:13:21 notifyme[18988] NOTE: exiting
> [ldm@monitor-2 etc]$

Good.

> But if I run notifyme -vxl - -h localhost -f NGRID or ldmadmin watch -f
> NGRID on the downstream LDM, No NGRID data ever appears.

Not good.

> I did find this in the upstream ldmd.log file regarding the downstream
> connection:
> 
> 20180501T151316.316859Z 204.227.126.52(noti)[12187] NOTE forn5_svc.c:468:forn_5_svc() Starting Up(6.13.4/5): 20180501151316.807391 TS_ENDT {{NGRID, ".*"}}
> 20180501T151316.316905Z 204.227.126.52(noti)[12187] NOTE forn5_svc.c:471:forn_5_svc() topo:  204.227.126.52 NGRID
> 20180501T151326.542951Z 204.227.126.52(noti)[12187] ERROR forn5_svc.c:273:noti5_sqf() YECJ98 KWBY 011400 !grib2/ncep/HRRR/#255/201805011400F009/APCP01/0 - NONE: RPC: Unable to receive
> 20180501T151326.543007Z 204.227.126.52(noti)[12187] ERROR forn5_svc.c:554:forn_5_svc() pq_sequence failed: Input/output error (errno = 5)
> 20180501T151326.543029Z 204.227.126.52(noti)[12187] NOTE ldmd.c:187:cleanup() Exiting
> 20180501T151326.545180Z ldmd[2545] NOTE ldmd.c:170:reap() child 12187 exited with status 1

These messages are due to the notifyme(1) process on the downstream LDM. 
(Incidentally, the clocks on the two systems appear to be about a half-second 
off from each other.)

Do you have any messages in the upstream LDM log file from the downstream LDM 
process that was started as a result of the REQUEST entry in the downstream 
LDM's configuration file?

Similarly, what log messages does that REQUEST-based downstream LDM process 
produce when it attempts to connect?

> Going back to the original message where I was asking about different
> downstream hosts which are at a WFO outside of my office, I see errors like
> this:
> 
> 20180501T151606.428910Z 204.227.119.201[2551] WARN error.c:236:err_log() Couldn't connect to LDM on 204.227.119.201 using either port 388 or portmapper; : RPC: Remote system error - Connection timed out
> 20180501T151607.024931Z 204.227.119.229[2550] WARN error.c:236:err_log() Couldn't connect to LDM on 204.227.119.229 using either port 388 or portmapper; : RPC: Remote system error - Connection timed out
> 20180501T151607.153904Z 204.227.103.133[2548] WARN error.c:236:err_log() Couldn't connect to LDM on 204.227.103.133 using either port 388 or portmapper; : RPC: Remote system error - Connection timed out
> 
> These are the downstream hosts trying to get data from the same upstream
> host in the information above.

Not really. The IP addresses in the information above are 204.227.126.125 
(upstream) and 204.227.126.52 (downstream). The above log messages show failed 
connection attempts to 204.227.119.201, 204.227.119.229, and 204.227.103.133. 
These are different IP addresses -- unless I'm misunderstanding something.

> Seem to see different log entries, but these
> hosts all show the same symptoms....can't get data from upstream.

The log messages from the downstream LDM processes at the WFOs show that they 
couldn't even reach the relevant upstream LDMs. There can be multiple reasons 
for this (firewalls, wrong IP address, etc.). If the IP addresses are correct, 
then the RPC timeouts make me suspect a firewall issue. I would first try 
notifyme(1)s on those downstream systems to their relevant upstream LDMs. If 
those don't work, then I'd try ping(1)ing the upstream hosts. If the ping(1)s 
work, then I'd definitely suspect a firewall issue -- which can be verified by 
running telnet(1) on the downstream hosts to port 388 on the upstream hosts 
(e.g., "telnet 204.227.119.201 388").
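If telnet(1) isn't installed on those hosts, the same port-388 reachability check can be sketched with a few lines of Python (the address below is just the one from the log; swap in whichever upstream host you're testing):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Equivalent to "telnet 204.227.119.201 388":
# tcp_reachable("204.227.119.201", 388)
```

A timeout here (rather than an immediate "connection refused") points the same way the RPC timeouts above do: a firewall silently dropping the packets.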

> If I can provide anything else, please let me know.

Relevant upstream and downstream log entries from a REQUEST-based connection 
attempt on your test systems would be good.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: VTQ-127094
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.