
20030924: Two LDM's not talking to each other



Alan,

>Date: Wed, 24 Sep 2003 10:42:48 -0400
>From: "Alan Hall" <address@hidden>
>Organization: NOAA
>To: Jody Klein <address@hidden>,
>To: address@hidden,
>To: address@hidden
>Subject: Re: [Fwd: 20030919: Two LDM's not talking to each other]

The above message contained the following:

> Steve, this is a reply to my Systems folks about the ongoing
> problems.  Please correct any misinformation I have given here.

> Sep 19 12:53:13 doppler 192.67.134.137[38840]: Starting Up(6.0.14): 
> 192.67.134.137: TS_ZERO TS_ENDT {{FSL5,".*"}}
>
> This line from one of the logfiles indicates that doppler is trying to
> start communication with humboldt.  Steve's assumption is that because
> it shows the IP address instead of the FQDN, doppler can't resolve
> the address.  I don't think that's true because I use IP addresses
> instead of FQDNs in LDM's configuration files.  That way, translation is
> not necessary.

I believe you're right.  I wish I'd known that you used IP addresses
rather than hostnames in the LDM configuration file.

> Let's take this example where doppler is asking humboldt
> (192.67.134.137) for data.  Doppler's logfile shows:
> 
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: Delay: 580812.2104 sec
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: pq_sequence(): 
> time(insert)-time(create): 4621.4423 s
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: cursor reset: stop searching
>
> I have no idea what this means.  I agree that the time is consistent on
> doppler and humboldt.

The above means that Doppler didn't find anything relevant in searching
through its product-queue for the most recent FSL5 data-product.

> Sep 19 12:53:13 doppler 192.67.134.137[38840]: Desired product class: 
> 20030919115313.485 TS_ENDT {{FSL5,  ".*"}}
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: Connected to upstream LDM-6
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: requester6.c:274: Calling 
> feedme_6(...)
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: Upstream LDM is willing to feed
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: requester6.c:524: Calling 
> run_service()
> Sep 19 12:53:13 doppler 192.67.134.137[38840]: requester6.c:187: Downstream 
> LDM initialized
>
> This indicates that the setup between the two is good.  Humboldt has
> acknowledged that doppler wants data.  Now humboldt should set up a
> data connection, and that is where the error below happens.

Just to clarify, a new connection isn't established between Humboldt and
Doppler; instead, the existing connection that Doppler established with
Humboldt is "turned around" so that Humboldt is now the client and Doppler
the server.  Packets from Humboldt to Doppler will have port 388 as the
source port and the arbitrary port number that Doppler was assigned when
connecting to Humboldt as the destination port.
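
To make the port arithmetic concrete, here is a rough sketch (plain Python
sockets on the loopback interface, not LDM code).  The function name is made
up for illustration, and the listener uses an OS-assigned port as a stand-in
for 388, which needs privileges to bind.  It only shows that data sent
"backwards" over a connection opened by the downstream side carries the
listening port as its source and the downstream's ephemeral port as its
destination.

```python
import socket


def turned_around_demo():
    """Open one TCP connection and send data from the accepting side."""
    # Stand-in for the upstream LDM (Humboldt).  Port 0 asks the OS for any
    # free port; a real LDM listens on port 388.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    upstream_port = listener.getsockname()[1]

    # Stand-in for the downstream LDM (Doppler): it opens the only connection
    # and is assigned an arbitrary (ephemeral) local port.
    downstream = socket.create_connection(("127.0.0.1", upstream_port))
    downstream_port = downstream.getsockname()[1]

    upstream, _ = listener.accept()

    # Data now flows upstream -> downstream over that same connection.
    payload = b"product"
    upstream.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = downstream.recv(1024)
        if not chunk:
            break
        received += chunk

    # Ports as seen from the upstream end of the connection.
    src_port = upstream.getsockname()[1]
    dst_port = upstream.getpeername()[1]

    for s in (upstream, downstream, listener):
        s.close()
    return upstream_port, downstream_port, src_port, dst_port, received


if __name__ == "__main__":
    print(turned_around_demo())
```

A firewall or NAT device that tracks the connection has to let those
upstream-to-downstream packets through for the feed to work.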

> Sep 19 12:53:24 doppler 192.67.134.137[38840]: ERROR: requester6.c:206: 
> Connection to upstream LDM closed
> Sep 19 12:53:24 doppler 192.67.134.137[38840]: Sleeping 30 seconds before 
> retrying...

Doppler executed a select() system-call using the file-descriptor of the
established connection.  Apparently, the select() returned with an
indication (from the socket layer) that the connection no longer exists.
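
As an illustration of the general mechanism (not the actual requester6.c
code), a closed connection typically shows up through select() as a
descriptor that is "readable" but yields end-of-file when read.  The sketch
below assumes that pattern; peer_closed() and demo() are hypothetical helper
names, and a local socket pair stands in for the Doppler/Humboldt connection.

```python
import select
import socket


def peer_closed(sock, timeout=1.0):
    """Return True if select() reports sock readable and a read hits EOF."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing pending; the connection still looks alive
    # MSG_PEEK leaves any real data in the buffer; b"" means the peer closed.
    return sock.recv(1, socket.MSG_PEEK) == b""


def demo():
    ours, theirs = socket.socketpair()
    before = peer_closed(ours, timeout=0.1)  # peer still open
    theirs.close()                           # peer goes away
    after = peer_closed(ours, timeout=0.1)
    ours.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```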

> Humboldt's log file shows (the time doesn't match, but the log is the same):

Why don't the times match?

> Sep 19 15:18:54 humboldt doppler(feed)[87284]: up6.c:331: Starting 
> Up(6.0.14/6): 20030919140052.871 TS_ENDT {{FSL5,  ".*"}}
> Sep 19 15:18:54 humboldt doppler(feed)[87284]: topo:  doppler.ncdc.noaa.gov 
> FSL5

Humboldt successfully "turned around" the connection and is ready to
write to it.

> Sep 19 15:22:04 humboldt doppler(feed)[87284]: up6.c:168: HEREIS: RPC: 
> 1832-006 Unable to send; errno = There is no process to read data written to 
> a pipe.
> Sep 19 15:22:04 humboldt doppler(feed)[87284]: up6.c:396: Product send 
> failure: There is an input or output error.

And couldn't.
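
The errno text in that HEREIS message reads like the system's description of
EPIPE, although that is my assumption: the log doesn't give the numeric
value.  If so, the write failed because nothing was left at the other end of
the connection to read it.  Here is a small sketch of that failure mode using
a local socket pair in place of the real connection;
send_after_peer_close() is a made-up name, and the exact error can differ by
platform and socket type.

```python
import errno
import socket


def send_after_peer_close():
    """Write to a connection whose reading end has already been closed."""
    sender, receiver = socket.socketpair()
    receiver.close()  # the reading side goes away
    try:
        sender.sendall(b"product")
    except OSError as exc:
        return exc.errno  # expected to be errno.EPIPE on Unix-like systems
    finally:
        sender.close()
    return None


if __name__ == "__main__":
    code = send_after_peer_close()
    print(code, errno.errorcode.get(code))
```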

> Sep 19 15:22:34 humboldt doppler(feed)[45626]: up6.c:331: Starting 
> Up(6.0.14/6): 20030919140052.871 TS_ENDT {{FSL5,  ".*"}}
> 
> Alan.

The fact that this used to work indicates to me that something in the
networking layer changed.

Regards,
Steve Emmerson