
[CONDUIT #MJC-410449]: Re: [conduit] Huge CONDUIT latencies, lost data starting ~ 00 UTC last night



Hi Becky,

First, I really must apologize for not being able to get back to this
before now... too many fires to put out along the way (sigh).

re:
> WOC's initial response is they don't see any system issues why there'd
> be two weeks why folks couldn't access the systems.

The Unidata machine that was REQUESTing from ncepldm1 (daffy.unidata.ucar.edu)
also did not see any products from it until sometime on or about March 1.  The
following snippet from the LDM log file for February 27 is representative of
the inability to get data from ncepldm1:

Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2887] NOTE: LDM-6 desired 
product-class: 20130227215759.431 TS_ENDT {{CONDUIT,  "[27]$"},{NONE,  
"SIG=9ad6bb1074b3fa133683bacfb237d26e"}}
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2883] NOTE: LDM-6 desired 
product-class: 20130227215759.431 TS_ENDT {{CONDUIT,  "[09]$"},{NONE,  
"SIG=c5af6b0e81a2e2e0f810df1163e6800a"}}
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2889] NOTE: LDM-6 desired 
product-class: 20130227215759.432 TS_ENDT {{CONDUIT,  "[36]$"},{NONE,  
"SIG=e6f98ef18c5f608c54e9db364f34fd8c"}}
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2885] NOTE: LDM-6 desired 
product-class: 20130227215759.432 TS_ENDT {{CONDUIT,  "[18]$"},{NONE,  
"SIG=0dc7db8ad50d7dd5b3ab0e7165fe344e"}}
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2891] NOTE: LDM-6 desired 
product-class: 20130227215759.433 TS_ENDT {{CONDUIT,  "[45]$"},{NONE,  
"SIG=de9cb1827bd50203872e4a2afa8c4fd7"}}
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2887] ERROR: Disconnecting due to 
LDM failure; Upstream LDM didn't reply to FEEDME request; RPC: Unable to 
receive; errno = Connection reset by peer
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2883] ERROR: Disconnecting due to 
LDM failure; Upstream LDM didn't reply to FEEDME request; RPC: Unable to 
receive; errno = Connection reset by peer
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2889] ERROR: Disconnecting due to 
LDM failure; Upstream LDM didn't reply to FEEDME request; RPC: Unable to 
receive; errno = Connection reset by peer
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2885] ERROR: Disconnecting due to 
LDM failure; Upstream LDM didn't reply to FEEDME request; RPC: Unable to 
receive; errno = Connection reset by peer
Feb 27 15:57:59 daffy ncepldm1.woc.noaa.gov[2891] ERROR: Disconnecting due to 
LDM failure; Upstream LDM didn't reply to FEEDME request; RPC: Unable to 
receive; errno = Connection reset by peer

Comments:

- these errors persisted until late in the day on March 1 and then
  disappeared

- we never noticed the failure since our machine redundantly REQUESTs
  from ncepldm1 and ncepldm4, and the feed from ncepldm4 has been
  working well

- UW/AOS changed their setup to redundantly REQUEST from us
  (idd.unidata.ucar.edu) in addition to ncepldm1 and ncepldm4

  I will log on to the UW/AOS machine and see if they are still having problems
  REQUESTing data from ncepldm1 and/or ncepldm4.
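
For anyone who wants to check their own LDM log for the same symptom, the
following is a minimal Python sketch, not something from the LDM distribution.
It assumes log entries in the syslog-style layout shown in the snippet above
(possibly wrapped across lines, as they are in this message) and counts the
"didn't reply to FEEDME request" disconnects per day and upstream host.  The
function names are illustrative only.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the log snippet above:
#   "Mon DD HH:MM:SS localhost upstream[pid] LEVEL: message"
ENTRY_START = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<local>\S+) (?P<upstream>[^\s\[]+)\[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def join_wrapped(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            # A new entry begins with a timestamp.
            entries.append(line)
        else:
            # Anything else is a continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def count_feedme_failures(lines):
    """Count unanswered-FEEDME disconnects per (day, upstream host)."""
    counts = Counter()
    for entry in join_wrapped(lines):
        m = ENTRY_START.match(entry)
        if (
            m
            and m["level"] == "ERROR"
            and "didn't reply to FEEDME request" in m["msg"]
        ):
            day = m["stamp"][:6]  # e.g. "Feb 27"
            counts[(day, m["upstream"])] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to
count_feedme_failures() returns a Counter keyed by (day, upstream host), which
makes it easy to see on which days a given upstream stopped answering.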

re:
> Can you query all the top-level relay folks to see how many of them
> experienced the outage to ncepldm1 for us and let us know?

I think that the fact that two of us in widely distributed locations
(Madison, WI and Boulder, CO) had problems REQUESTing data from ncepldm1
should be sufficient to indicate that there were problems.  The fact
that our problem cleared late in the day on March 1 indicates that
something was done to the LDM and/or network setup for ncepldm1.
I am willing to pursue things in more detail if you think it is
needed, however.

Again, I apologize for taking _so_ long to get to this!!

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: MJC-410449
Department: Support CONDUIT
Priority: Normal
Status: Open