Re: [conduit] Huge CONDUIT latencies, lost data starting ~ 00 UTC last night

Folks,
My apologies. I meant to work this issue with my colleagues at Unidata first to track down the problems. I got my CONDUIT email addresses confused :)

We'll try to get this issue resolved as soon as we can.
Becky Cosgrove
NCEP Central Operations

On 3/1/2013 10:58 AM, Rebecca Cosgrove wrote:
Hi Unidata folks.
Let me jump in here a bit late in the game. I know Justin was working this with the ldm users group, but he's out sick today.

What I understand from yesterday is that users weren't getting the SREF data. We looked and realized that we had only made the changes for the August 2012 SREF upgrade on the Silver Spring system, not on Boulder. So, first question: I'm guessing you were all getting the SREF from Silver Spring only. Can anyone show me evidence that you stopped being able to access Silver Spring in the last week, and that this is why the issue surfaced?

Now, we did put the SREF implementation on Boulder last night around 5PM our time.

What issues are you seeing now? We haven't contacted the WOC yet because, frankly, our working relationship with the WOC isn't what it once was, so I was hoping to get reports from you guys about the problems you're seeing before we go to the WOC and ask them to explain.

So... are you all having problems accessing some or all of the CONDUIT boxes? If so, since when?

Thanks.
Becky

On 3/1/2013 10:50 AM, Pete Pokrandt wrote:
All,

We are losing lots of CONDUIT data, with huge latencies beginning around 00 UTC last night.

I don't think it is just us; the problem shows up at other sites as well. I have attached two latency plots. Unfortunately, the begin/end times on these plots usually aren't working, but when I looked at them yesterday the latencies had not yet begun, so the big increase started sometime late yesterday. Did something change?

My users reported lost data beginning with the 00 UTC model cycle.

Also, I am still unable to connect to ncepldm1.woc.noaa.gov:

reset by peer
Mar 1 09:48:02 idd ncepldm1.woc.noaa.gov[32530] NOTE: LDM-6 desired product-class: 20130301144802.003 TS_ENDT {{CONDUIT, "[36]$"},{NONE, "SIG=4f055ca8e0b1dcd25e2fce2c5ace532d"}}
Mar 1 09:48:02 idd ncepldm1.woc.noaa.gov[32528] NOTE: LDM-6 desired product-class: 20130301144802.004 TS_ENDT {{CONDUIT, "[27]$"},{NONE, "SIG=1e15f4a14a5ba32b3754adc5a4a2f3b7"}}
Mar 1 09:48:02 idd ncepldm1.woc.noaa.gov[32530] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:02 idd ncepldm1.woc.noaa.gov[32528] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:07 idd ncepldm1.woc.noaa.gov[32524] NOTE: LDM-6 desired product-class: 20130301144807.768 TS_ENDT {{CONDUIT, "[09]$"},{NONE, "SIG=a7b1576bebab6351a83b216b84845599"}}
Mar 1 09:48:07 idd ncepldm1.woc.noaa.gov[32532] NOTE: LDM-6 desired product-class: 20130301144807.768 TS_ENDT {{CONDUIT, "[45]$"},{NONE, "SIG=d083b07055c1f728597b31bd55c7c05b"}}
Mar 1 09:48:07 idd ncepldm1.woc.noaa.gov[32532] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:07 idd ncepldm1.woc.noaa.gov[32524] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:27 idd ncepldm1.woc.noaa.gov[32526] NOTE: LDM-6 desired product-class: 20130301144827.907 TS_ENDT {{CONDUIT, "[18]$"},{NONE, "SIG=d258c505f9e3949291d1fef3eb21a095"}}
Mar 1 09:48:27 idd ncepldm1.woc.noaa.gov[32526] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:32 idd ncepldm1.woc.noaa.gov[32530] NOTE: LDM-6 desired product-class: 20130301144832.058 TS_ENDT {{CONDUIT, "[36]$"},{NONE, "SIG=4f055ca8e0b1dcd25e2fce2c5ace532d"}}
Mar 1 09:48:32 idd ncepldm1.woc.noaa.gov[32528] NOTE: LDM-6 desired product-class: 20130301144832.059 TS_ENDT {{CONDUIT, "[27]$"},{NONE, "SIG=1e15f4a14a5ba32b3754adc5a4a2f3b7"}}
Mar 1 09:48:32 idd ncepldm1.woc.noaa.gov[32530] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
Mar 1 09:48:32 idd ncepldm1.woc.noaa.gov[32528] ERROR: Disconnecting due to LDM failure; nullproc_6 failure to ncepldm1.woc.noaa.gov; RPC: Unable to receive; errno = Connection reset by peer
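In the log excerpt, the five upstream processes (PIDs 32524 through 32532) each request CONDUIT with a different pattern: "[09]$", "[18]$", "[27]$", "[36]$" and "[45]$". These appear to follow the common practice of splitting one CONDUIT feed across several request connections by the last character of the product identifier. As a minimal sketch, assuming every CONDUIT product identifier ends in a decimal digit, the code checks that the five patterns claim each final digit exactly once. Python's `re` module stands in here for the LDM's own regular-expression matching; that is an assumption, though it should behave the same for simple bracket expressions like these. The product ID in the example is hypothetical.

```python
import re
from typing import List, Sequence

# Request patterns copied from the log excerpt (one per upstream connection).
PATTERNS = ("[09]$", "[18]$", "[27]$", "[36]$", "[45]$")


def matching_requests(product_id: str, patterns: Sequence[str] = PATTERNS) -> List[str]:
    """Return the request patterns that would match this product ID."""
    return [p for p in patterns if re.search(p, product_id)]


def is_partition(patterns: Sequence[str]) -> bool:
    """True if each final digit 0-9 is matched by exactly one pattern."""
    for digit in "0123456789":
        # Assumption: product IDs end in a decimal digit (e.g. a sequence number).
        if len(matching_requests("product " + digit, patterns)) != 1:
            return False
    return True


if __name__ == "__main__":
    print(is_partition(PATTERNS))                       # True
    print(matching_requests("hypothetical/id 000123"))  # ['[36]$']
```

Because each final digit maps to exactly one pattern, every product would be requested over exactly one of the five connections, which is why all five requests show up separately in the log.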

I have a meeting at 16 UTC (10AM CST) so I will be out of contact for a bit.

Thanks,

Pete



_______________________________________________
conduit mailing list
conduit@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/