
[CONDUIT #MJC-410449]: Re: [conduit] Huge CONDUIT latencies, lost data starting ~ 00 UTC last night



Hi Becky,

re:
> Let me jump in here a bit late in the game.  I know Justin was working
> this with the ldm users group, but he's out sick today.

OK.

Question:

- are folks there monitoring posts to both the address@hidden
  and address@hidden email lists?

  I thought that you and others were subscribed to address@hidden
  so you would see posts related to CONDUIT content and other problems.  I
  have also been working under the assumption that folks there were
  _not_ subscribed to address@hidden mainly since there is a lot
  of chatter on that list that likely has nothing to do with anything you
  can help with.

re:
> What I understand from yesterday was that users weren't getting the SREF
> data.

That is one issue.  There is another that I will reiterate after replying
to your current list of comments.

re:
> We looked and realized that we'd only done the changes for the
> August 2012 SREF upgrade on the Silver Spring system.  Not Boulder.  So
> first question -- I'm guessing you all were getting the SREF from Silver
> Spring only.  Can anyone give me proof that you stopped being able to
> access Silver Spring in the last week?  And therefore this issue surfaced?

Hmm... this does cover part of the other issue...

We, the Unidata Program Center, had not received any products from
ncepldm1.woc.noaa.gov since 08:51:58 on 20130215.  We did, however, start
receiving products from ncepldm1.woc.noaa.gov again today:

Real-time CONDUIT volume statistics for daffy.unidata.ucar.edu:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?CONDUIT+daffy.unidata.ucar.edu

The AOS group at the University of Wisconsin at Madison has been
having problems receiving CONDUIT data for the past several days, both
from ncepldm4.woc.noaa.gov (high latencies, overall bad service)
and ncepldm1.woc.noaa.gov (no data; perhaps being denied their feed
REQUEST(s)?).  Things got so bad at UW/AOS that they started feeding
from us (idd.unidata.ucar.edu) this morning.

re:
> Now, we did put the SREF implementation on Boulder last night around 5PM
> our time.

Very good, thanks.

re:
> What issues are you seeing now?

Feed issues to UW/AOS.  Here are some snippets from emails we received
from Pete Pokrandt yesterday and this morning:

  Posted on: 20130228.1045 MST  
  I never got a resolution as to why I am unable to ingest the CONDUIT
  feed on idd.aos.wisc.edu from ncepldm1.woc.noaa.gov since about 9:30 AM
  CST on 2/15/2013. Prior to that date I was able to connect.

  I should be set up to ingest conduit from ncepldm1.woc.noaa.gov and from
  ncepldm4.woc.noaa.gov

  Posted on: 20130228.1137 MST  
  Steve, (cc Tom Yoksas)

  The only contact I have ever had regarding the conduit feed has been
  through the Unidata folks - Steve Chiswell in the olden days <TM>, and
  Tom Yoksas more recently. I have never had any direct contact between
  myself and noaa.

  I have attached the 10/15/2010 email that asked me to switch my ingest
  of the conduit feed to the redundant servers of ncepldm1.woc.noaa.gov
  and ncep4.woc.noaa.gov. This has been working since then, up until
  2/15/2013.

  There is an email for a Rebecca Cosgrove at Noaa - is she still our
  contact for the CONDUIT ldm servers? Tom, should I contact her or
  someone direct or should this go through you?

My Comment: I was out of the office yesterday and unavailable in the
evening, so I was not around to respond to Pete's inquiries.

  Posted on: 20130228.1206 MST  
  I am currently feeding CONDUIT from ncepldm4, that has always worked.
  The problem is there is no redundancy now. If ncepldm4 drops offline, we
  have no CONDUIT data.

  It is good to know that you also are unable to receive conduit data from
  ncepldm1. Sounds like maybe the ldm on ncepldm1 went down?

My Comment: Pete's post does not mention his earlier reports of very
high latencies while feeding from ncepldm4.woc.noaa.gov.

  Posted on: 20130301.0850 MST  
  All,

  We are losing lots of CONDUIT data, huge latencies beginning near 00 UTC
  or so last night.

  I don't think it is just us because the problem shows up on other sites
  as well. I have attached two latency plots - unfortunately most of the
  time the begin/end times aren't working on these plots, but I did look
  at them yesterday and the latencies had not begun yet, so the big
  increase began sometime late yesterday. Did something change?

  My users reported lost data beginning with the 00 UTC model cycle.

  Also, I still am unable to connect to ncepldm1.woc.noaa.gov.

  Posted on: 20130301.0852 MST  
  By the way, I just began requesting CONDUIT also from
  idd.unidata.ucar.edu since they appear to have a connection with lower
  latencies.

re:
> so I was hoping we'd have some reports from you guys of what
> problems you're seeing

The two issues seem to be/have been:

- lack of SREF data

- the set of ALLOWs on the CONDUIT top-level injection machines is not uniform

re:
> So... are you all having problems accessing some or all of the CONDUIT
> boxes?  If so, since when?

We are getting data from ncepldm1 again (as shown in the real-time CONDUIT
volume plot whose URL I included above).  UW/AOS (idd.aos.wisc.edu) may
still not be able to REQUEST data from ncepldm1.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: MJC-410449
Department: Support CONDUIT
Priority: Normal
Status: Closed