
[IDD #VOV-832074]: Lost NEXRAD2 feed from idd.unidata.ucar.edu



Hi John,

re:
> Well, I have to say this is the first time I've ever heard of you being
> stumped.

Well, it wasn't the first time and I am sure it won't be the last.

The good news from my perspective is that we now fully understand
what happened and have taken corrective action.  More on this below...

> I can't take credit though - I only noticed the problem!

It turns out that others noticed strange things in the NEXRAD2 feed.
There were a number of emails sent to the address@hidden
email list yesterday on problems being seen.  I don't think you are
subscribed to the ldm-users email list.  You might want to be if
you want to know about what others are seeing in the various feeds.
Also, if you want to post to the list, you need to be subscribed (helps
keep spam out of the list).  You can (un)subscribe to any email list
that we maintain through:

Unidata HomePage
http://www.unidata.ucar.edu

  Support
  http://www.unidata.ucar.edu/support

    Participate in topical mailing lists
    http://www.unidata.ucar.edu/support/mailinglist/mailing-list-form.html

> From my logs, feeding against idd.unidata, here is the last history I
> have on the problem:
> 
> Lost feed at Jul 2, 21:38Z.
> Feed restored Jul 3, 10:43Z.
> Lost feed Jul 3, 18:17Z.
> Feed restored Jul 4, 01:35Z.
> 
> I have probed random sites in every NWS region via notifyme during the
> outages and confirmed it seems to be limited to the Eastern Region, of
> which 100% of the sites drop out as a block.
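
For reference, the two outage windows in your log can be turned into
durations with a few lines of Python.  This is only a minimal sketch:
the helper name minutes_into_july is made up for illustration, and the
times are simply the four entries you listed (all in July, all UTC).

```python
# Outage windows copied from the log above, as (day, hour, minute) in July, UTC.
outages = [
    ((2, 21, 38), (3, 10, 43)),  # lost Jul 2 21:38Z, restored Jul 3 10:43Z
    ((3, 18, 17), (4, 1, 35)),   # lost Jul 3 18:17Z, restored Jul 4 01:35Z
]

def minutes_into_july(day, hour, minute):
    """Hypothetical helper: minutes elapsed since the start of the month."""
    return (day * 24 + hour) * 60 + minute

# Length of each outage in minutes.
durations = [
    minutes_into_july(*end) - minutes_into_july(*start)
    for start, end in outages
]
total = sum(durations)

for minutes in durations:
    print(f"{minutes // 60}h {minutes % 60:02d}m")   # 13h 05m, then 7h 18m
print(f"total: {total // 60}h {total % 60:02d}m")    # total: 20h 23m
```

So the Eastern Region data was missing for a bit over 20 hours in total.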

We finally figured out what happened late yesterday evening.  As you might
expect, what end users saw was a combination of two problems:

- Purdue is one of the toplevel relay nodes for the NEXRAD Level II
  data.  For some reason, their machine was (perhaps still is) missing
  the Eastern Region Nexrads.  This still needs to be investigated, of
  course, but it is a situation that is not in our control.

- As I mentioned in an email yesterday, one of the real servers in
  the idd.cise-nsf.gov cluster has been shutting itself off due
  to overheating.  What I didn't know was that a different
  machine had been put into service as a replacement for the machine
  that was turning itself off.  Unfortunately, that machine had
  not been set up to ingest the NEXRAD2 data.  So, sites like yours
  (AND idd.unidata.ucar.edu) were being connected to that machine
  and so were not redundantly receiving Level II data.  This left
  Purdue as the sole input for Level II data into idd.unidata.ucar.edu,
  and as I said above, it was missing the Eastern Region stations.

Mike Schmidt found and fixed the ingest problem on the replacement
machine in the idd.cise-nsf.gov cluster yesterday evening.  The
result of a lengthy conversation we had yesterday evening
was the setting up of a third redundant NEXRAD2 feed from the IRaDS
group at the University of Oklahoma.  (IRaDS, Purdue, the ERC, and
the MAX GigaPop at the University of Maryland are the 4 toplevel
relay nodes for Level II data).  idd.cise-nsf.gov feeds directly
off of the MAX, and idd.unidata.ucar.edu directly feeds off of
Purdue.  We had figured that this two-way redundancy was sufficient,
but yesterday's episode demonstrated that it was not, hence the
three-way redundancy now in place, and the eventual 4-way redundancy
we will implement as soon as we get the ERC to allow our feed
requests.
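
To see why two upstreams were not enough, here is a toy model of the
situation in Python.  It is a sketch only: the region names and the
missing_regions function are illustrative assumptions, not anything in
the LDM, and the coverage sets just restate what is described above
(Purdue missing the Eastern Region, the replacement machine ingesting
no NEXRAD2 at all).

```python
# Illustrative model only; region names and coverage sets are assumptions.
REGIONS = {"Eastern", "Southern", "Central", "Western"}

def missing_regions(upstreams):
    """Return the regions that none of the upstream feeds is delivering."""
    covered = set().union(*upstreams.values())
    return REGIONS - covered

# During the outage: Purdue lacked the Eastern Region, and the replacement
# machine in the idd.cise-nsf.gov cluster was not ingesting NEXRAD2.
before = {
    "Purdue": REGIONS - {"Eastern"},
    "idd.cise-nsf.gov replacement": set(),
}

# After the fix: ingest repaired on the replacement machine, IRaDS feed added.
after = dict(before)
after["idd.cise-nsf.gov replacement"] = set(REGIONS)
after["IRaDS"] = set(REGIONS)

print(sorted(missing_regions(before)))  # ['Eastern']
print(sorted(missing_regions(after)))   # []
```

The point is that redundancy only helps when the upstreams fail
independently; two feeds that are each partly broken can still leave a
hole.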

As you might guess, this was pretty much a "comedy of errors".  It
is now known as "a learning experience" :-)

> I will try to keep an eye on things from my end and let you know if/when
> it drops out again.

Thanks.  We really rely on users keeping a close eye on the feeds
that they are interested in.  We have a number of monitors, but
they are typically gross checks for machines being down, etc.

> Please have a joyous 4th!

You too.  Mine will be a lot more fun now that we understand how
the problem was caused and have taken corrective action.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: VOV-832074
Department: Support IDD
Priority: Normal
Status: Closed