Re: [conduit] Large CONDUIT lag from some top levels began after 12 UTC May 16

  • To: Carissa Klemmer - NOAA Federal <carissa.l.klemmer@xxxxxxxx>
  • Subject: Re: [conduit] Large CONDUIT lag from some top levels began after 12 UTC May 16
  • From: Pete Pokrandt <poker@xxxxxxxxxxxx>
  • Date: Mon, 18 May 2015 12:26:41 -0500
Carissa,

Thanks for the update.

I also received an email from the NOAA systems support group indicating that they are aware of the issue and are investigating it:

----
Subject: ESRL/GSD model data outages, starting 18Z 5/16/2015

NOAA/ESRL/GSD Data Users,

We have been experiencing degraded download rates from the
NCEP ftp server to ESRL/GSD since about 18Z on Saturday 5/16.
As a result downloads of critical data sets needed as inputs
to GSD models have been failing, leading to outages or spotty
coverage for a number of experimental models run at GSD,
including HRRR, RAP, and FIM.

This appears to be a broader network issue than just between
NOAA/ESRL and NCEP. The Boulder Network Operations Center
has been notified, and is investigating.

----

Also, I was contacted by the Byrd Polar and Climate Research Center at OSU, asking about the lag on the CONDUIT data that they feed from us. Their AMPS model isn't starting, presumably because of delayed or missing data.

Pete




On 05/18/2015 12:19 PM, Carissa Klemmer - NOAA Federal wrote:
Pete,

Starting May 16th, both UCAR and GSD have also reported awful transfer rates to many of our servers out here. The timing lines up well enough that I suspect you are caught in whatever path is also affecting them from Boulder. We have engaged the NOAA campus to investigate the possible issue.

Carissa Klemmer
NCEP Central Operations
Production Management Branch Dataflow Team
301-683-3835

On Mon, May 18, 2015 at 12:21 PM, Pete Pokrandt <poker@xxxxxxxxxxxx> wrote:

    Just a heads-up/curiosity - did something change on one of the top
    level CONDUIT servers after the 12 UTC run on May 16, 2015?

    The attached gif, although it does not have times on the bottom,
    was grabbed around 01 UTC May 18th. It shows my lag from
    idd.aos.wisc.edu to the two top level CONDUIT servers
    (conduit.ncep.noaa.gov and ncepldm4.woc.noaa.gov).

    Time on the X axis is positive to the right, and starts about 2
    days ago on the left, ending at the picture time on the right.

    Each of the 8 clusters of data along the X axis is one of the
    6-hourly model cycles, so the farthest right one is the 18 UTC
    May 17 cycle. Working to the left, you get the 12, 06, and 00 UTC
    May 17 and 18 UTC May 16 runs, where the lag from the machines
    represented in blue and green suddenly got really large
    (2000-3000 seconds). The previous three cycles (12, 06, and 00 UTC
    May 16) have lags no larger than ~30-60 seconds, as they have been
    for the past several weeks.

    I don't appear to be losing any data because of this, but if
    something did change, I wanted to point out that I noticed this
    sudden increase in lag times. Looking at graphs from a few
    other selected CONDUIT sites, some show a similar issue (e.g.
    atm.ucar.edu) and others don't seem to (cyclone.plymouth.edu),
    so maybe it's a routing change?

    I did just talk with Jerry Robaidek at SSEC here at UW, and he
    told me that they started experiencing much slower data rates on
    their pull of data from the Japanese Himawari satellite around 20
    UTC on Saturday May 16 also. Interesting. Points to maybe a
    routing issue somewhere?

    Pete


-- Pete Pokrandt - Systems Programmer
    UW-Madison Dept of Atmospheric and Oceanic Sciences
    608-262-3086 - poker@xxxxxxxxxxxx
