
Re: [conduit] Conduit Outage 4/15 into 4/16



Good evening all,

I am pleased to relay to you all a notice I have received from NCO:

NOUS42 KWNO 240259
ADASDM

SENIOR DUTY METEOROLOGIST NWS ALERT ADMINISTRATIVE MESSAGE
NWS NCEP CENTRAL OPERATIONS COLLEGE PARK MD
0258Z WED APR 24 2024

...CP DATA CENTER RECOVERY - RESTORED...

Recovery teams have restored remaining degraded services/systems
in the College Park Data Center aside from CPC which has to back
fill historical data.  CP Data Center is currently a viable
backup.

Shruell/SDM/NCO/NCEP

It looks like data began flowing in full again on the 19th as shown in the attached graphics.  I suppose that makes the subject of this email thread wildly inaccurate now...  /shrug

You may have noticed I held off on sending another email until now.  I saw that data was starting to come in again on Friday, but as I noted in my last exchange, I didn't have much evidence to suggest it would stay online or come in full.  Part of that was because I didn't have anything new to report, and part of THAT was because the NCO status page had stalled from Thursday until today (https://www.nco.ncep.noaa.gov/status/messages/).  Other sources of that information were still updating, but the fact that this page was still out made it clear restoration efforts were still in progress.  At the same time, as you may have seen on the above link, the Critical Weather Day in support of the restoration efforts was allowed to expire at 00Z tonight.  Leading up to that, the fact they were holding to that deadline suggested they were feeling good about how things were going.  But apart from noting various details and conjecture (gestures at all of That...), I didn't feel I had anything worth reporting.  This one probably is, though.

All that is a long-winded way to say NCEP has sounded the All Clear, and the College Park DC is back in full.  If you notice anything not quite right or you suspect there may still be missing data, please don't hesitate to reach out to support-idd@unidata.ucar.edu, support-conduit@unidata.ucar.edu, or support-ldm@unidata.ucar.edu, whichever is most appropriate.

...And there was much rejoicing.
-Mike

Mike Zuranski
Data Engineer II
NSF Unidata Program Center
University Corporation for Atmospheric Research


P.S.:
Here are those graphs illustrating the down/uptime of CONDUIT (sorry if anyone still uses alpine).
  
[Attachments: three graphs, each named image.png]



On Fri, Apr 19, 2024 at 3:15 PM Mike Zuranski <address@hidden> wrote:
Good afternoon everyone,

There hasn't been an update message from NCEP since yesterday morning, but as of this writing here's where things stand:

Data has been flowing on CONDUIT for a few hours now, though I do not know if it's everything or if it'll stay online.  It came back up shortly before the 12Z GFS and currently the 18Z NAM is coming in.  I'll try not to look at it the wrong way.

On Wednesday morning a Critical Weather Day was issued to help ensure NCEP et al. have all the resources needed for the recovery efforts.  Originally the CWD was scheduled to end Saturday morning, but earlier today it was extended to Monday evening.  My read is that this says something about their confidence level, and there's still a chance CONDUIT and other impacted services could go up and down; an all clear would be premature.

Model data continues to be available on nomads.ncep.noaa.gov, and other missing data may exist in other locations too; please reach out if you need help finding data.

Here is the latest ADASDM update on the restoration efforts:

Here is the latest CWD statement:
https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=ADASDM&e=202404191520

And here is that CONDUIT RTSTATS graph, showing there's life on the feed again:

We will continue to monitor and I'll update these lists as more information comes out, and if nothing else I'll send another update Monday.

Best,
-Mike


Mike Zuranski
Data Engineer II
NSF Unidata Program Center
University Corporation for Atmospheric Research


On Wed, Apr 17, 2024 at 9:47 AM Mike Zuranski <address@hidden> wrote:
Good morning all,

The situation doesn't seem to have changed much from last night.  CONDUIT is still down, numerous impacts to NCEP web sites and services, cats & dogs living together...

We continue to monitor but that's about all we can do.  The link I'm smashing the F5 button on is this rtstats chart; it's probably the easiest "Is it still down?" check for CONDUIT at this time:
https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_num_nc?CONDUIT+conduit.unidata.ucar.edu

If you are having a hard time finding model data that is normally on CONDUIT, nomads.ncep.noaa.gov has remained unaffected by all of this.  Also, please reach out if you need assistance finding other sources of data or anything else.

Below is a notice that sums things up nicely; this and other notices can be found at https://www.nco.ncep.noaa.gov/status/messages/ (luckily that site has come back).  We will update with any changes to the situation.

SENIOR DUTY METEOROLOGIST NWS ADMINISTRATIVE MESSAGE
NWS NCEP CENTRAL OPERATIONS COLLEGE PARK MD
1019Z WED APR 17 2024


...UPDATES TO RECENT NWS OPERATIONAL OUTAGES...


WIDESPREAD WFO NETWORK OUTAGES...
WFOs internet and AWIPS connections have remained stable since
the circuit was restored in College Park Tuesday morning. NCO's
network team will continue to work towards mitigating impacts
during the recurring circuit outages in College Park.


NWS BROADCAST REPEATED MESSAGES...
The problem with NWS products being broadcasted multiple times
was traced to Monday's efforts to mitigate impacts from the
temperature spike in the College Park Data Center. NCO
implemented a fix to correct the issue at 2:00pm EDT Tuesday.


MRMS...
CONUS QPE data continues to not update on MRMS
(https://mrms.ncep.noaa.gov/data/). The problem has been linked
to the College Park Data Center Outage. Efforts to restore the
data will resume early Wednesday.


RECOVERY EFFORTS IN THE COLLEGE PARK DATA CENTER...
No significant updates during overnight restoration efforts. NCEP
Center's (OPC, CPC, and WPC) operations remain severely degraded
due to downed NetApp systems in College Park. No ETR.

Current Known Impacts include:
-NCEP Centers' websites hosted in CP that remain inaccessible
include EMC and NCWCP intranet sites.
-WPC, OPC, and CPC's operational product suites' status, range
from being degraded to down.
-FTPPRD is inaccessible in CP (Customers are able to use
nomads.ncep.noaa.gov as a viable backup in the meantime)
-NCO operations personnel are unable monitor NWS networks and
circuits.
-CONUS QPE data is not updating on MRMS
(https://mrms.ncep.noaa.gov/data/)
-Several layers are not updating on NWS Cloud Services (GIS and
Map Viewer)
-Multiple outside datasets are not available/delayed (UKMET data,
ECMWF data, Canadian METARS, ACARS aircraft data)




Gerhardt/SDM/NCO/NCEP



Best,
-Mike


Mike Zuranski
Data Engineer II
NSF Unidata Program Center
University Corporation for Atmospheric Research


On Tue, Apr 16, 2024 at 10:05 PM Mike Zuranski <address@hidden> wrote:
The time is now 03:05 UTC, do you know where your data is?

At this time CONDUIT is still down.  I had an exchange with NCEP Ops earlier this evening so I know they were still working on it then.  They are fully aware of the breadth of the situation and are working with the applicable parties to resolve this as soon as possible.  Given that, I do not plan on reaching out to them again on this unless the symptoms change.

I haven't seen any admin notices or similar pertaining to this, but we are watching closely.  I'll keep you posted with any updates I hear, and I plan on making another status update tomorrow morning.

In the meantime, the data that's missing on CONDUIT may well be found at nomads.ncep.noaa.gov, which has remained unaffected during these troubling times.  Feel free to reach out if we can help you find another source of data or anything else.

Best,
-Mike


Mike Zuranski
Data Engineer II
NSF Unidata Program Center
University Corporation for Atmospheric Research


On Tue, Apr 16, 2024 at 10:52 AM Mike Zuranski <address@hidden> wrote:
Greetings all,

It appears the CONDUIT feed was down from yesterday afternoon until just moments ago, when it began to transmit data again.  It looks like this is being actively worked on, and I'm keeping a close eye on the situation.

While I haven't seen any notifications on the subject, the source of the outage is upstream from us so we don't have much control over it.  If data stops flowing again I'll reach out to the appropriate parties.

Sorry for any inconvenience,
-Mike


Mike Zuranski
Data Engineer II
NSF Unidata Program Center
University Corporation for Atmospheric Research