
[LDM #XTT-958504]: LDM failing over to another site



Hi Gregg,

[This is such a great question that I've moved it into our support tracking
system so that others can view it.]

> I have a question on the inner workings of LDM and when/how LDM will obtain
> a "new" IP address for an upstream provider.

Piece of cake.

> First some background.  In particular NWS NCO has a LDM feed of MRMS data
> from:
> 
> mrms-ldmout.ncep.noaa.gov
> 
> MRMS runs in College Park and Boulder, but only ONE site is considered
> operational at a time. The canonical names for these two LDM feeds are:
> 
> mrms-ldmout.cprk.ncep.noaa.gov  140.90.98.15   (College Park)
> mrms-ldmout.bldr.ncep.noaa.gov  140.172.138.50 (Boulder)
> 
> NCO requests downstream LDM users to have a feed from
> mrms-ldmout.ncep.noaa.gov
> and then NCO will update the DNS entry so this host points to the actual
> LDM server (mrms-ldmout.cprk... or mrms-ldmout.bldr...).
> 
> This morning NCO changed the DNS entry around 1330Z so
> mrms-ldmout.ncep.noaa.gov points to the Boulder site.
> 
> I was watching DNS and within 5 minutes of their change the IP address
> updated for the Boulder location.  My ldmadmin watch showed data flowing.
> I wasn't sure though where LDM was connected right after the failover (now
> I know to check netstat and I can find out).
> 
> What I believe happened this morning is SPC continued to get data from
> College Park until ~1520Z when data stopped,

Yup.

> and then at 1601Z we received
> a slug of data back filling to 1522Z and catching up.  From looking at the
> ldmd.log file it appears the LDM was stopped at College Park at 1601Z and
> then LDM resolved to the new IP address for Boulder.

Yup.

>  I searched the
> log file for an earlier "died" entry and didn't find one.

You wouldn't unless you were logging at the INFO (i.e., verbose) level. Then
you'd see that the downstream LDM connected to the new site.

> Is it correct LDM has an active connection so it doesn't try checking to
> see if the IP address has changed?

Yup.
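
To make that concrete, here is a minimal Python sketch (not part of the LDM)
of how you might detect the situation yourself: compare the peer address that
netstat reports for the established connection with what the hostname resolves
to right now. The function names are made up for illustration, and port 388 is
the usual LDM port; adjust it if your site differs.

```python
import socket

LDM_PORT = 388  # usual LDM port (assumption: your site has not changed it)


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the hostname resolves to right now."""
    infos = socket.getaddrinfo(hostname, LDM_PORT,
                               socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def is_stale(connected_ip, current_ips):
    """True if the established connection's peer is no longer in DNS.

    connected_ip: peer address of the live connection (e.g., from netstat).
    current_ips:  addresses the feed hostname resolves to now.
    """
    return connected_ip not in current_ips
```

With the addresses from this thread, is_stale("140.90.98.15",
{"140.172.138.50"}) is true: the connection still points at College Park even
though the name now resolves to Boulder.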

> Would a best practice for a data provider be if they are going to fail-over
> to another site to also stop LDM at the site they are no longer going to
> have as "LIVE"/"OPERATIONAL"?

Bingo!

> I'm curious what insights and suggestions you have regarding the inner
> workings when an upstream location changes and how to minimize data
> outages/delays.

1. Change the DNS entry for the original LDM.
2. Wait for it to propagate (the "expire" time, i.e., the record's TTL, is
   under your control).
3. Stop the original LDM.
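
If it helps, step 2 could be scripted roughly as follows. This is only a
sketch under stated assumptions: wait_for_dns() and its parameters are
hypothetical names, and it only tells you what *your* resolver sees.
Downstream sites may still hold the old address in their caches until the
TTL runs out, so it doesn't replace waiting out the TTL.

```python
import socket
import time


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the hostname resolves to right now."""
    infos = socket.getaddrinfo(hostname, None,
                               socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def wait_for_dns(hostname, expected_ip, resolve=resolve_ipv4,
                 interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll until hostname resolves to expected_ip.

    Returns True once expected_ip shows up, or False if it has not shown up
    after roughly `timeout` seconds of waiting. `resolve` and `sleep` are
    injectable so the loop can be exercised without a network.
    """
    waited = 0.0
    while True:
        if expected_ip in resolve(hostname):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

For the case in this thread the call would be something like
wait_for_dns("mrms-ldmout.ncep.noaa.gov", "140.172.138.50"); once it returns
true (and the TTL has elapsed for everyone else), stop the LDM at the old
site, e.g., with "ldmadmin stop".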

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: XTT-958504
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.