
20040623: LDM Failover Configuration



Kathy,

>Date: Tue, 22 Jun 2004 16:57:43 -0400
>From: Kathy Carusone <address@hidden>
>Organization: MIT Lincoln Laboratory
>To: Steve Emmerson <address@hidden>
>Subject: Re: LDM Failover Configuration
>Keywords: 200406141537.i5EFbZtK014545

The above message contained the following:

> Sorry for my delayed reply, but your email got "lost".

We're poor little lambs who have gone astray, ...

Never mind.  :-)

> What I mean by "hard switch" is just manually changing the config file 
> from one to another. (As opposed to software automatic failover).

Manually switching the LDM system from one configuration file to another
(presumably requesting data from different upstream hosts) requires that
someone monitor data reception relatively continuously (e.g. every 15
minutes, 24 by 7).  This is relatively labor-intensive.  Can you do this?

> We may have enough bandwidth to ingest double feeds, but we share our 
> incoming Internet2 feed with the rest of the laboratory, so we do not 
> have dedicated bandwidth set aside for our NEXRAD ingest. We could try 
> it though.

That would increase reliability considerably.  If "p" is the probability
that one upstream LDM is off-line, and if the two upstream LDM-s fail
independently of each other, then the probability that both are
simultaneously off-line is approximately p-squared.
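
To put rough numbers on this, here is a minimal sketch.  It assumes that
the two upstream outages are independent, and the value used for "p" is
made up purely for illustration; it is not a measured figure for any
upstream LDM.

```python
def both_offline_probability(p: float) -> float:
    """Probability that two upstream hosts are off-line at the same time,
    ASSUMING each is off-line with probability p and that their outages
    are independent of one another."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * p


if __name__ == "__main__":
    p = 0.01  # hypothetical: one upstream is off-line 1% of the time
    print(f"one upstream off-line:   {p:.4%}")
    print(f"both upstreams off-line: {both_offline_probability(p):.4%}")
```

If the outages share a cause (the same network path, for example), the
independence assumption fails and the real figure will be worse than
p-squared.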

> How would we set it up to get duplicates (is it primary and 
> secondary request lines?)

One could set up both PRIMARY and SECONDARY requests for the same
data-products.  One of the disadvantages, however, is that the SECONDARY
connection will always use the synchronous LDM-5 protocol.  This
protocol has SIGNIFICANTLY lower performance than the asynchronous LDM-6
protocol.

This is why I recommended two PRIMARY requests.  The receiving LDM
system will ignore all the duplicate product arrivals.  The disadvantage
is that twice the bandwidth is used.

With two PRIMARY requests, you probably won't need to fail over in order
to have a system that's more reliable than Ivory soap is pure
(99.44%).  If both upstream LDM-s are off-line, then it's likely that
either the network is down or a multi-state power-outage has occurred.
In either case, failing-over to another upstream LDM probably won't
help much.

> and if we were not to get duplicate feeds, how 
> would you recommend we handle our failover?

One could write a script that ldmping(1)ed the upstream LDM every, say,
5 minutes and combine it with a heuristic to decide if the upstream
LDM was off-line.  The script could then replace the LDM
configuration-file with a failover version and restart the LDM.
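
The decision part of such a script might look something like the
following sketch.  It is not ldmfail(1) and is not taken from the LDM
distribution.  The function names, the "three consecutive failures"
heuristic, and the ldmping(1) options and exit-status behavior used in
the probe are all assumptions to check against the ldmping(1) manual
page for your LDM version.  Swapping the configuration-file and
restarting the LDM are left out.

```python
import subprocess
from typing import Iterable


def upstream_responds(host: str, timeout_s: int = 10) -> bool:
    """Hypothetical probe: run ldmping(1) once against `host`.

    ASSUMPTIONS: that "-i 0" means "ping once and exit", that "-t" sets
    the timeout in seconds, that "-h" names the remote host, and that a
    zero exit status means the upstream LDM answered.
    """
    try:
        result = subprocess.run(
            ["ldmping", "-t", str(timeout_s), "-i", "0", "-h", host],
            capture_output=True,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ldmping is missing, could not be run, or hung: count as a failure.
        return False
    return result.returncode == 0


def consecutive_failures(results: Iterable[bool]) -> int:
    """Count the failed probes at the end of the history (oldest first)."""
    count = 0
    for ok in results:
        count = 0 if ok else count + 1
    return count


def should_fail_over(results: Iterable[bool], threshold: int = 3) -> bool:
    """Heuristic: declare the upstream off-line only after `threshold`
    failed probes in a row, so one dropped ping doesn't trigger a switch."""
    return consecutive_failures(results) >= threshold
```

A driver would call upstream_responds() every 5 minutes or so, append
each result to a history list, and act only when should_fail_over()
returns true.  With a threshold of 3, that means roughly 15 minutes of
silence before switching.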

I believe the perl(1)-script ldmfail(1) does this.  This script is in
the LDM distribution.  I'm not a perl(1) expert, however, and I haven't
studied this script.  You might want to have a look at it, though.

> We are still waiting for our IT group to clear new holes in our firewall 
> before testing.. Thanks for your help.

Tell the IT guys that, to date, there have been no reports of a
successful break-in using port 388.  Nada, zilch, none.  And the LDM has
been in operational use for ten years.

Regards,
Steve Emmerson