
Re: network problems at UCAR?



On Fri, 22 Oct 1999, David J. Knight wrote:

> Hi Rob,
> > 
> > All the top tier sites have at least 2 failover sites.
> 
> good. I thought there were failovers available but didn't know 
> exactly where.
> > 
> > Also, it is better to check thelma versus checking desi because desi could
> > be down and thelma could be receiving data from either ssec or alden and
> > nobody would notice. 
> 
> Actually we would know, since the originating machine is included
> in the products, and logged in the stats file. For example
> in the 16Z stat files for today I see:
> 
> 19991022160717 IDS|DDPLUS           dusk.ssec.wisc.edu        548     356493
>   46.71  125@0000 19991022160724
> 19991022160712 IDS|DDPLUS    noaaport.unidata.ucar.edu        218     134609
>   43.52  123@0006 19991022160716
> 19991022160914 IDS|DDPLUS              nport.alden.com        256     247246
>    0.32   11@0158 19991022160716
> 
> It appears that products are coming from 3 separate places. Presumably
> the first one here (or more likely the first one at the upstream
> relay) is the one that gets processed by our LDM.
> My LDM is only pointing to Cornell upstream. I don't know how
> several site originators appear. Presumably somebody upstream
> is pointing at all three sources, or switching between them.


Hiya,

thelma is fed by both noaaport.unidata and dusk.ssec, and dusk.ssec in turn
is fed by nport.alden.  thelma feeds most top tier nodes, so most sites
see 3 origin entries for the NOAAport feeds. This was done so that no source
site is dependent on its own satellite system for data. Surprisingly, 2/3 of
the data comes from noaaport.unidata, 2/9 from dusk.ssec and 1/9 from
nport.alden.  This was implemented back in January.
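
For anyone who wants to tally the origins in their own stats file, here is a
minimal sketch that parses the three records quoted above. The column meanings
(product count, byte count, average latency, maximum latency before the "@")
are my reading of the sample and are an assumption, as are the names
parse_stats and share_by_origin. A single hour at one site need not match the
long-run fractions.

```python
from collections import namedtuple

# Field names are assumptions inferred from the quoted sample, not an official layout.
StatLine = namedtuple(
    "StatLine",
    "stamp feedtype origin products nbytes avg_latency max_latency last_stamp",
)

SAMPLE = """
19991022160717 IDS|DDPLUS           dusk.ssec.wisc.edu        548     356493
  46.71  125@0000 19991022160724
19991022160712 IDS|DDPLUS    noaaport.unidata.ucar.edu        218     134609
  43.52  123@0006 19991022160716
19991022160914 IDS|DDPLUS              nport.alden.com        256     247246
   0.32   11@0158 19991022160716
"""

def parse_stats(text):
    """Parse records of 8 whitespace-separated fields; a record may wrap lines."""
    tokens = text.split()
    if len(tokens) % 8 != 0:
        raise ValueError("expected a multiple of 8 fields")
    records = []
    for i in range(0, len(tokens), 8):
        stamp, feed, origin, nprods, nbytes, avg, peak, last = tokens[i:i + 8]
        max_latency = int(peak.split("@")[0])  # text after "@" is ignored here
        records.append(StatLine(stamp, feed, origin, int(nprods), int(nbytes),
                                float(avg), max_latency, last))
    return records

def share_by_origin(records):
    """Fraction of products credited to each originating host."""
    total = sum(r.products for r in records)
    return {r.origin: r.products / total for r in records}

records = parse_stats(SAMPLE)
for origin, share in share_by_origin(records).items():
    print(f"{origin:30s} {share:.2f}")
```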


> 
> > UCAR had internal network problems yesterday; I
> > don't know what the effect on thelma was.
> 
> I don't know for sure either, but all of the ncar sites I tried
> to reach were dropping the majority of packets. Packets seemed to
> get to the ncar gateway fine and were lost internally. I assume
> thelma was affected the same way.
> 
> > desi did feed thelma all day
> > yesterday without any problems.
> > 
> 
> I think this points out a weakness in the failover scripts.
> All the scripts I am aware of (and certainly the one we use)
> only check to see if *any* data is flowing at all. Yesterday
> *some* data was flowing, but because of dropped packets and presumably
> resulting retransmissions the latencies were over 60 minutes so
> most of the data was being lost. (I'm sure if you check yesterday's
> logs you will see when this problem occurred). It would be nice if
> a failover method would check from which site data could be obtained
> with the least latency and most reliability and use that feed site.
> 
Automatic failover is going to be built into a future release of the LDM;
I don't know when that will happen.  For now, anyone who runs the ldmprods
script can tell the current status of their feeds.  ldmprods also has
notification flags.
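
To make the "least latency" idea concrete, here is a minimal sketch of a
check that looks at latency rather than only at whether any data is flowing.
It is not how ldmprods or the LDM works; pick_upstream, the host names and
the latency samples are all hypothetical, and the one hour cutoff is taken
from the 60 minute figure mentioned above.

```python
def pick_upstream(observations, max_latency_s=3600.0):
    """Return the upstream host with the lowest mean latency, or None.

    observations maps a host name to a list of recent product latencies in
    seconds. A host with no samples, or with a mean latency above
    max_latency_s, is treated as unusable even though data may be flowing.
    """
    usable = {}
    for host, latencies in observations.items():
        if not latencies:
            continue  # nothing arriving from this host
        mean = sum(latencies) / len(latencies)
        if mean <= max_latency_s:
            usable[host] = mean
    if not usable:
        return None
    return min(usable, key=usable.get)

# Hypothetical measurements: the first host is "up" but over an hour behind.
observed = {
    "upstream-a.example.edu": [4100.0, 3900.0, 4300.0],
    "upstream-b.example.edu": [35.0, 60.0, 42.0],
    "upstream-c.example.edu": [],
}
best = pick_upstream(observed)
print(best)
```

A script that only asks "is any data arriving?" would keep using the first
host in this example; the latency cutoff is what moves it to the second.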


> I wonder if an alternate feed method would make sense. Perhaps
> the top level relays could request their feed from two (or more)
> upstream sites. Presumably only one copy of each product would
> get processed. This way everybody would get their data the fastest
> possible way.

That would probably increase network congestion, and with it latencies,
especially for the downstream nodes.


> In fact, perhaps all sites should request their data
> from two upstream sites. This is so obvious that I'm sure it has
> been tried, or, there is an obvious problem with it that I have
> missed. Just a thought...

Again, that increases network congestion.  Also, the LDM duplicate-product
check takes place at the receiving machine rather than at the upstream
machine, i.e. the product is received before the duplicate check is performed.
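
A minimal sketch of why a receive-side duplicate check does not save any
bandwidth. The Receiver class is hypothetical and is not LDM code; using an
MD5 digest of the payload as the product signature is an assumption made for
this illustration.

```python
import hashlib

class Receiver:
    """Toy receiver that discards duplicates only after they have arrived."""

    def __init__(self):
        self.seen = set()        # signatures of products already kept
        self.bytes_received = 0  # everything that crossed the network
        self.bytes_kept = 0      # what survived the duplicate check

    def receive(self, payload: bytes) -> bool:
        # The network cost is paid before the check can run.
        self.bytes_received += len(payload)
        signature = hashlib.md5(payload).digest()
        if signature in self.seen:
            return False  # duplicate: dropped, but already transferred
        self.seen.add(signature)
        self.bytes_kept += len(payload)
        return True

# The same three products arrive from two hypothetical upstream sites.
products = [b"product-1", b"product-2 with more bytes", b"product-3"]
rx = Receiver()
for upstream in ("upstream-a", "upstream-b"):
    for payload in products:
        rx.receive(payload)

print(rx.bytes_received, rx.bytes_kept)
```

With two upstream sites sending identical products, the bytes received are
twice the bytes kept.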


> (I just checked 
> http://www.unidata.ucar.edu/projects/idd/status/idd/fosTopo.html and it seems 
> some sites are doing this, or, perhaps the topology information
> is not being updated as sites change their data source)
> 

I know about that. The problem is that the information has to be persistent
because it doesn't arrive every hour: the log message is only produced
when a new connection occurs. There needs to be a message every hour to
keep the topo charts current. This is another enhancement needed for the
LDM.
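
A minimal sketch of what an hourly message would allow: a topology table
that can tell a current link from one that has not been reported recently.
TopologyTable, the site names and the timestamps are hypothetical, and this
is not the code behind the topo charts.

```python
class TopologyTable:
    """Remember who feeds whom, and when each link was last reported."""

    def __init__(self, stale_after_s=3600):
        self.stale_after_s = stale_after_s
        self.links = {}  # downstream host -> (upstream host, last report time)

    def report(self, downstream, upstream, now):
        """Record a connection report received at time `now` (seconds)."""
        self.links[downstream] = (upstream, now)

    def current(self, now):
        """Links reported within the last stale_after_s seconds."""
        return {d: u for d, (u, t) in self.links.items()
                if now - t <= self.stale_after_s}

    def stale(self, now):
        """Downstream hosts whose last report is too old to trust."""
        return sorted(d for d, (u, t) in self.links.items()
                      if now - t > self.stale_after_s)

table = TopologyTable(stale_after_s=3600)
table.report("site-a.example.edu", "relay-1.example.edu", now=0)
table.report("site-b.example.edu", "relay-2.example.edu", now=3000)
print(table.current(now=4000))
print(table.stale(now=4000))
```

With a report only at connection time, a long-lived link looks stale even
though it is fine, which is why the entries currently have to be kept
indefinitely.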

Robb...

> David
> 
> David Knight
> Department of Earth and Atmospheric Sciences   Tel: (518)-442-4204
> SUNYA   ES-228                                 Fax: (518)-442-4494
> Albany, NY  12222                              Email: address@hidden
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================