
Re: CONDUIT latencies



On Fri, 16 Jul 1999, David J. Knight wrote:
> 
> Hi David,
>      It has been a problem for some time. The feed has
> never been particularly reliable, but, I never bothered
> to follow up on it since we already get much of the
> data we need via ftp. I was just looking into the 
> possibility of phasing the ftp out, so for the last few
> weeks I have been looking into the NMC2 feed more 
> closely. It was not a high priority, but, I would
> like to get it sorted out. I'm not sure if there is
> anything you can do at your end, but perhaps I am
> wrong. 
> Thanks
> David


I've been very frustrated here.   We used to just run our sole LDM
on an ancient HP720, and although it had its share of problems
it was OK most of the time.  Usually problems were something
not directly related to LDM... disk full, network out, etc.

But since it was increasingly having trouble keeping up with the
larger data streams we decided that we had better finally go
and replace it.  Having quite successfully used Linux boxes for other
purposes, and hearing reports of other IDD sites successfully using Linux
for LDM (not to mention being attracted by the cheap, powerful hardware),
we purchased new Linux boxes for IDD service.

Since then we've had nothing but trouble.  The increase in LDM
performance was nowhere close to what was anticipated given
the increase in hardware power (granted, there are other factors
that don't change, like networking, but still, disappointing).

I believe I traced the problem to a memory management issue
that seems to occur when using LDM on Linux with a very
large product queue....it has seriously impacted performance.
Some others have noticed this too...though not everyone...I
believe some of the end nodes running smaller product queues
were below the threshold where it becomes a significant problem.

I believe I've come up with a solution through some changes to the code. 
It really does make the symptoms of the initial problem go away instantly,
but I've been concerned that it might be causing some other problem, so I
have been testing it for a while now before declaring that it is a good
solution.

WRT the NMC2 feed, the problem I mention was certainly a cause of
delays in times past.  Now, though, since it runs longer without crashing,
I've been experiencing a new problem.  I've noticed very recently that
I often end up with two dozen GEMPAK dcgrib processes running even
though I only have about a half dozen entries in the pqact.conf file.
It seems that they are not going away, and they run the load very high,
causing very high latencies.    This was the case in the last 24
hours.
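
A minimal sketch of how a pile-up like this could be spotted automatically,
assuming a POSIX `ps` that accepts `-eo comm=` (one command name per line).
The helper names and the threshold of 6 (taken from "about a half dozen
entries") are illustrative assumptions, not part of LDM or GEMPAK.

```python
import subprocess
from collections import Counter


def count_commands(ps_output: str) -> Counter:
    """Count processes per command name, given one command name per line."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name)


def excess_processes(ps_output: str, command: str, expected_max: int) -> int:
    """Return how many instances of `command` exceed `expected_max` (0 if none)."""
    return max(0, count_commands(ps_output)[command] - expected_max)


def snapshot() -> str:
    """Return the current process list; assumes `ps -eo comm=` is available."""
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout


def report(command: str = "dcgrib", expected_max: int = 6) -> str:
    """Build a one-line status message for the given decoder name.

    expected_max=6 is an assumed limit, roughly one decoder per pqact.conf entry.
    """
    extra = excess_processes(snapshot(), command, expected_max)
    if extra:
        return f"{extra} more {command} processes than expected"
    return f"{command} process count looks normal"
```

Calling `report()` from a periodic job would flag the situation before the
load climbs; it only counts processes and does not explain why they linger.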

To be honest, I haven't paid too close attention to the NMC2 feed
in the last while as we mainly use it for interactive analysis
and nobody much is around to do that this time of year.  Since
I'm taking care of everything in our department myself right now
I don't have time to check on each of the 40 or so larger machines
in my care individually every day so I have to depend on others
complaining that something's not right.  And since you didn't
complain and I don't have someone actively using the data this
month, I haven't been working very diligently on this.

Since you have complained I would like to get to the bottom
of these problems once and for all and get something that
works.   

For a test, I have turned off pqact on flood....so for now
there should be no runaway dcgrib processes.  All the machine
has to do is relay NMC2.  It has no other responsibilities.
It is a 400MHz PII with 256MB RAM, 80GB disk and 100Mb/s networking
so it certainly should be capable of this task.  

Please watch the latencies over the next day or two.  I will take
the time to watch them closely here.   Since I remade the product
queue just before starting this test, the latencies were
initially high, but now after a short while they are pretty
reasonable.  We will have to wait for the next 12 hour batch
of stuff to know more realistically.

Sorry for the trouble.

--------------------------------------------------------
 David Wojtowicz, Research Programmer/Systems Manager
 Department of Atmospheric Sciences Computer Services
 University of Illinois at Urbana-Champaign
 email: address@hidden  phone: (217)333-8390
--------------------------------------------------------