
Re: 20020208: LDM resources under Linux



Hi David,

David Wojtowicz wrote:
> 
> Hi Anne,
> 
>   Actually, I was reporting a problem primarily with flood.atmos.uiuc.edu
> which serves NMC2 and NNEXRAD|FNEXRAD.  squall.atmos.uiuc.edu does
> everything else.  To answer some of your questions...
> 

Excuse me for confusing flood with squall.

> The load on flood can range from 3.0 to 10.0+  depending on the time
> of day.  There are presently 22 rpc.ldmd's running.  If you look at
> top there are usually 5-6 in the "runnable" (R) state, rather than
> sleeping.  That means this many of them are awaiting service by the
> processor.
> 

This does indicate that the CPU isn't fast enough for the load. 
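(A quick way to watch this is something like:

        vmstat 5

where the 'r' column is the count of runnable processes waiting for
the CPU.  Consistently seeing more runnable processes than CPUs means
the processor is the bottleneck.)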

> There are no topology listings of NNEXRAD or NMC2 like there are with
> MCIDAS or FOS, so I'm not sure who all is actually using us at the
> moment, since not everyone who has an allow line is using it at any
> given time.
> 

Here are a few command lines to give you a close approximation of who is
currently feeding from you.  In the logs directory, do:

        grep feed ldmd.log | grep Exit | awk '{ print $5 }' | sort | uniq > exited
        grep feed ldmd.log | awk '{ print $5 }' | sort | uniq > feeding
        diff feeding exited

The diff will report the hosts that are still feeding from you.
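For example, if anvil.eas.purdue.edu had connected and not yet exited,
the diff might look something like this (hypothetical output):

        3d2
        < anvil.eas.purdue.edu

That is, lines marked with '<' are hosts that appear in 'feeding' but
not in 'exited'.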

Here is similar info (cleaned up) from the stats you are sending us:

TOPOLOGY
flood.atmos.uiuc.edu redwood.atmos.albany.edu NMC2
flood.atmos.uiuc.edu ice.atmos.uiuc.edu NMC2
flood.atmos.uiuc.edu cirrus.atmos.uiuc.edu NNEXRAD|FNEXRAD|NMC2
flood.atmos.uiuc.edu measol.meas.ncsu.edu NNEXRAD|FNEXRAD
flood.atmos.uiuc.edu waterspout.cst.cmich.edu NNEXRAD|FNEXRAD
flood.atmos.uiuc.edu anvil.eas.purdue.edu NNEXRAD
flood.atmos.uiuc.edu yin.engin.umich.edu NNEXRAD
flood.atmos.uiuc.edu ldmdata.sws.uiuc.edu NNEXRAD
flood.atmos.uiuc.edu twister.sbs.ohio-state.edu NNEXRAD|FNEXRAD
flood.atmos.uiuc.edu aeolus.valpo.edu NNEXRAD
flood.atmos.uiuc.edu zelgadis.geol.iastate.edu NNEXRAD
flood.atmos.uiuc.edu squall.atmos.uiuc.edu NNEXRAD|FNEXRAD
flood.atmos.uiuc.edu papagayo.unl.edu FNEXRAD
flood.atmos.uiuc.edu data2.atmos.uiuc.edu FNEXRAD
TOPOEND

Looks like you are feeding CONDUIT to two local sites, as well as
Albany.  

It would be interesting to know how much NEXRAD data your downstream
sites are requesting.   We can get some sense of that here from an
analysis of the incoming stats.  You can also grep through your logs for
'Start' or 'RECLASS' to see what feed types and patterns sites are
requesting (but if the connection has been stable and products timely
for a while you might not see any of these).
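For example, something like

        egrep 'Start|RECLASS' ldmd.log

should pull out those lines, feed types and patterns included.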


> I'm guessing that it is the bottleneck, especially during the very busy
> 12Z run distribution via CONDUIT.
> 
> I'm concerned both that we are not servicing our downstream sites properly
> and that we are losing CONDUIT products ourselves when latencies exceed 1hr.
> 


I can see that redwood is only getting a small percentage of the
CONDUIT products you're getting (are they asking for less?).  And the
delay between your site and redwood is significant - I'm seeing 10 to
15 minutes.

Regarding NEXRAD, both umich and purdue are timely, although they must
not be requesting a lot.


> Again, the machine only relays and does nothing else (no pqact or
> other significant processes).  It is a 400MHz PC running Linux.
> Granted, this is "slow" now, but it was state of the art when we bought
> it for CONDUIT about two years ago.  So I'm wondering what specs I
> should be looking for in a replacement, or whether that would even help
> if the machine is not the bottleneck.
> 
> Thanks.
> 
> --david
> 

Yes, your machine seems overloaded.  And I suspect it is introducing
further latencies into the CONDUIT feed.  But, you are receiving and
relaying *a lot* of data.

Regarding a replacement, for a "relay only" machine I would first
consider CPU speed and RAM.  Having enough RAM to hold the whole queue
is important: the queue is a memory-mapped file, so when it fits in
RAM, downstream feeds can be served from memory rather than from disk.
Disk speed is still important for the "read/write through" aspect of
the queue, so a fast disk certainly wouldn't hurt.

For CONDUIT data, the biggest hour is currently 1.2Gbytes.  NEXRAD is
now averaging 80Mbytes/hour, so you need somewhat more than that to
handle a maximum hour.  So, a queue size of 1.4Gbytes should
accommodate both.  For this volume, I'd say a gigabyte of RAM or more
would be best.  Room to grow is always good.  We have several sites
successfully running with nearly 2Gbyte queues.
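(If you do rebuild your queue, pqcreate is the tool - for example,
something like

        pqcreate -q /usr/local/ldm/data/ldm.pq -s 1400M

though whether -s accepts an 'M' suffix depends on your LDM version -
if in doubt, give the size in bytes.  The queue path above is just a
common default; use wherever yours lives.)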

How big is your queue now?  And how much RAM do you have?  And, how old
is the oldest product in your queue?  (Use pqmon to determine the
latter.  The last field is the age of the oldest product in the queue in
seconds.)
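For example:

        pqmon -q /usr/local/ldm/data/ldm.pq

(or just 'pqmon' if your environment already points at your queue).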

Regarding your current configuration, Chiz said he could have redwood
feed elsewhere if that's better for you.  Or, could one of your local
sites relay to the other to reduce the load on flood?  Or, we could
reduce the NEXRAD feeds, although I'm guessing that's not making a big
impact on your machine, assuming the other NEXRAD sites are being
judicious like Purdue and UMich.

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************