
Re: 20020822: ldmd won't stay running



Unidata Support wrote:
> 
> ------- Forwarded Message
> 
> >To: address@hidden
> >From: John C Nordlie <address@hidden>
> >Subject: ldmd won't stay running
> >Organization: UCAR/Unidata
> >Keywords: 200208222031.g7MKVkK27758
> 
> Starting about two days ago, our LDM ingest system won't keep ldmd
> running.  It started dying every couple hours, but now won't stay
> running for more than about 5 minutes at a time.  I've deleted the
> queue and remade it multiple times, rebooted the system, and run
> fsck on the hard drives.  Everything looks normal.  Here are a few
> lines from the log file when it dies:
> 
> Aug 22 10:21:03 adiabat /kernel: pid 211 (rpc.ldmd), uid 1000: exited on signal 6
> Aug 22 15:04:08 adiabat /kernel: pid 15261 (rpc.ldmd), uid 1000: exited on signal 6
> Aug 22 15:10:08 adiabat /kernel: pid 44951 (rpc.ldmd), uid 1000: exited on signal 6
> 
> I'm running LDM 5.1.4 under FreeBSD 4.6 stable, running on a Pentium 4
> 1.6GHz box with 256M of RAM and 1G of swap space.  There is plenty of
> swap and disk space left.  The load averages are not above 3.5 and
> usually less than 2.
> 
> I haven't made any changes to the system for over a month, yet the
> problem just started about two days ago.  Any ideas?
> 
> =========================================================================
> ==)----------                   |                           ----------(==
> John Nordlie   N0RNB            |     Regional Weather Information Center
> address@hidden            |              University of North Dakota
> 701-777-6112 / 701-777-3888 fax | PO Box 9007, Grand Forks, ND 58202-9007
> http://blizzard.rwic.und.edu/~nordlie/
> ==)----------      #include <std.disclaimer.h>              ----------(==
> =========================================================================
> 
> ------- End of Forwarded Message

Hi John,

Yuck!  I hate it when stuff just stops working!

On your FreeBSD system, please do 'kill -l'.  It should list all the
signals.  Then, counting starting from 1 (not 0), please let me know
what signal 6 is on your system.
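
If counting through the 'kill -l' output is awkward, a minimal sketch
like the one below does the same lookup.  It assumes a Python 3
interpreter is available on the machine (my assumption, not something
the LDM needs), and the helper name signal_name is only illustrative.

```python
import signal


def signal_name(number):
    """Return this system's symbolic name for a signal number, or None."""
    try:
        # signal.Signals is an enum of the signals defined on this platform.
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return None


if __name__ == "__main__":
    # Signal 6 is the one reported in the kernel log lines.
    print(6, signal_name(6))
```

On most Unix systems signal 6 is SIGABRT (abort), but please confirm
what your system reports rather than taking that for granted.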

Have you looked at the system logs?  Anything interesting there?
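
To see how the time between crashes is changing, something like the
following sketch could pull the exit events out of the system log.  It
assumes the kernel messages have exactly the shape of the three lines
you quoted, that each entry is on one line (or wrapped only at
whitespace), and that the year is 2002, since syslog timestamps omit
it.  The names parse_exits and EXIT_RE are hypothetical.

```python
import re
from datetime import datetime

# Matches entries shaped like the ones in the report, e.g.
# "Aug 22 10:21:03 adiabat /kernel: pid 211 (rpc.ldmd), uid 1000: exited on signal 6"
EXIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+/kernel:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<prog>[^)]+)\),\s+uid\s+\d+:\s+"
    r"exited\s+on\s+signal\s+(?P<sig>\d+)",
    re.MULTILINE,
)


def parse_exits(text, year=2002):
    """Return a list of (timestamp, pid, program, signal) tuples."""
    events = []
    for m in EXIT_RE.finditer(text):
        # Syslog timestamps carry no year, so one is supplied by the caller.
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((when, int(m["pid"]), m["prog"], int(m["sig"])))
    return events


SAMPLE = """\
Aug 22 10:21:03 adiabat /kernel: pid 211 (rpc.ldmd), uid 1000: exited on signal 6
Aug 22 15:04:08 adiabat /kernel: pid 15261 (rpc.ldmd), uid 1000: exited on signal 6
Aug 22 15:10:08 adiabat /kernel: pid 44951 (rpc.ldmd), uid 1000: exited on signal 6
"""

events = parse_exits(SAMPLE)
# Seconds elapsed between consecutive exits.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
print(gaps)
```

For the three lines you sent, the gaps come out to 16985 and 360
seconds, which matches your description of the interval shrinking to
about 5 minutes.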

How many rpc.ldmds do you have running?  Is it always the same
connection that aborts?  Since it's only staying up for about 5
minutes, please try running in debug mode: from the command line,
start the LDM with 'rpc.ldmd -x' and send me the output.

Just for the record, my last two support questions where things
mysteriously broke were due to network engineers installing software to
limit file sharing.  (In these cases it was Packetshaper - you probably
saw my email about that).  It is the time of year for them to do such
things.  However, your symptoms sound different.  Still, it might be
useful to ensure no changes were made at that end.  Can you ask someone
on your campus if anything has changed in networking?  Does ldmping work
for you for longer than 5 minutes?

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************