
20020726: Help! LDM 5.2 keeps crashing on Linux RH 7.3!



>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200207261442.g6QEgK915724 LDM 5.2 RedHat 7.3 Linux

Gilbert,

>I need help with my ldm 5.2 on weather2.admin.niu.edu. I'm running it on a 
>Dell Dimension P3, running RedHat Linux 7.3, all patches on (see 
>http://www.redhat.com/errata/rh73-errata.html). My LDM mysteriously dies, 
>without warning, with these messages.

If you haven't done so yet, I would try building 5.2 from source on
weather2.  In fact, when I run into LDM problems with a binary
installation, this is always the first thing I do.

>Note: Weather.admin and 
>weather3.admin LDM's stay running fine (and both are on 
>5.2!), though weather2 is the machine where almost everything is ingested:
>
>Jul 26 10:37:16 weather2 pnga2area[2898]: unPNG::  1518538   2422256  1.5951 
>Jul 26 10:37:16 weather2 pnga2area[2898]: Exiting 
>Jul 26 10:39:53 weather2 rpc.ldmd[11879]: child 11883 terminated by signal 11 
>Jul 26 10:39:53 weather2 rpc.ldmd[11879]: Killing (SIGINT) process group 
>Jul 26 10:39:53 weather2 rpc.ldmd[11879]: Interrupt 
>Jul 26 10:39:53 weather2 rpc.ldmd[11879]: Exiting 
>Jul 26 10:39:53 weather2 pqbinstats[11880]: Interrupt 
>Jul 26 10:39:53 weather2 pqact[11882]: Interrupt 
>Jul 26 10:39:53 weather2 pqact[11882]: Exiting 
>Jul 26 10:39:53 weather2 weather-01[11884]: Interrupt 
>Jul 26 10:39:53 weather2 pqact[11885]: Interrupt 
>Jul 26 10:39:53 weather2 weather-02[11886]: Interrupt 
>Jul 26 10:39:53 weather2 weather-02[11886]: Exiting 
>Jul 26 10:39:53 weather2 weather-03[11887]: Interrupt 
>Jul 26 10:39:53 weather2 131.156.8.47[11888]: Interrupt 
>Jul 26 10:39:53 weather2 flood-3[11889]: Interrupt 
>Jul 26 10:39:53 weather2 flood-2[11890]: Interrupt 
>Jul 26 10:39:53 weather2 weather[11891]: Interrupt 
>Jul 26 10:39:53 weather2 weather-01[11884]: Exiting 
>Jul 26 10:39:53 weather2 131.156.8.47[11888]: Exiting 
>Jul 26 10:39:53 weather2 flood-3[11889]: Exiting 
>Jul 26 10:39:53 weather2 weather-03[11887]: Exiting 
>Jul 26 10:39:53 weather2 pqact[11885]: Exiting 
>Jul 26 10:39:53 weather2 whistler(feed)[12043]: Interrupt 
>Jul 26 10:39:54 weather2 rpc.ldmd[11879]: Terminating process group 
>Jul 26 10:39:54 weather2 pqbinstats[11880]: Exiting 
>
>What should I do besides utterly panic? :-) It won't stay up for more than 
>7 hours. Increasing queue size to 500 MB didn't help. Neither did a Glibc 
>patch that came out this week. Oh, and when I just type in "ldmadmin 
>start"...it restarts fine, no queue corruption (ldmadmin queuecheck came 
>back empty).

Other than building from source, I would try putting the LDM into
verbose logging at about the 6-hour mark.  Perhaps the verbose logging
will shed some more light on why the group leader rpc.ldmd exits.  You
change the LDM logging verbosity by sending the group leader rpc.ldmd
a USR1 signal.  The first 'kill -USR1' raises the logging level to
verbose; the second raises it to debug; the third returns it to silent.
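
As a rough sketch of the USR1 cycle described above (not part of the
LDM itself), the following Python helper works out how many signals are
needed to get from one level to another and sends them.  The function
names, the level labels, and the one-second pause are my own
assumptions for illustration; you have to supply the PID of the group
leader rpc.ldmd yourself, for example from the syslog lines.

```python
import os
import signal
import time

# Logging levels in the order described above; each USR1 moves one
# step forward and wraps around after "debug". The labels are
# illustrative, not names used by the LDM.
LEVELS = ["silent", "verbose", "debug"]


def signals_needed(current, target):
    """Return how many USR1 signals move the level from current to target."""
    return (LEVELS.index(target) - LEVELS.index(current)) % len(LEVELS)


def bump_logging(pid, count, pause=1.0):
    """Send `count` USR1 signals to `pid`.

    The pause between signals is an assumption: identical pending
    signals can be merged before delivery, so back-to-back sends may
    be seen as a single signal.
    """
    for _ in range(count):
        os.kill(pid, signal.SIGUSR1)
        time.sleep(pause)


# Hypothetical usage, with 11879 standing in for the rpc.ldmd PID:
#   bump_logging(11879, signals_needed("silent", "verbose"))
```

Going from silent to verbose takes one signal and from verbose back to
silent takes two, which matches the three-step cycle above.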

>Thanks for any help!

Tom