
20040820: pqbinstats(1) receiving SIGSEGV



Hi Karen,

>Date: Fri, 20 Aug 2004 08:27:51 -0600
>From: Unidata Support <address@hidden>
>Organization: NOAA/NWS/NSSL
>To: address@hidden
>Subject: 20040820: pqbinstats

The above message contained the following:

> I recently started using pqbinstats to look at some latencies, and I 
> seem to be having a problem with it.  Last night my ldm appears to have 
> stopped, almost as if it had received a ldmadmin stop command.  I have 
> two core dumps, one from pqbinstats, and one from pqact and I have also 
> attached the log.   The log shows that pqbinstats got a signal 11 (seg 
> fault) and everything after that appears to have gotten a SIGINT.

On what kind of system did this occur?

We've seen pqbinstats(1) receive a SIGSEGV (which is usually signal 11)
on a SunOS 5.9 system at least once.  It is very rare, however.

If you wish, you can wrap programs that are started by EXEC entries in
the LDM configuration-file with a simple script that ensures that they
are restarted if they terminate.  Here's an example of such a wrapper:

    while true; do
        logger -p local0.notice \
            "Starting \"$*\" at `date -u '+%Y-%m-%d %H:%M:%S Z'`"
        "$@"
        status=$?    # save the exit status before any other command runs
        logger -p local0.notice \
            "\"$*\" terminated with status $status at `date -u '+%Y-%m-%d %H:%M:%S Z'`"
    done
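
One caveat with a bare loop like the one above is that a program that
fails immediately on startup would be restarted continuously.  A minimal
sketch of a bounded variant follows.  The names restart_loop, log_notice,
MAX_RESTARTS, and RESTART_DELAY are assumptions made up for this example;
they are not part of the LDM.

```shell
# Sketch only: restart_loop, log_notice, MAX_RESTARTS, and RESTART_DELAY
# are hypothetical names chosen for this example, not part of the LDM.

# Log to syslog; fall back to stderr if logger(1) is unavailable or fails.
log_notice() {
    logger -p local0.notice "$1" 2>/dev/null || echo "$1" >&2
}

# Run "$@" repeatedly, at most MAX_RESTARTS times, pausing RESTART_DELAY
# seconds between runs.  Returns the exit status of the last run.
restart_loop() {
    max=${MAX_RESTARTS:-5}
    delay=${RESTART_DELAY:-10}
    count=0
    status=0
    while [ "$count" -lt "$max" ]; do
        log_notice "Starting \"$*\" at $(date -u '+%Y-%m-%d %H:%M:%S Z')"
        "$@"
        status=$?    # save the exit status before any other command runs
        count=$((count + 1))
        log_notice "\"$*\" terminated with status $status (run $count of $max)"
        # Pause between runs, but not after the final one.
        if [ "$count" -lt "$max" ]; then
            sleep "$delay"
        fi
    done
    return "$status"
}
```

A wrapper script built this way might end with the line
restart_loop "$@", so that the program to run and its arguments are
still passed on the command line as before.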

> I've turned pqbinstats off, since I don't need it at this time, but I 
> thought you would like to have a look at these outputs.

Thanks, but the core files won't do me much good unless the programs
were built using the "-g" compiler option and I have the same system.

Regards,
Steve Emmerson