
20030702: LDM - IRIX64 - Signal 11, ldm killed?!?



Hi Christian,

>Date: Tue, 1 Jul 2003 22:15:37 -0400
>From: Christian Pagé <address@hidden>
>Organization: UQAM
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20030701: LDM - IRIX64 - Signal 11, ldm killed?!? 

The above message contained the following:

> Sorry, I just got this:
> 
> ldmd.log.2:Jun 30 16:17:31 5Q:io rpc.ldmd[121683381]: child 121702674  
> terminated by signal 11

The process must have been started so long ago that every log file
containing its other entries has already been deleted.

You might consider keeping more log files around by changing the
$numlogs variable in the script "bin/ldmadmin" -- at least until we
solve this problem.

> But I do have the core! pqbinstats is the responsible...
> 
> 90 [/synop/ldm/logs] % dbx ~ldm/bin/pqbinstats
> dbx version 7.3 MR 55458_Apr30_MR Apr 30 1999 13:44:41
> Core from signal SIGSEGV: Segmentation violation
> (dbx) t
>  >  0 syncbinstats(0x8a, 0x0, 0x0, 0x0, 0x0, 0xfffffc96, 0x10066ffa,  
> 0x0) ["/io/ldm/ldm-6.0.13/src/pqbinstats/binstats.c":538, 0x10005a58]
>     1 main(0x10003da0, 0xffffffff, 0x1, 0x7fffffff, 0xf423f, 0x7fff2e08,  
> 0xffffffff, 0x1002aad8)  
> ["/io/ldm/ldm-6.0.13/src/pqbinstats/pqbinstats.c":410, 0x10004790]
>     2 __start()  
> ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/ 
> crt1text.s":177, 0x10003d58]
> (dbx) quit

That's odd.  Line 538 of file "src/pqbinstats/binstats.c" is a call to
the rewind(3) function.  This function shouldn't cause a segmentation
violation unless its argument is invalid -- and there's a check for a
NULL argument just before the call.  I wonder what's happening.
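
One possibility, offered purely as an assumption and not as a diagnosis
of binstats.c: a NULL check only helps if the pointer is reset to NULL
whenever the stream is closed.  A FILE pointer that has already been
passed to fclose(3) is still non-NULL but no longer valid, and calling
rewind(3) on it is undefined behavior.  The helper names below
(close_and_clear, rewind_if_open) are hypothetical and do not come from
the LDM source; this is only a minimal sketch of the pattern in C.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of LDM): close a stream and clear the
 * caller's pointer, so that a later NULL check reflects reality.
 * Returns the fclose() status, or 0 if the stream was already closed.
 */
int close_and_clear(FILE **fpp)
{
    int status = 0;

    assert(fpp != NULL);            /* the caller must pass a valid address */
    if (*fpp != NULL) {
        status = fclose(*fpp);
        *fpp = NULL;                /* never leave a stale pointer behind */
    }
    return status;
}

/*
 * Hypothetical helper (not part of LDM): rewind a stream only if it is
 * open.  Returns 0 if the stream was rewound, -1 if there was nothing
 * to rewind.  This is safe only when every close goes through
 * close_and_clear(), because a stale non-NULL pointer would pass the check.
 */
int rewind_if_open(FILE *fp)
{
    if (fp == NULL)
        return -1;
    rewind(fp);
    return 0;
}
```

With this pattern, a rewind attempted after the stream has been closed
is reported as -1 instead of dereferencing an invalid pointer.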

Thanks for this information.

Regards,
Steve Emmerson