
Re: LDM - ldm process becomes defunct on server



Sarah,

>Date: Fri, 26 Aug 2005 18:07:12 -0600 (MDT)
>From: "sarah thompson" <address@hidden>
>Organization: NOAA/NWS/FSL
>To: address@hidden
>Subject: LDM - ldm process becomes defunct on server

The above message contained the following:

> Institution: noaa/fsl
> Package Version: 6.4.1
> Operating System: fedora core 4
> Hardware Information: dell poweredge 750
> Inquiry: Have an upstream "server" that is feeding data to 2
> downstream machines.  I'm running fedora core 4..which i think is what
> you all are running. i'm on kernel 2.6.12-1

> I compiled with gdb but when the upstream machine "dies" meaning
> all process' reporting as defunct, it didn't produce a core file.

On a Linux system, the following must be true for a normally-installed
LDM to dump a corefile:

    1.  "ulimit -c" must return a non-zero value (preferably "unlimited")

    2.  The file "/proc/sys/kernel/suid_dumpable" must contain a "2".
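Both conditions can be checked from the shell of the user that runs the
LDM. The following is only a minimal sketch: the helper name
"core_dump_possible" is hypothetical, and the path is the one given above
(other kernels may expose the setting as /proc/sys/fs/suid_dumpable
instead).

```shell
# Hypothetical helper: succeeds only if both conditions above hold.
#   $1 = output of "ulimit -c"
#   $2 = contents of the suid_dumpable file
core_dump_possible() {
    [ -n "$1" ] && [ "$1" != "0" ] && [ "$2" = "2" ]
}

# Path as given above; an assumption that may not hold on every kernel.
dumpable_file=/proc/sys/kernel/suid_dumpable
dumpable=$(cat "$dumpable_file" 2>/dev/null || true)

if core_dump_possible "$(ulimit -c)" "$dumpable"; then
    echo "core dumps are possible"
else
    echo "core dumps are disabled: check 'ulimit -c' and $dumpable_file"
fi
```

Run this in the same environment that starts the LDM, because the
"ulimit -c" value is inherited per shell and may differ between a login
shell and a boot-time start.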

> the downstream machines have this message in their ldmd.log

> NOTICE: requester6.c:447; ldm_clnt.c:310: nullproc_6 failure to
> eldmf1.fsl.noaa.gov; ldm_clnt.c:145: RPC: Timed out

The above means that a downstream LDM-6 process sent a NULLPROC message
to the upstream LDM-6 but the reply from the upstream LDM-6 timed out.
One obvious possible cause is that the upstream LDM's configuration file
lacks an ALLOW entry for the downstream host.
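One quick way to review the ALLOW entries on the upstream host is to list
them from the configuration file. This is only a sketch: the helper name
"list_allow_entries" is hypothetical, and the default path
$HOME/etc/ldmd.conf is an assumption about where the file lives on your
installation.

```shell
# Hypothetical helper: print the ALLOW lines of an LDM configuration file.
#   $1 = path to the configuration file
# Comment lines start with "#" and therefore do not match.
list_allow_entries() {
    grep -i -E '^[[:space:]]*allow[[:space:]]' "$1"
}

# Assumed default location; adjust to your installation.
conf=${LDMD_CONF:-$HOME/etc/ldmd.conf}
if [ -r "$conf" ]; then
    list_allow_entries "$conf" || echo "no ALLOW entries in $conf"
else
    echo "cannot read $conf"
fi
```

The host-name pattern on each listed line should match the name of the
downstream host as the upstream host resolves it.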

What does the following command return when executed on the downstream
host?

    /usr/sbin/rpcinfo -n 388 -t eldmf1.fsl.noaa.gov 300029 6

This command completely bypasses the LDM on the downstream host and
should indicate whether the LDM on the upstream host is available, e.g.,

    $ /usr/sbin/rpcinfo -n 388 -t oliver.unidata.ucar.edu 300029 6
    program 300029 version 6 ready and waiting

If it fails, then is there anything corresponding to the connection
attempt in the logfile of the upstream LDM?
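If the problem is intermittent, the rpcinfo check above can be wrapped in
a small script and run repeatedly from the downstream host. This is a
sketch only: the function names "ldm_is_ready" and "probe_ldm" are
hypothetical, and the test relies on the "ready and waiting" text shown
in the example output above.

```shell
# Hypothetical helper: succeeds if rpcinfo output reports the program ready.
#   $1 = captured output of the rpcinfo command
ldm_is_ready() {
    case "$1" in
        *"ready and waiting"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: probe LDM-6 (program 300029, version 6) on port 388.
#   $1 = name of the upstream host
probe_ldm() {
    out=$(/usr/sbin/rpcinfo -n 388 -t "$1" 300029 6 2>&1) || true
    if ldm_is_ready "$out"; then
        echo "$(date): $1: LDM-6 is answering"
        return 0
    fi
    echo "$(date): $1: no LDM-6 reply: $out"
    return 1
}
```

For example, "probe_ldm eldmf1.fsl.noaa.gov" run from cron every few
minutes would give a timestamped record of when the upstream LDM stopped
answering, which can then be compared against the upstream logfile.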
    
> I have been testing for weeks now on many os' and different kernels
> and eventually ldm always crashes with the above error.

What version of the LDM?

> I have attached the ldmd.log.

I didn't see any evidence of the LDM crashing in the logfile you sent.

> I have no idea what else to test.  Hope you have
> insights, as I'm all out of troubleshooting ideas.  Thanks.  Sarah
...

Regards,
Steve Emmerson