
Re: LDM - ldm process becomes defunct on server



Sarah,

>Date: Fri, 26 Aug 2005 18:07:12 -0600 (MDT)
>From: "sarah thompson" <address@hidden>
>Organization: NOAA/NWS/FSL
>To: address@hidden
>Subject: LDM - ldm process becomes defunct on server

The above message contained the following:

> Institution: noaa/fsl
> Package Version: 6.4.1
> Operating System: fedora core 4
> Hardware Information: dell poweredge 750
> Inquiry: Have an upstream "server" that is feeding data to 2
> downstream machines.  I'm running fedora core 4..which i think is what
> you all are running. i'm on kernel 2.6.12-1

> I compiled with gdb but when the upstream machine "dies" meaning
> all process' reporting as defunct, it didn't produce a core file.

On a Linux system, the following must be true for a normally-installed
LDM to dump a corefile:

    1.  "ulimit -c" must return a non-zero value (preferably "unlimited")

    2.  The file "/proc/sys/kernel/suid_dumpable" must contain a "2".
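As a rough sketch, both conditions can be checked from the shell that starts the LDM. The helper name check_core_limit is made up for illustration and is not part of the LDM; the /proc path is the one given above and may not exist on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the LDM): describe a "ulimit -c" value.
check_core_limit() {
    case "$1" in
        0)         echo "core dumps disabled" ;;
        unlimited) echo "core dumps unlimited" ;;
        *)         echo "core dumps limited to $1" ;;
    esac
}

# Condition 1: the core-file size limit of the current shell.
check_core_limit "$(ulimit -c)"

# Condition 2: path as given above; it may be absent on other kernels.
f=/proc/sys/kernel/suid_dumpable
if [ -r "$f" ]; then
    echo "suid_dumpable = $(cat "$f")"
else
    echo "$f is not readable on this system"
fi
```

The limit is inherited by child processes, so it should be checked (and, if necessary, raised with "ulimit -c unlimited") in the same shell that starts the LDM.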

> the downstream machines have this message in their ldmd.log

> NOTICE: requester6.c:447; ldm_clnt.c:310: nullproc_6 failure to
> eldmf1.fsl.noaa.gov; ldm_clnt.c:145: RPC: Timed out

The above means that a downstream LDM-6 process sent a NULLPROC message
to an upstream LDM-6 but the reply from the upstream LDM-6 timed out.
One obvious possible cause is that the upstream LDM lacks an ALLOW entry
for the downstream host in the upstream LDM's configuration file.
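A quick way to see which ALLOW entries the upstream LDM has might look like the sketch below. The helper name list_allow_entries is hypothetical, and the configuration-file location (etc/ldmd.conf under the LDM user's home directory, overridable through an LDMHOME variable) is an assumption that may not match your installation.

```shell
#!/bin/sh
# Hypothetical helper: print the ALLOW lines of an LDM configuration file.
list_allow_entries() {
    grep -i '^[[:space:]]*allow[[:space:]]' "$1"
}

# Assumed location of the configuration file; adjust for your installation.
conf="${LDMHOME:-$HOME}/etc/ldmd.conf"
if [ -r "$conf" ]; then
    list_allow_entries "$conf" || echo "no ALLOW entries found in $conf"
fi
```

Run it as the LDM user on the upstream host and compare the hostname patterns it prints against the names of the two downstream hosts.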

What does the following command return when executed on the downstream
host?

    /usr/sbin/rpcinfo -n 388 -t eldmf1.fsl.noaa.gov 300029 6

This command bypasses the LDM on the downstream host completely and
should indicate whether the LDM on the upstream host is available, e.g.,

    $ /usr/sbin/rpcinfo -n 388 -t oliver.unidata.ucar.edu 300029 6
    program 300029 version 6 ready and waiting

If it fails, then is there anything corresponding to the connection
attempt in the logfile of the upstream LDM?
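If it helps to repeat the check, the rpcinfo test above could be wrapped in a small script along these lines. This is only a sketch: classify_rpcinfo is a made-up helper name, and it simply looks for the "ready and waiting" text shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: read rpcinfo output on stdin and summarize it.
classify_rpcinfo() {
    if grep -q "ready and waiting"; then
        echo "upstream LDM reachable"
    else
        echo "upstream LDM not reachable"
    fi
}

# Runs the check only when an upstream hostname is given as the first argument.
if [ -n "${1:-}" ]; then
    /usr/sbin/rpcinfo -n 388 -t "$1" 300029 6 2>&1 | classify_rpcinfo
fi
```

Running it from each downstream host, with the upstream hostname as the argument, at the time the "RPC: Timed out" messages appear would show whether the upstream LDM is still answering at all.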
    
> I have been testing for weeks now on many os' and different kernels
> and eventually ldm always crashes with the above error.

What version of the LDM?

> I have attached the ldmd.log.

I didn't see any evidence of the LDM crashing in the logfile you sent.

> I have no idea what else to test.  Hope you have
> insights, as I'm all out of troubleshooting ideas.  Thanks.  Sarah
...

Regards,
Steve Emmerson

