
[LDM #MAM-895427]: LDM stream not responding



Hi Bill,

re:
> Here is the results of the tests.. Injected into the text below.

OK.  It looks like the problem is that the LDM on squall is wedged:

FAIL:
The process is perpetually waiting for the LDM to terminate

[ldm@squall:/usr/local/ldm]% ldmadmin restart

Flushing the LDM product-queue to disk...

Stopping the LDM server...

Waiting for the LDM to terminate
Waiting for the LDM to terminate
Waiting for the LDM to terminate
Waiting for the LDM to terminate
Waiting for the LDM to terminate
Waiting for the LDM to terminate
 ...

Given that Clint's machine (chinook.unl.edu) is still ingesting data
and reporting real-time stats, and because a 'notifyme' to Clint's
machine works as it should, you should forcibly stop your LDM.
I would do this by listing all of the processes running as your
'ldm' user (ps -u ldm) and killing the ones related to the LDM
itself ('ldmd's, 'pqact's, etc.).  You may be forced to use 'kill -9'
to kill the processes.  If the 'kill -9' does not work, you will
need to reboot your machine (I have seen this happen on very rare
occasions, and it typically has nothing to do with the LDM per se).
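
As a rough sketch of the steps above (not an official Unidata script), the
helper below picks the LDM-related PIDs out of 'ps' output and stops them,
trying a plain 'kill' first and using 'kill -9' only on survivors.  The
function names, the GRACE_SECONDS variable, and the list of program names
beyond 'ldmd' and 'pqact' are my assumptions; adjust them to match what
your ldmd.conf actually runs.  It also assumes that 'ps -o comm=' prints a
bare command name, as it does on Linux.

```shell
#!/bin/sh
# Sketch: select LDM-related processes from `ps` output and stop them,
# escalating from SIGTERM to SIGKILL only if needed.

# Read "PID COMMAND" lines on stdin and print the PIDs whose command
# name looks like an LDM program. The name list is an assumption.
ldm_pids() {
    awk '$2 ~ /^(ldmd|pqact|rtstats|pqing)$/ { print $1 }'
}

# Send SIGTERM to every PID given, wait GRACE_SECONDS (default 10),
# then send SIGKILL to any PID that still exists.
stop_pids() {
    [ "$#" -gt 0 ] || return 0
    kill "$@" 2>/dev/null
    sleep "${GRACE_SECONDS:-10}"
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid" 2>/dev/null
        fi
    done
    return 0
}

# Example use (run as the 'ldm' user or as root):
#   stop_pids $(ps -u ldm -o pid= -o comm= | ldm_pids)
```

If a process survives even the 'kill -9', that is the point at which a
reboot is the remaining option.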

Since the LDM has not been ingesting for some time, it would be
wise to delete and remake your LDM queue:

-- stop all LDM processes any way that you can, rebooting if
   necessary

ldmadmin delqueue
ldmadmin mkqueue
ldmadmin start

After restarting, data should start flowing again.
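
If it helps, the sequence above could be wrapped so that the queue is only
remade once no LDM process is left.  This is a minimal sketch under my own
assumptions: the function names are hypothetical, and the check only looks
for processes named 'ldmd' or 'pqact' owned by the 'ldm' user.

```shell
#!/bin/sh
# Sketch: rebuild the product-queue only once no LDM process is left.

# Succeeds if any process owned by user 'ldm' is named ldmd or pqact.
ldm_still_running() {
    ps -u ldm -o comm= 2>/dev/null | grep -Eq '^(ldmd|pqact)$'
}

# Refuse to touch the queue while the LDM is up; otherwise run the
# three ldmadmin steps in order, stopping at the first failure.
remake_queue() {
    if ldm_still_running; then
        echo "LDM processes are still running; stop them first" >&2
        return 1
    fi
    ldmadmin delqueue &&
    ldmadmin mkqueue &&
    ldmadmin start
}

# Example use, as the 'ldm' user:
#   remake_queue
```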

re:
> I have not contacted Clint yet but I am not able to
> terminate the LDM process through the ldmadmin restart command.

I don't think that calling Clint is warranted as the problem is
one local to squall.

re:
> Ldmadmin watch also shows nothing.

It wouldn't if everything else is wedged.

re:
> We also pass the notify test to nebraska so I am assuming that
> that means that the problem is not with Clint.

Correct.

re:
> Thanks for your guidance so far.

No worries.

In the future your first line of troubleshooting should be to review
the real-time stats pages for squall and any/all machines it is
REQUESTing data from.  Hopefully, these can more quickly indicate
where the problem lies so that you will not be down for so long!

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: MAM-895427
Department: Support IDD
Priority: Normal
Status: Closed