
[LDM #SAE-848662]: Fwd: LDM dies after couple days, can't restart



corepuncher,

SIGINT is the name of the interrupt signal. It can be sent to a process using 
the command "kill -s INT <pid>", where "<pid>" would be the process ID of the 
top-level LDM server.
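
As a minimal sketch of the above (the helper name "send_sigint" and the PID 12345 are made up for illustration and are not part of the LDM), you could first check that the process exists and then send it the signal:

```shell
# Hypothetical helper: send SIGINT to a process after checking that it exists.
send_sigint() {
    pid=$1
    # "kill -0" sends no signal; it only tests whether the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no signalable process with PID $pid" >&2
        return 1
    fi
    kill -s INT "$pid"
}

# Usage (replace 12345 with the PID of the top-level LDM server):
#   send_sigint 12345
```

Run it as the LDM user so that you have permission to signal the server. The signal goes only to the PID you name, so other processes aren't touched directly.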

The top(1) utility lists running processes sorted by CPU usage. Because the LDM 
processes use very little CPU, a better way of determining if they're running 
is to use the ps(1) utility. For example, "ps -ef | grep noaaportIngester".
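
One wrinkle with that pipeline is that the grep(1) process itself usually appears in the output, because its own command line contains the search string. A small sketch that avoids this, assuming a hypothetical helper named "is_running", wraps the first character of the name in brackets so that the pattern no longer matches its own text:

```shell
# Hypothetical helper: list processes whose command line contains the given
# name, and return non-zero if there are none.
is_running() {
    name=$1
    rest=${name#?}              # the name without its first character
    first=${name%"$rest"}       # just the first character
    # "[n]oaaportIngester" matches "noaaportIngester" but not the grep
    # command line, which contains the brackets.
    ps -ef | grep -- "[$first]$rest"
}

# Usage:
#   is_running noaaportIngester || echo "no noaaportIngester processes found"
```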

While logging onto the system in question during an episode would be most 
useful, I might still be able to spot problems just by browsing around while 
it's running well.

> *Hi Steve.  See my replies in bold.*
> 
> address@hidden> wrote:
> 
> > The best ways to determine if data is flowing are "ldmadmin watch" and
> > "notifyme -vl-".
> 
> *Ok, I will try that next time it happens.*
> 
> > It can take a while to stop an LDM system. If it doesn't stop within a
> > minute, however, then something's wrong.
> 
> *It was over 5 minutes.*
> 
> > A SIGINT sent to the top-level LDM server should stop the system quickly
> > -- at the risk of corrupting the product-queue.
> 
> *Could you elaborate...do I just type "SIGINT" in the terminal while logged
> in as ldm? Does this command affect other things?*
> 
> > I suspect that you still have noaaportIngester(1) processes running.
> 
> *If it is running, then it is not showing up in "top". *
> 
> > Would it be possible for me to log onto the system in question as the LDM
> > user?
> 
> *Perhaps...I'd have to figure out a way to set up that "join me" software.
> Would you have to do it right when it's acting up, or, would there be any
> benefit to looking around at our config when it's running well?*

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: SAE-848662
Department: Support LDM
Priority: Normal
Status: Closed