
[LDM #PAU-308840]: ldm exiting



Hi Heather,

re:
> Tom, thank you very much for returning my email.

No worries.

re:
> I went in stopped the ldm, deleted and made a new queue.  I
> will make sure to do this next time something happens with the queue.

Very good.
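
If you would rather not type the steps each time, they could be wrapped
in a small shell function.  This is only a sketch: the function name
remake_queue and the DRY_RUN switch are made up for illustration and are
not part of the LDM; the four ldmadmin actions are the ones already
mentioned in this thread.  It assumes it is run as the 'ldm' user with
ldmadmin on the PATH.

```shell
#!/bin/sh
# Sketch only: remake the LDM product-queue from scratch.
# remake_queue and DRY_RUN are hypothetical names, not part of the LDM.
remake_queue() {
    for action in stop delqueue mkqueue start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Show what would be run without touching the queue.
            echo "ldmadmin $action"
        elif ! ldmadmin "$action"; then
            echo "ldmadmin $action failed" >&2
            # Assumption: 'stop' may report failure when the LDM is
            # already down, so only the later steps abort the run.
            [ "$action" = "stop" ] || return 1
        fi
    done
}
```

Setting DRY_RUN=1 before calling remake_queue only prints the four
commands, which is a safe way to check the order before running it for
real.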

re:
> Is this at all preventable?

It shouldn't have happened in the first place.  We have been ingesting
NOAAPort on a number of machines for a LONG time (several years), and
we have experienced this problem only once or twice (and never on some
machines).

What caused your NOAAPort ingest process to seg fault is a mystery;
perhaps you got a slug of bad data in the broadcast that the
process simply couldn't handle?  Again, this is a rare occurrence.
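
If you want to check whether this has happened before, the 'terminated
by signal' NOTE that ldmd logs (as in the excerpt you sent) can be
pulled out of a log file with sed.  A rough sketch, assuming you pass it
the path to whichever file your LDM log messages go to; signal_deaths is
just a name made up for this example:

```shell
# Sketch only: list children that ldmd reported as killed by a signal.
# signal_deaths is a hypothetical helper; pass the log file(s) as arguments.
signal_deaths() {
    sed -n 's/.*child \([0-9][0-9]*\) terminated by signal \([0-9][0-9]*\).*/pid=\1 signal=\2/p' "$@"
}
```

Each matching log line is reduced to the child's PID and the signal
number; lines without that message are not printed.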


> Thanks!
> 
> Heather Kiley
> ________________________________________
> From: Unidata LDM Support address@hidden
> Sent: Monday, November 12, 2012 10:26 AM
> To: Kiley, Heather L (IS)
> Cc: address@hidden
> Subject: EXT :[LDM #PAU-308840]: ldm exiting
> 
> Hi Heather,
> 
> re:
> > The ldm stopped unexpectedly on my noaap ingestor yesterday.  When I
> > tried to restart it using "ldmadmin start" I got this message:
> >
> > The writer-counter of the product-queue isn't zero.  Either a process
> > has the product-queue open for writing or the queue might be corrupt.
> > Terminate the process and recheck or use
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q /usr/local/ldm/var/queues/ldm.pq
> >
> > to validate the queue and set the writer-counter to zero.
> > LDM not started
> 
> This indicates that the LDM queue got damaged somehow.  The suggested
> action to take is, in fact, one of two alternatives.  The second
> alternative is the best one for a NOAAPort ingest machine:
> delete and remake the LDM queue:
> 
> <as 'ldm' on the machine having problems>
> ldmadmin stop
> ldmadmin delqueue
> ldmadmin mkqueue
> ldmadmin start
> 
> re:
> > I rebooted my machine in an attempt to clean up the queue, but I got
> > the same message again when I tried to restart the ldm.
> 
> Once the queue is damaged, reboots will have no effect; it will stay
> damaged until fixed or remade.
> 
> re:
> > I issued the command given in the error message:
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q /usr/local/ldm/var/queues/ldm.pq
> >
> > And then I was able to restart the ldm.
> 
> OK.  For future reference: on NOAAPort ingest machines, I would simply
> delete and remake the queue as per the info I included above.  It is
> simpler, probably quicker and more foolproof.
> 
> re:
> > Do you have any idea what may
> > have happened to cause the ldm to stop?
> >
> > Here is the error message in my log before the ldm stopped:
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:30 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:31 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: child 3284 terminated by signal 11: noaaportIngester -m 224.0.1.3
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Killing (SIGTERM) process group
> > Nov 11 16:28:35 noaapnew noaapxcd(feed)[3298] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Terminating process group
> 
> 'signal 11' indicates a segmentation violation.  Why this happened is
> not readily apparent.
> 
> re:
> > I would appreciate any advice.
> 
> I think that the expedient thing to do is/was delete and remake the LDM
> queue.
> 
> Cheers,
> 
> Tom
> --
> ****************************************************************************
> Unidata User Support                                    UCAR Unidata Program
> (303) 497-8642                                                 P.O. Box 3000
> address@hidden                                   Boulder, CO 80307
> ----------------------------------------------------------------------------
> Unidata HomePage                       http://www.unidata.ucar.edu
> ****************************************************************************
> 
> 
> Ticket Details
> ===================
> Ticket ID: PAU-308840
> Department: Support LDM
> Priority: Normal
> Status: Closed
> 
> 
> 

Cheers,

Tom

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.