
Re: 19990827: PQING problem (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Fri, 27 Aug 1999 14:33:31 -0500
From: Chad Johnson <address@hidden>
To: Robb Kambic <address@hidden>
Subject: Re: 19990827: PQING problem

Robb Kambic wrote:
> 
> On Fri, 27 Aug 1999, Unidata Support wrote:
> 
> >
> > ------- Forwarded Message
> >
> > >To: address@hidden
> > >cc: address@hidden,
> > >cc: address@hidden
> > >From: Chad Johnson <address@hidden>
> > >Subject: PQING problem
> > >Organization: .
> > >Keywords: 199908271900.NAA02225
> >
> > Hi,
> >
> > Last week, we had a problem with the LDM running on our NOAAPORT ingestor
> > on at least two occasions. Unfortunately I was out of the office and
> > couldn't do a post mortem analysis, thus I didn't know what the exact
> > problem was. Well, it just happened again today and this is what I found.
> >
> > On the NOAAPORT SDI ingestor, I have one pqing process ingesting the text
> > stream and another pqing ingesting from the binary stream. The problem I
> > discovered was that the pqing process reading the binary stream exited on
> > its own. The messages from the ldmd log are as follows:
> >
> > Aug 27 18:13:53 dusk pqing[356]: Deleting oldest to make space 20352 bytes
> > Aug 27 18:13:53 dusk pqing[356]: Deleting oldest to make space 5104 bytes
> > Aug 27 18:13:53 dusk pqing[356]: del_oldest: conflict on 109373592
> > Aug 27 18:13:53 dusk pqing[356]: pq_insert: Resource temporarily unavailable
> > Aug 27 18:13:53 dusk pqing[356]: Exiting
> 
> Chad,
> 
> This is a conflict between the pqing process trying to delete a product
> out of the queue and trying to insert a product. This probably occurred
> because your queue size is not large enough; you should increase the size
> of the LDM queue by 20%.  Usually pqexpire deletes products out of the
> queue in a much more organized manner, so the chance of this conflict is
> low.  If this continues after the queue size increase, let me know.

OK. The queue size was set to 120 MB. I've increased it to 144 MB. I'll let
you know if we have any further problems.
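
For reference, the arithmetic behind that change (120 MB grown by the
suggested 20%) can be sketched as below. This is only an illustration: the
function name grow_queue is hypothetical and is not part of the LDM, and the
sketch assumes sizes are whole megabytes, rounded up.

```python
def grow_queue(size_mb: int, percent: int = 20) -> int:
    """Return a queue size in MB grown by `percent`, rounded up.

    Hypothetical helper for illustration only; not an LDM utility.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (size_mb * (100 + percent) + 99) // 100


if __name__ == "__main__":
    old_size_mb = 120  # queue size reported in this thread
    print(grow_queue(old_size_mb))
```
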

Thanks for the quick response.

-Chad

> 
> Robb...
> 
> > Aug 27 18:13:53 dusk pqing[356]:   Queue usage (bytes):120000512
> > Aug 27 18:13:53 dusk pqing[356]:            (nregions):   31762
> > Aug 27 18:13:53 dusk pqing[356]:   Duplicates rejected:   24431
> > Aug 27 18:13:53 dusk pqing[356]:   WMO Messages seen:    892186
> > Aug 27 18:13:53 dusk pqing[356]:   SOH/ETX missing  :         0
> > Aug 27 18:13:53 dusk pqing[356]:   parity/chksum err:         0
> > Aug 27 18:13:53 dusk pqing[356]:   WMO format errors:      1370
> > Aug 27 18:13:53 dusk pqing[356]:   FILE Bytes read:    3914748927
> >
> >
> > The messages that bother me are "del_oldest: conflict on 109373592" and
> > "Resource temporarily unavailable". Are these two messages related to the
> > same problem? What _is_ the problem? As you can see, this pqing process
> > responded by exiting. This causes a problem when ingesting from the SDI,
> > because for data to continue flowing out of the SDI, there must be a
> > reader on both the binary and the text streams. If one stops, the whole
> > process blocks and data stops on both streams.
> >
> > There were no messages in the /var/adm/messages file with regards to
> > retries or any other problems with the disk. There were the following
> > messages with regards to our network adaptor, but the times don't match
> > up with the time on the ldmd.log, and I've been told that our network
> > adaptor has been doing this since it was installed (nearly 2 years ago).
> >
> > Aug 27 17:00:32 dusk unix: NOTICE: pcn: transmitter shut down
> > Aug 27 17:00:32 dusk unix: NOTICE: pcn: attempting to restart tx
> > Aug 27 18:20:40 dusk unix: NOTICE: pcn: transmitter shut down
> > Aug 27 18:20:40 dusk unix: NOTICE: pcn: attempting to restart tx
> >
> > Any information you can provide would be of great help. This is ldm 5.0.8 on
> > Solaris X86
> >
> > Thanks
> >
> > -Chad
> >
> > --
> > Chad W. Johnson                           E-mail: address@hidden
> > Programmer/Meteorologist                  Voice: (608) 265-5292
> > Space Science and Engineering Center      Fax: (608) 263-6738
> > University of Wisconsin -- Madison
> >
> >
> > ------- End of Forwarded Message
> >
> 

-- 
Chad W. Johnson                           E-mail: address@hidden
Programmer/Meteorologist                  Voice: (608) 265-5292
Space Science and Engineering Center      Fax: (608) 263-6738
University of Wisconsin -- Madison
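
Since a pqing exit on one stream stalls both SDI streams, it may help to
spot such exits in the log quickly. The following is a rough sketch that
scans log lines in the format quoted in this thread and reports each pqing
process that logged "Exiting", together with the last pq_insert error it
logged. The function name find_pqing_exits and the regular expression are
assumptions made for illustration; they are not part of the LDM.

```python
import re

# Assumed line shape, taken from the excerpt in this thread:
#   "Aug 27 18:13:53 dusk pqing[356]: <message>"
LINE_RE = re.compile(r"pqing\[(\d+)\]:\s+(.*)$")


def find_pqing_exits(lines):
    """Return (pid, last pq_insert error or None) for each pqing exit."""
    last_error = {}  # pid -> most recent pq_insert message
    exits = []
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if not match:
            continue
        pid, msg = int(match.group(1)), match.group(2).strip()
        if msg.startswith("pq_insert:"):
            last_error[pid] = msg
        elif msg == "Exiting":
            exits.append((pid, last_error.get(pid)))
    return exits


if __name__ == "__main__":
    sample = [
        "Aug 27 18:13:53 dusk pqing[356]: Deleting oldest to make space 5104 bytes",
        "Aug 27 18:13:53 dusk pqing[356]: del_oldest: conflict on 109373592",
        "Aug 27 18:13:53 dusk pqing[356]: pq_insert: Resource temporarily unavailable",
        "Aug 27 18:13:53 dusk pqing[356]: Exiting",
    ]
    for pid, reason in find_pqing_exits(sample):
        print(pid, reason)
```
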