
Re: pq messages in log at UCSD



Russ Rew wrote:
> 
> Anne,
> 
> > Today I'm looking at the logs and I see some unusual messages:
> >
> > Jan 08 12:19:32 aeolus motherlode[744]: FEEDME(motherlode.ucar.edu): OK
> > Jan 08 12:19:33 aeolus motherlode[744]: pq_del_oldest: signature 54c2cdcf6960020caf2ebdec373910cb: Not Found
> > Jan 08 12:19:33 aeolus motherlode[744]: hereis: pq_insert failed: Invalid argument: b84f91c2b3adff27f7cd2f9943e2f18f     4432 20020108121902.472 NNEXRAD 61479154  SDUS20 PHFO 081216 /pN2SHWA
> >
> > This occurred twice in the past 5 hours.  He's running 5.1.4, which has
> > your changes regarding the pq_del_oldest conflict.   I guess these
> > messages originated from the rpc.ldmd that's receiving from motherlode,
> > but the messages must apply to the local queue.  And, here's what pqmon
> > is reporting:
> >
> > aeolus.ucsd.edu> pqmon
> > Jan 08 17:38:43 pqmon: Starting Up (2014)
> > Jan 08 17:38:43 pqmon: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext    age
> > Jan 08 17:38:43 pqmon: 100275     1   82829   750001128    161069        2     22035      1048  10105
> > Jan 08 17:38:43 pqmon: Exiting
> >
> > Otherwise things seem ok.  Do you have any ideas about what might have
> > occurred?
> 
> No, I'm not sure.  I haven't seen this before, but it might be a
> symptom of a corrupted product queue.  The "pq_insert failed:" message
> is just a consequence of pq_del_oldest failing.
> 
> Whenever a product is inserted in the queue, its MD5 signature is
> inserted into a hash table for quickly checking on duplicate
> products.  Later when it's time to delete the product to make room for
> a new product, the signature must be deleted from the hash table.  In
> this case, the signature that was supposedly added to the hash table
> earlier is not found, so it can't be deleted.  This should never
> happen, so it indicates either a bug, a corrupted queue, or a disk or
> memory error.  Once there is one of these errors, there are likely to
> be more, if the hash table data structures are hosed.  It is dropping
> a product every time it encounters this problem.
> 
> Sounds like it might be time to restart the LDM with a new queue and
> see if it happens again.  I'd also be interested if this error message
> has ever occurred in LDM logs on motherlode or other machines ...
> 
> --Russ

Russ,

The problem only happened twice 7 hours ago  (and they were within 30
seconds of each other).  Otherwise things seem fine, as best I can
tell.  I think I'll just let it go for a while.

I've never seen these messages before.  I just scanned the logs on
motherlode and none have occurred in the past four days.  I'll let you
know if I see it again.

Anne
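
For anyone reading this thread later, the signature bookkeeping Russ
describes can be illustrated with a small toy model.  This is only a
sketch in Python, not the LDM's actual C implementation: the class
ToyQueue, its methods, and the return strings are all made up for
illustration, and a Python set stands in for the real hash table.  It
shows why a missing signature makes the delete fail, and why the insert
that needed the space fails (and drops its product) as a consequence.

```python
import hashlib
from collections import OrderedDict


class SignatureNotFound(Exception):
    """Raised when the oldest product's signature is missing from the index."""


class ToyQueue:
    """Hypothetical fixed-capacity queue with a signature index (not LDM code)."""

    def __init__(self, max_products):
        self.max_products = max_products
        self.products = OrderedDict()  # signature -> data, oldest first
        self.signatures = set()        # stand-in for the signature hash table

    def del_oldest(self):
        # The signature was added at insert time, so it should always be here.
        sig = next(iter(self.products))
        if sig not in self.signatures:
            raise SignatureNotFound(sig)
        self.signatures.remove(sig)
        del self.products[sig]

    def insert(self, data):
        sig = hashlib.md5(data).hexdigest()
        if sig in self.signatures:
            return "duplicate"         # quick duplicate check via the index
        try:
            # Make room by deleting the oldest products first.
            while len(self.products) >= self.max_products:
                self.del_oldest()
        except SignatureNotFound:
            return "failed"            # the new product is dropped
        self.products[sig] = data
        self.signatures.add(sig)
        return "inserted"
```

In this toy version, removing a signature from the set by hand (to mimic
a corrupted index) makes every later insert fail once the queue is full,
because the oldest product can never be deleted.  The real queue
evidently recovered after two failures, so its behavior differs from the
sketch in that respect.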