
20040913: Possible pqact issue in LDM?



Steven,

>Date: Mon, 13 Sep 2004 15:02:10 -0500
>From: "Steven Danz" <address@hidden>
>Organization: Aviation Weather Center
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040909: Possible pqact issue in LDM?
>Keywords: 200409091803.i89I3pnJ023109

The above message contained the following:

> ntpd, and the products are only coming from the system itself (no
> request lines in ldmd.conf), so it shouldn't make a difference what
> the time is, true?

If a data-product is inserted just after the system clock is set
backwards, then the data-product could have an insertion-time that is
earlier than the cursor of a pqact(1) process and pqact(1) will miss it.
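
As a rough illustration of that scenario, here is a toy model in Python.
It is not LDM code: the names ToyQueue, insert, next_after, and drain are
invented for this sketch, and the times are arbitrary. It assumes only
that a reader accepts products whose insertion-time is strictly later
than its cursor.

```python
class ToyQueue:
    """Toy stand-in for a product-queue; not the real LDM data structure."""

    def __init__(self):
        self.products = []  # list of (insertion_time, name)

    def insert(self, clock_time, name):
        # The product is stamped with whatever the system clock says now.
        self.products.append((clock_time, name))

    def next_after(self, cursor):
        # Earliest product strictly later than the cursor, or None.
        later = [p for p in self.products if p[0] > cursor]
        return min(later) if later else None


def drain(queue, cursor):
    """Process everything visible from the cursor; return (names, new cursor)."""
    seen = []
    while (product := queue.next_after(cursor)) is not None:
        cursor = product[0]
        seen.append(product[1])
    return seen, cursor


queue = ToyQueue()
queue.insert(100.0, "A")
seen_first, cursor = drain(queue, 0.0)      # reader catches up to time 100.0
queue.insert(98.5, "B")                     # clock was stepped back before this
seen_second, cursor = drain(queue, cursor)  # "B" sorts before the cursor
print(seen_first, seen_second)
```

In this model the second pass sees nothing, because "B" carries a time
earlier than the cursor even though it was inserted later.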

> Could something be inserted with the same timestamp as the check
> from pqact, but with a smaller byte offset (they are reused, true?)
> so that it would look 'old' from pqact's point of view?  Or maybe be
> inserted with a value equal to the pqact check?  Given the lack of a
> pattern here, it acts like some sort of corner case/race condition.  I
> noticed that the cursor check is greater than, not greater than or equal,
> so I'm wondering if there isn't some case/condition that generates an
> insertion key that is not greater than the pqact cursor.

I think you might be onto something.  We're going to take a hard look at
the minor sort-key of the time map to see if that byte-offset component
could lead to problems.  If so, then quite a few users will owe you a
beer.
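
To make the hypothesis concrete, here is a small Python sketch. It
assumes, purely for illustration, that a time-map key is the pair
(insertion-time, byte offset) compared in that order, and that the
cursor check is a strict greater-than, as you describe. The function
name and the values are made up; whether the real code behaves this way
is what still has to be checked.

```python
def is_after(key, cursor):
    # Lexicographic comparison: insertion-time first, then byte offset.
    return key > cursor


cursor = (500.0, 4096)        # key of the last product the reader handled

same_time_lower_offset = (500.0, 1024)   # same timestamp, reused lower offset
same_time_same_offset = (500.0, 4096)    # key equal to the cursor
same_time_higher_offset = (500.0, 8192)

missed_lower = not is_after(same_time_lower_offset, cursor)
missed_equal = not is_after(same_time_same_offset, cursor)
found_higher = is_after(same_time_higher_offset, cursor)
print(missed_lower, missed_equal, found_higher)
```

Under those assumptions, a product inserted with the cursor's timestamp
is found only if its byte offset is larger than the cursor's.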

> Is there a debug switch that I could enable that would tell me more?
> I realize the output would be huge but maybe I could turn it on
> during those times I know I'm looking for a problem to appear.

I'm going to work up a procedure to confirm or rule out this hypothesis.
Would you be willing to test it on your system?

Regards,
Steve Emmerson