
20040917: Possible pqact issue in LDM?



Steven,

>Date: Thu, 16 Sep 2004 20:53:48 -0500
>From: "Steven Danz" <address@hidden>
>Organization: Aviation Weather Center
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040913: Possible pqact issue in LDM?
> Keywords: 200409142045.i8EKjBnJ016694 LDM-6 ldmadmin pqact.conf

The above message contained the following:

> So far so good, but I'm wanting to give it some time.  :-)

Same here.  About how often, however, has your pqact(1) missed
data-products?

> I've been mulling it around and had a few questions.
> 
> 1)  Noticed the changes in pqact and was wondering if the rpc.ldmd uses
>     the same or a similar method to determine which products go to
>     downstream clients.

The changes to the pqact(1) program should not be relevant to the
problem you're seeing: they addressed other issues.

>     If so, could it be possible that this might
>     cause a server to not pass products down to a client?  I've had a
>     problem once in a blue moon where a product upstream doesn't show
>     up down, but I've always seemed to convince myself it was 'something
>     else'.

Both pqact(1) and sending rpc.ldmd(1) processes use the pq_sequence(3)
function of the pq(3) module.  This function is responsible for
sequencing through all desired data-products in the product-queue. I
hope that changes to this module will stop pqact(1) from skipping
data-products.  This should also eliminate any similar behavior by
sending rpc.ldmd(1) processes.

> 2)  The timestamp that is assigned when the product is placed in the
>     queue, is it preserved as the product passes from server to client?

No.  What goes along with the data-product as part of its metadata
is the creation-time of the data-product.  The insertion-time of a
data-product into a product-queue is local to the system only and is not
communicated between LDM-s.
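
To make the distinction concrete, here is a rough sketch in Python.  It is
a toy model only, not the LDM's actual data structures: the Product class,
its field names, and the send() and insert_locally() helpers are all
hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Product:
    """Hypothetical stand-in for a data-product and its timestamps."""
    ident: str
    creation_time: float                    # set once by the originator; travels with the product
    insertion_time: Optional[float] = None  # set by each local queue; never transmitted


def send(product: Product) -> Product:
    """Model transmission between hosts: the local insertion-time is dropped."""
    return replace(product, insertion_time=None)


def insert_locally(product: Product, now: float) -> Product:
    """Model insertion into a local queue: the receiving host stamps its own time."""
    return replace(product, insertion_time=now)


# The upstream host creates the product and inserts it into its own queue.
upstream = insert_locally(Product("example-product", creation_time=1000.0), now=1000.5)

# The downstream host receives it and assigns a fresh, local insertion-time.
downstream = insert_locally(send(upstream), now=1002.25)

print(upstream)
print(downstream)
```

In this model the creation-time is identical on both hosts, while each
host ends up with its own insertion-time.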

>     If so, if there were 4 data sources feeding one client, then
>     there exists a chance (regardless of the speed of the systems)
>     that a duplicate timestamp could be created by the data sources.

It does, indeed, seem possible that, on a fast system, more than one
data-product could have the same insertion-time -- especially during a
reconnection when the downstream LDM is "catching up".
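
Here is a toy Python illustration of why a shared insertion-time could
matter.  It rests on an assumption that I am making only for the sake of
the example: that a reader remembers the insertion-time of the last
product it saw and then asks for the first product with a strictly later
time.  The ToyQueue class and the drain() helper are hypothetical and are
not the pq(3) implementation.

```python
import bisect


class ToyQueue:
    """Hypothetical queue that keeps products sorted by insertion-time."""

    def __init__(self):
        self._items = []  # sorted list of (insertion_time, name)

    def insert(self, insertion_time, name):
        bisect.insort(self._items, (insertion_time, name))

    def next_after(self, cursor_time):
        """Return the first product whose insertion-time is strictly later."""
        for t, name in self._items:
            if t > cursor_time:
                return t, name
        return None


def drain(queue, cursor_time, seen):
    """Process every product past the cursor; return the new cursor."""
    while (item := queue.next_after(cursor_time)) is not None:
        cursor_time, name = item
        seen.append(name)
    return cursor_time


queue = ToyQueue()
seen = []

queue.insert(100, "A")
cursor = drain(queue, 0, seen)       # the reader sees A; its cursor is now 100

queue.insert(100, "B")               # a second writer is given the same time
queue.insert(101, "C")
cursor = drain(queue, cursor, seen)  # the reader resumes strictly after 100

print(seen)  # ['A', 'C']: B is never seen
```

Under that assumption, product B sits in the queue but is never handed to
the reader, because the reader's cursor has already moved past its
insertion-time.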

> 3)  Even if the timestamp is reassigned by the client, if the client
>     had multiple upstream sources, and therefore multiple rpc.ldmd
>     processes inserting products in the queue, it would seem (especially
>     on multiple CPU systems) that the multiple client rpc.ldmd processes
>     would stand a pretty good chance of creating duplicate timestamps.

Warren Blanchard at the national NWS HQ has reported missing about 8
NEXRAD Level II data-products per 100,000 data-products (a 0.008% loss
rate).  This seems similar to your pqact(1) problem.
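
For reference, the arithmetic behind that figure can be checked with a
few lines of Python.  The function name is my own invention for this
example, and the counts are just the ones quoted above.

```python
def loss_rate_percent(missed: int, total: int) -> float:
    """Return the percentage of data-products that were missed."""
    return 100.0 * missed / total


rate = loss_rate_percent(8, 100_000)
print(f"{rate:.3f}%")
```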

>     True, for a single-CPU system it would take a pretty fast system
>     (and pretty small products), but I would think that with an n-way
>     system it would be more likely.

I agree.  Hopefully, the new pq(3) module will fix this.

> Like I said, just some random thoughts.  Thanks for the work on this!
> 
> Steven
>
> Steven Danz
> Senior Software Development Engineer
> Aviation Weather Center (NOAA/NWS/NCEP)
> 7220 NW 101st Terrace, Room 101
> Kansas City, MO 64153-2371
> 
> Email: address@hidden
> Phone: 816.584.7251
> Fax:   816.880.0650
> URL:   http://aviationweather.gov/
> 
> The opinions expressed in this message do not necessarily reflect those 
> of the National Weather Service, or the Aviation Weather Center.

Regards,
Steve Emmerson