
20040920: Possible pqact issue in LDM?



Steven,

>Date: Mon, 20 Sep 2004 10:12:07 -0500
>From: "Steven Danz" <address@hidden>
>Organization: Aviation Weather Center
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040918: Possible pqact issue in LDM?
>Keywords: 200409091803.i89I3pnJ023109

The above message contained the following:

> Sure... the story goes something like this. 
> 
> AWC has a Northrop Grumman NOAAPort receiver system, which is pretty
> much just a stripped down AWIPS CP.  On this system, we have some
> software from FSL that can talk to the AWIPS CP software and for each
> product received on the NOAAPort, insert it into the LDM queue.  So,
> we also have LDM running on this system, configured as a pure data
> source (no 'request' lines in ldmd.conf) to feed the NOAAPort data to
> other systems in the center.  Now, to make a record of the time that
> each product reaches the center on NOAAPort, the LDM on the receiver
> has a small pqact.conf that, for each AWC product, EXEC's a script to
> put a one-line product in the queue that contains the current wall
> clock time, the server name, product name, etc. to give us a record of
> the time that the product arrived from NOAAPort.
> 
> Now, downstream from the NOAAPort receiver, there is an LDM client
> with a pqact configured that stores all these 'receive notifications'
> into a file by product, by day.  We also keep a similar log of every
> transmit of every product from the center.  Then, we have a script
> that takes the send log entries and matches them up with the receive
> log entries to determine delay and to monitor if the NWSTG drops a
> product.  Whenever there is a missing receive entry that is 'too old',
> an alarm goes up on our monitoring software (Nagios is the package we
> are using). So, when there is an alarm on Nagios (and I catch it in
> time before things are flushed from the queue) I quickly log into the
> NOAAPort receiver to check
> 1) is the product in the queue
> 2) is the receive notice in the queue
> 3) is there a log entry from the receive notice script that it attempted 
> to put a notice in the queue
> 4) and when I was running pqact -v, was there an entry that pqact saw 
> the product go by
> 
> So far, each time there has been a problem reported, 1) has been fine,
> the product was in the queue, but 2) was not and there was no entry
> in 3) indicating that the script had attempted to run.  When I was
> running 'pqact -v' over the weekend I noticed that there were 'chunks'
> of headers missing when comparing the list of headers to what 'pqcat'
> displayed in the queue.  For example, looking over about 40 minutes
> of the queue, there were about 255 products in 13 'chunks' that pqcat
> listed in the queue, that the 'pqact -v' didn't report seeing.
> 
> Probably too much detail :-)

Not at all.

Are you checking the product-queue too soon after being notified?  Is
the missed data-product later acted-upon by pqact(1), indicating that it
was merely delayed?

Do you have a saved product-queue that pqcat(1) indicates contains
data-products that pqact(1) missed?

If so, if you manually execute pqact(1) on this product-queue, does it
find the "missed" data-products, e.g.,

    echo '<<feedtype>>  (<<pattern>>)   EXEC    -wait   echo \1' >conf
    pqact -vl- -o <<time>> -q <<pq>> conf

where
    <<feedtype>>        Is the feedtype of a data-product that pqact(1)
                        missed.
    <<pattern>>         Is the pattern of a data-product that pqact(1)
                        missed.
    <<time>>            Is the age of the oldest data-product in the
                        product-queue in seconds (use pqmon(1) to
                        determine this).
    <<pq>>              Is the pathname of the saved product-queue.
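
To choose the <<feedtype>> and <<pattern>> values, it may help to list
which product-identifiers pqcat(1) shows that pqact(1) never logged.
The following is only a rough sketch in Python, not part of the LDM.  It
assumes you have already extracted the product-identifiers from each
tool's output into two lists, in queue order; the function name and the
sample identifiers are hypothetical.

```python
def missing_chunks(queue_ids, acted_ids):
    """Group consecutive queue entries that were never acted upon.

    queue_ids: product-identifiers in queue order (e.g., taken from pqcat output).
    acted_ids: product-identifiers that appeared in the 'pqact -v' log.
    Returns a list of "chunks", each a list of consecutive missed identifiers.
    """
    acted = set(acted_ids)
    chunks, current = [], []
    for ident in queue_ids:
        if ident in acted:
            # A product that pqact saw ends the current run of misses.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(ident)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    in_queue = ["PROD-A", "PROD-B", "PROD-C", "PROD-D", "PROD-E"]
    seen_by_pqact = ["PROD-A", "PROD-D"]
    for chunk in missing_chunks(in_queue, seen_by_pqact):
        print(len(chunk), "missed:", ", ".join(chunk))
```

Any identifier from a reported chunk could then serve as the basis for
<<pattern>> in the test above.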

Are there non-printing characters in the product-identifier of the
"missed" data products that cause them to not be matched?  You can check
the product-identifiers with

    pqcat -vl- -f <<feedtype>> -p <<pattern>> -q <<pq>> -i 0 | od -c
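
If reading od(1) output by eye gets tedious, the same check can be
scripted.  The following is a minimal sketch in Python, assuming you
already have the product-identifiers as strings (how you extract them
from the log is up to you); the function name and the sample identifiers
are made up for illustration.

```python
def nonprinting(ident):
    """Return (index, repr) for each character outside printable ASCII.

    Printable ASCII is taken here to be the range from space (0x20)
    through tilde (0x7E); tabs, carriage returns, and other control
    characters are therefore reported.
    """
    return [(i, repr(ch)) for i, ch in enumerate(ident)
            if not (" " <= ch <= "~")]


if __name__ == "__main__":
    # Hypothetical product-identifiers; the second has a stray carriage return.
    for ident in ["SAUS70 KWBC 201500", "SAUS70 KWBC\r"]:
        bad = nonprinting(ident)
        print(repr(ident), "->", bad if bad else "clean")
```

An identifier that reports anything other than "clean" is a candidate
for a pattern that silently fails to match.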

Regards,
Steve Emmerson