Steven,

>Date: Mon, 20 Sep 2004 14:35:13 -0500
>From: "Steven Danz" <address@hidden>
>Organization: Aviation Weather Center
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040920: Possible pqact issue in LDM?
>Keywords: 200409091803.i89I3pnJ023109

The above message contained the following:

> ... So far a missed
> product doesn't show up in the log when running in real-time. Running
> it by hand after the fact and it shows up fine.

The first thing that pqact(1) does with every product it reads is to log it to the logfile (when in verbose mode). This is done before any action is executed. Consequently, it's very hard to believe that the manually-executed pqact(1) logs the product but the LDM's pqact(1) doesn't, because that would mean the two pqact(1) processes aren't matching the same data-products. A more believable hypothesis is that the data-product in question hadn't yet arrived in the product-queue (but see below).

> So you are saying that it takes over 600 seconds (10 minutes) on a
> system with a load of ~0.05 to act on a product once it is received in
> the queue?

No. I'm saying that it can take more than ten minutes from the time that the AWC transmits a data-product destined for NOAAPORT to the time that the AWC's LDM receives it from NOAAPORT.

> I thought the SIGCONT that the pq_insert() generates would 'kick' the
> pqact into action a lot sooner than that...

It will.

> Also, the 'missed' product is bracketed by 'caught' products in every
> case so far. Which was another cause for concern. So I would guess
> it should be taking care of these things in order, so it should have
> caught the missed product.

Well, that kills my "more believable hypothesis" above. I hope you're sure about this.

Were there any reconnections by the monitoring LDM system that's downstream from the Northrop Grumman NOAAPort system at that time?
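To make the ordering argument concrete: a product that is bracketed by "caught" products can still be skipped if the reader tracks its position by insertion time and the product receives an insertion time earlier than the reader's current position, as in the clock-reset case raised in this reply. The C sketch below is a deliberately simplified model under that assumption. The names `queue_t`, `queue_insert()`, and `reader_scan()` are hypothetical and are not the LDM pq_*() API; the model assumes the queue is kept sorted by insertion time and that the reader remembers the insertion time of the last product it processed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 16

/* Hypothetical, simplified model of a product-queue (NOT the LDM API). */
typedef struct {
    long insert_time; /* timestamp taken from the (possibly reset) system clock */
    char id;          /* product identifier, for illustration only */
} product_t;

typedef struct {
    product_t items[QUEUE_CAP];
    size_t    count; /* items are kept sorted by insert_time */
} queue_t;

/* Insert a product at the position implied by its insertion time.
 * Returns 0 on success, -1 if the queue is full. */
int queue_insert(queue_t *q, long now, char id)
{
    if (q->count == QUEUE_CAP)
        return -1;
    size_t pos = q->count;
    /* Shift later-timestamped products up to keep the array time-ordered. */
    while (pos > 0 && q->items[pos - 1].insert_time > now) {
        q->items[pos] = q->items[pos - 1];
        pos--;
    }
    q->items[pos].insert_time = now;
    q->items[pos].id = id;
    q->count++;
    return 0;
}

/* Process, in time order, every product whose insertion time is later than
 * *cursor; append each processed ID to the NUL-terminated string `out`
 * (which must have room for q->count more characters) and advance *cursor. */
void reader_scan(const queue_t *q, long *cursor, char *out)
{
    assert(q != NULL && cursor != NULL && out != NULL);
    size_t len = strlen(out);
    for (size_t i = 0; i < q->count; i++) {
        if (q->items[i].insert_time > *cursor) {
            out[len++] = q->items[i].id;
            *cursor = q->items[i].insert_time;
        }
    }
    out[len] = '\0';
}
```

In this model, inserting A at time 100 and B at time 200, scanning, then inserting C at time 150 (the clock was set back) and D at time 300 and scanning again leaves the reader with "ABD": C is never processed even though it arrived between two products that were. A fresh reader whose cursor starts at 0, roughly analogous to running pqact(1) by hand after the fact from an earlier point in the queue, processes "ACBD" and so does see C.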
Also, as I explained in an email on Mon, 13 Sep 2004 15:44:34 -0600, a data-product that is inserted into the product-queue just after the system clock has been set backwards could be missed by a reader of the product-queue. The behavior of the LDM system in that case would be consistent with everything you've described. Is the clock on the Northrop Grumman NOAAPort system (on which the LDM is running) kept accurate somehow? Is an ntpd(8) daemon running? Does root's crontab(1) execute ntpdate(1) periodically?

> I'm wondering if there isn't something 'bad' happening because
> of how the ingesters were written. I find it odd that they
> pq_open()/pq_close() for each product inserted, not just once for the
> duration of the program.

Are these LDM ingesters getting data-products from the AWIPS CP software? Are they started by EXEC entries in the LDM configuration-file?

I can't see how opening and closing the product-queue for every data-product would cause the problem you're seeing, but that approach is grossly inefficient (especially for large, memory-mapped files) and should be corrected ASAP.

> The way the pqact is behaving, I'm wondering about signals and if
> something isn't causing pqact to 'jump' away from its normal routine
> and miss a section of the queue.

I don't see how.

Waiting with bated breath,
Steve Emmerson
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.