
[LDM #UYH-624598]: LVS realserver switching loses data



Art,

> Because our weather guys who use the data and monitor it pretty closely
> reported the loss of data for a specific interval of time that coincides
> closely with the transitions noted above.  They noted specific ensemble
> forecast hours that had missing data and noted the times on the files when
> they were last written to.  We also experienced data loss again today...
> this time some sort of network aberration on one of our realservers caused
> the connections to stop/restart on that machine and we noted data loss
> again very close to when it occurred.  I think the fact that our CONDUIT
> feed currently runs behind by up to 1200 seconds makes whatever the
> problem is more noticeable.  We're not running a testbed, so I can't say
> with certainty that it's cause-and-effect, but the circumstantial evidence
> to us is pretty convincing.

It could be that the pqact(1) process that's responsible for processing the 
data-products of interest is running behind when the LDM system is restarted.  
By default, pqact(1) starts processing data-products beginning at the youngest 
end of the product-queue.  If it can't keep up with the rate of data-product 
arrival, then a restart will cause it to skip the recently-arrived 
data-products between its current position and the youngest end of the 
product-queue.
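
As a rough illustration of that skip (a toy model only, not LDM code; the list, the cursor, and the function name are all hypothetical), consider a reader that has fallen behind in an arrival-ordered queue and is then restarted at the youngest end:

```python
def restart_at_youngest_end(queue, cursor):
    """Model a restart that repositions a reader at the youngest end.

    queue  -- products in arrival order (oldest first); hypothetical stand-in
              for the product-queue
    cursor -- index of the next product the reader had not yet processed

    Returns (new_cursor, skipped), where skipped holds the products that
    arrived before the restart but were never processed.
    """
    skipped = queue[cursor:]      # everything the lagging reader had not reached
    new_cursor = len(queue)       # reader now waits for the next new arrival
    return new_cursor, skipped


# Ten products have arrived; the reader has handled only the first six.
queue = [f"product-{i}" for i in range(10)]
new_cursor, skipped = restart_at_youngest_end(queue, cursor=6)
print(skipped)
```

In this model the further behind the reader is at restart time, the longer the skipped list, which would be consistent with a lagging feed making the loss more noticeable.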

I'm thinking about how the LDM might be modified to eliminate this behavior.  
In the meantime, you might try using the "-o <offset>" option of the pqact(1) 
program to start processing before the youngest end of the product-queue.
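
To show what an offset buys you, here is a minimal sketch that picks the starting position in a time-ordered queue. It assumes the offset is measured in seconds before the current time; the function name, the arrival times, and the 1200-second value (taken from the lag you mentioned) are illustrative assumptions, not LDM internals.

```python
from bisect import bisect_left


def start_index(arrival_times, now, offset_seconds=0):
    """Index of the first product a restarted reader would process.

    arrival_times  -- arrival times in seconds, sorted ascending (hypothetical)
    now            -- time of the restart, in the same units
    offset_seconds -- how far before `now` to begin (assumed to be seconds)
    """
    cutoff = now - offset_seconds
    # First product whose arrival time is at or after the cutoff.
    return bisect_left(arrival_times, cutoff)


# Products arrived every 300 seconds; the reader restarts at t = 1600.
times = [0, 300, 600, 900, 1200, 1500]
print(start_index(times, now=1600))                       # no offset
print(start_index(times, now=1600, offset_seconds=1200))  # look back 1200 s
```

With no offset, this model starts past every product already in the queue. With an offset at least as large as the lag, it starts early enough to cover the backlog, at the cost of possibly reprocessing products that were already handled before the restart.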

> 
> Art
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563
> 
> 

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: Closed