
[LDM #UYH-624598]: LVS realserver switching loses data



Art,

> Must the "last product received" be a member of the feed stream being
> requested (e.g. CONDUIT)?  If so, how does the LDM "remember" the time of
> that product?  Does it check the products in the queue when it starts, or
> are there entries in the queue for each stream type indicating the last
> product received?

When starting from scratch, a downstream LDM checks the product-queue
for the most recent product that matches the product-class that the
downstream LDM will request.  Downstream LDM-s remember the last
successfully-received data-product.  If a downstream LDM is the only
one receiving a particular class of products, then it uses the
signature and the product-creation time (minus 60 seconds)
from the last successfully-received product when reconnecting.  If a
downstream LDM is one of many receiving a particular class of products,
then it searches back through the queue for the most recent matching
product and uses the same information from it when reconnecting.
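The selection rule in the paragraph above can be sketched in Python. This is a simplified illustration, not LDM source code: the Product, ProductClass, and ResumePoint types and the resume_point() function are hypothetical, a product-class is reduced to a single (feedtype, pattern) pair, and the product-queue is assumed to be a list ordered oldest-first by insertion.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

@dataclass(frozen=True)
class Product:
    """Hypothetical, simplified product-queue entry."""
    feedtype: str      # e.g. "CONDUIT"
    ident: str         # product identifier
    signature: str     # MD5 signature as a hex string
    created: datetime  # product-creation time

@dataclass(frozen=True)
class ProductClass:
    """One (feedtype, pattern) pair; real product-classes can hold several."""
    feedtype: str
    pattern: str

    def matches(self, p: Product) -> bool:
        return p.feedtype == self.feedtype and re.search(self.pattern, p.ident) is not None

@dataclass(frozen=True)
class ResumePoint:
    signature: str
    from_time: datetime

BACKOFF = timedelta(seconds=60)  # the "minus 60 seconds" from the text

def resume_point(queue: Sequence[Product], wanted: ProductClass,
                 last_received: Optional[Product] = None,
                 sole_receiver: bool = False) -> Optional[ResumePoint]:
    """Pick the product whose signature and creation-time go into the request.

    `queue` is assumed to be ordered oldest-first by insertion.
    """
    if sole_receiver and last_received is not None:
        # Only receiver of this class: use the product it remembers.
        ref = last_received
    else:
        # Fresh start or shared class: newest matching product in the queue.
        ref = next((p for p in reversed(queue) if wanted.matches(p)), None)
    if ref is None:
        return None  # nothing to resume from; not modeled by this sketch
    return ResumePoint(ref.signature, ref.created - BACKOFF)
```

With sole_receiver left False, the remembered product is ignored and the newest matching entry in the shared queue is used instead, even if a different downstream LDM put it there.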

I can imagine a scenario in which a gap could result from the nearly
simultaneous disconnection of two downstream LDM-s on the same computer,
each receiving the same class of products but from different
upstream LDM-s.  The products would also have to be in a different
order in the product-queues of the two upstream LDM-s.

> How would splitting the feed affect this?  For example,
> our ingest machine currently splits the feed request into two pieces for
> CONDUIT: "[02468]$" and "[^02468]$", but our realserver getting the data
> from the ingester requests CONDUIT as ".*".

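Splitting a feed results in distinct product-classes and, hence,
independent downstream LDM-s, each of which determines its own starting
point as described above.  So the split on the ingest machine would have
no effect on this.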
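A quick way to check that the split request covers the whole CONDUIT stream without overlap is to confirm that every product identifier matches exactly one of the two patterns. The sketch below uses Python's re module as a stand-in for the LDM's POSIX regular expressions, assuming the two behave the same for patterns this simple; the matching_requests() helper and the sample identifiers are hypothetical.

```python
import re
from typing import List

# The two halves of the split CONDUIT request described above.
SPLIT_PATTERNS = {
    "[02468]$": re.compile(r"[02468]$"),    # identifier ends in an even digit
    "[^02468]$": re.compile(r"[^02468]$"),  # identifier ends in anything else
}

def matching_requests(ident: str) -> List[str]:
    """Return the split-request patterns that a product identifier matches."""
    return [src for src, rx in SPLIT_PATTERNS.items() if rx.search(ident)]

for ident in ["example.0", "example.7", "example.abc"]:
    # A non-empty identifier falls into exactly one half of the split.
    print(ident, matching_requests(ident))
```

Because each non-empty identifier lands in exactly one half, the two requests together see the same products as a single ".*" request, just as two separate product-classes.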

> Here are a few log-file lines from a data-loss instance today:
> 
> DOWNSTREAM REALSERVER MACHINE:
> Oct 26 15:59:02 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:04 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:06 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> 
> UPSTREAM INGEST MACHINE:
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: Starting
> Up(6.4.5/6): 20061026155901.886 TS_ENDT {{CONDUIT,  ".*"}}, Primary
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: topo:
> iddrs3.meteo.psu.edu {{CONDUIT, (.*)}}
> 
> (Note 1:  iddrs2 in this case is actually the idd-ingest machine as I had
> to switch things around this morning after some hardware problems but the
> name didn't get updated.
> 
> Note 2:  idd-ingest (iddrs2) requests the CONDUIT data in a split feed as
> I describe above, whereas iddrs3 requests CONDUIT in one request as ".*")
> 
> As best as I can interpret these entries, it looks like the realserver
> (iddrs3) was requesting CONDUIT data with an age since 15:28:52 but the
> ingest server (idd-ingest a.k.a. iddrs2) responded with data with an age
> since 15:59:01 which also coincides with an ldm restart of the ingest
> machine.  Am I reading this right?  Can you provide any further insights
> on these log entries?  I should note that the iddrs3 system was not
> stopped/restarted during the above period, but was waiting for idd-ingest
> (iddrs2) to come back to provide a feed.

The last product received by the downstream LDM on iddrs3 had a creation-time
of 20061026152852.421 and the given signature.  On iddrs2, that signature
belonged to a product that was INSERTED into iddrs2's product-queue at
20061026155901.886, so iddrs2's LDM started sending data-products
beginning with the product that was inserted just after that time.
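The upstream side can be sketched the same way. What it illustrates is that the upstream LDM positions itself by where the requested signature's product sits in its own queue, i.e., by insertion order, not by creation-time. QueuedProduct and send_after_signature() are hypothetical names, and what the real LDM does when the signature is not in its queue is deliberately not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence

@dataclass(frozen=True)
class QueuedProduct:
    """Hypothetical queue entry carrying both times discussed here."""
    signature: str
    created: datetime   # product-creation time
    inserted: datetime  # time this host inserted the product into its queue

def send_after_signature(queue: Sequence[QueuedProduct],
                         signature: str) -> List[QueuedProduct]:
    """Return the products an upstream would send, beginning with the one
    inserted just after the product whose signature was requested."""
    ordered = sorted(queue, key=lambda p: p.inserted)  # insertion order
    for i, product in enumerate(ordered):
        if product.signature == signature:
            # Creation-times play no part in choosing where to start.
            return ordered[i + 1:]
    # The real LDM's behavior for a missing signature is not modeled here.
    raise LookupError(f"signature {signature!r} not found in queue")
```

In terms of the logs above, a product created before 15:28:52 but inserted into iddrs2's queue after 15:59:01.886 would still be sent by this sketch, whereas one inserted before that moment would not, whatever its creation-time.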

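There are two times involved in all this: one is the product-creation
time and the other is the time at which a product is inserted into the
local product-queue.  The time in iddrs3's request is a creation-time,
whereas the time in iddrs2's "Starting Up" message is an insertion-time,
so the two need not agree.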
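To see how far apart the two times in the logs are, the timestamps can be parsed and subtracted. This is only arithmetic on the two values quoted above; the strptime format string is an assumption based on how those timestamps look (YYYYMMDDhhmmss.fff), and parse_ldm_time() is a hypothetical helper.

```python
from datetime import datetime

LDM_TIME_FORMAT = "%Y%m%d%H%M%S.%f"  # assumed layout, e.g. 20061026152852.421

def parse_ldm_time(stamp: str) -> datetime:
    """Parse a timestamp as printed in the LDM log lines above."""
    return datetime.strptime(stamp, LDM_TIME_FORMAT)

created = parse_ldm_time("20061026152852.421")   # creation-time in iddrs3's request
inserted = parse_ldm_time("20061026155901.886")  # insertion-time reported on iddrs2
lag = (inserted - created).total_seconds()
print(f"{lag:.3f} s")  # 1809.465 s
```

So the product was inserted into iddrs2's queue about 30 minutes after it was created, which accounts for the roughly 30-minute difference between the two log entries.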

> [Re: monitoring the age of the oldest product]
> Okay, I'll take a look at starting up a monitor...
> 
> 
> Thanks again for your help...
> 
> Art
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: Closed