
[LDM #UYH-624598]: LVS realserver switching loses data



Art,

> Must the "last product received" be a member of the feed stream being
> requested (e.g. CONDUIT)?  If so, how does the LDM "remember" the time of
> that product?  Does it check the products in the queue when it starts, or
> are there entries in the queue for each stream type indicating the last
> product received?

When starting from scratch, a downstream LDM checks the product-queue
for the most recent product that matches the product-class that the
downstream LDM will request.  Downstream LDM-s remember the last
successfully-received data-product.  If a downstream LDM is the only
one receiving a particular class of products, then it uses the
signature and the product-creation time (minus 60 seconds)
from the last successfully-received product when reconnecting.  If a
downstream LDM is one of many receiving a particular class of products,
then it searches back through the queue for the most recent matching
product and uses that product's signature and product-creation time
(again minus 60 seconds) when reconnecting.
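
The selection rule above can be sketched roughly as follows.  This is
only an illustration in Python, not the LDM source: the names Product,
reconnect_request, and sole_receiver are hypothetical, and the queue is
assumed to be a list ordered from oldest to newest insertion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a data-product's metadata (not an LDM type).
@dataclass(frozen=True)
class Product:
    signature: str      # product signature as a hex string
    created: datetime   # product-creation time
    feedtype: str       # e.g. "CONDUIT"

# The 60-second back-off mentioned in the text.
BACKOFF = timedelta(seconds=60)

def reconnect_request(
    last_received: Optional[Product],
    queue: Sequence[Product],
    feedtype: str,
    sole_receiver: bool,
) -> Optional[Tuple[datetime, str]]:
    """Return (start time, signature) to use when reconnecting, or None."""
    if sole_receiver and last_received is not None:
        # Only receiver of this class: trust the remembered product.
        reference = last_received
    else:
        # One of many receivers (or starting from scratch): search back
        # through the queue for the most recent matching product.
        reference = next(
            (p for p in reversed(queue) if p.feedtype == feedtype), None
        )
    if reference is None:
        return None
    return reference.created - BACKOFF, reference.signature
```

The sketch matches products on feedtype alone; a real product-class
also carries a pattern for the product identifier.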

I can imagine a scenario in which a gap could result from the near
simultaneous disconnection of two downstream LDM-s on the same computer
-- each receiving the same class of products but from different
upstream LDM-s.  The products in the product-queues of the upstream
LDM-s would also have to be in a different order.

> How would splitting the feed affect this?  For example,
> our ingest machine currently splits the feed request into two pieces for
> CONDUIT: "[02468]$" and "[^02468]$", but our realserver getting the data
> from the ingester requests CONDUIT as ".*".

Splitting a feed results in distinct product-classes and, hence,
independent downstream LDM-s.  So there would be no effect.
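
To see that the two patterns divide the feed cleanly, here is a small
Python check.  The product identifiers in the usage note are made up
for illustration, and Python's re module stands in for the regular-
expression library the LDM uses, which is assumed to behave the same
way for patterns this simple.

```python
import re

# The two request patterns from the split CONDUIT feed.
EVEN_DIGIT = re.compile(r"[02468]$")
NOT_EVEN_DIGIT = re.compile(r"[^02468]$")

def matching_requests(product_id: str) -> list:
    """Return the names of the split requests that match a product ID."""
    patterns = (("[02468]$", EVEN_DIGIT), ("[^02468]$", NOT_EVEN_DIGIT))
    return [name for name, pattern in patterns if pattern.search(product_id)]
```

For example, matching_requests("example !000122") gives only the first
pattern and matching_requests("example !000123") gives only the second.
Any non-empty identifier matches exactly one of the two, so the halves
never overlap and nothing falls between them.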

> Here's a few log file lines from a data loss instance today:
> 
> DOWNSTREAM REALSERVER MACHINE:
> Oct 26 15:59:02 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:04 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:06 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT,  ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> 
> UPSTREAM INGEST MACHINE:
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: Starting
> Up(6.4.5/6): 20061026155901.886 TS_ENDT {{CONDUIT,  ".*"}}, Primary
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: topo:
> iddrs3.meteo.psu.edu {{CONDUIT, (.*)}}
> 
> (Note 1:  iddrs2 in this case is actually the idd-ingest machine as I had
> to switch things around this morning after some hardware problems but the
> name didn't get updated.
> 
> Note 2:  idd-ingest (iddrs2) requests the CONDUIT data in a split feed as
> I describe above, whereas iddrs3 requests CONDUIT in one request as ".*")
> 
> As best as I can interpret these entries, it looks like the realserver
> (iddrs3) was requesting CONDUIT data with an age since 15:28:52 but the
> ingest server (idd-ingest a.k.a. iddrs2) responded with data with an age
> since 15:59:01 which also coincides with an ldm restart of the ingest
> machine.  Am I reading this right?  Can you provide any further insights
> on these log entries?  I should note that the iddrs3 system was not
> stopped/restarted during the above period, but was waiting for idd-ingest
> (iddrs2) to come back to provide a feed.

The last product received by the downstream LDM on iddrs3 had a creation-time
of 20061026152852.421 and the given signature.  On iddrs2, that signature
was associated with a product that was INSERTED into iddrs2's product-
queue at 20061026155901.886.  iddrs2's LDM started sending data-products
beginning with the product that was inserted just after that time.

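There are two times involved in all this: one is the product-creation
time and the other is the time that a product is inserted into the
local product-queue.  The 15:28:52 in iddrs3's request is the former
and the 15:59:01 in iddrs2's log entry is the latter, so the two are
not directly comparable.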
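
If it helps when reading the logs, the timestamps can be parsed with a
few lines of Python.  This is just a convenience sketch: the helper
name parse_ldm_time is hypothetical, and the values are treated as
naive date-times with no timezone attached.

```python
from datetime import datetime

# Layout of the timestamps in the log lines: YYYYMMDDhhmmss.fff
LDM_TIME_FORMAT = "%Y%m%d%H%M%S.%f"

def parse_ldm_time(text: str) -> datetime:
    """Parse a timestamp such as '20061026152852.421'."""
    return datetime.strptime(text, LDM_TIME_FORMAT)

# Creation-time of the last product received on the downstream host.
creation_time = parse_ldm_time("20061026152852.421")
# Time at which the upstream host inserted that product into its queue.
insertion_time = parse_ldm_time("20061026155901.886")

# Difference between two different kinds of time for the same product.
offset_seconds = (insertion_time - creation_time).total_seconds()
```

The offset is the delay between that one product's creation and its
insertion into the upstream queue.  It is not a measure of how much
data was skipped.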

> [Re: monitoring the age of the oldest product]
> Okay, I'll take a look at starting up a monitor...
> 
> 
> Thanks again for your help...
> 
> Art
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.