
[LDM #UYH-624598]: LVS realserver switching loses data



Art,

What happens during a reconnect to a sole-source upstream LDM is this:

1) The downstream LDM sets the "from" time in the product-class that
   will be in the request to the more recent of the creation-time of the
   last product received or the current time minus the backoff (default
   1 hour).
2) If a product was previously received, then the MD5 signature of that
   product is encoded into the product-class.
3) The downstream LDM reconnects and the product-class is transmitted to
   the upstream LDM.
4) If a signature is encoded in the product-class and the associated
   product exists in the product-queue, then the upstream LDM starts
   sending products beginning just after the signature product;
   otherwise, the "from" time of the product-class is used and products
   are sent beginning with the first product that was inserted into the
   product-queue after that "from" time.
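
The steps above can be sketched roughly as follows.  This is only an
illustration in Python, not the LDM source: the names Product,
build_request and first_index_to_send are hypothetical, the queue is
modeled as a plain list ordered by insertion time, and using "now minus
backoff" when no product was previously received is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_BACKOFF = 3600.0  # seconds; the 1-hour default mentioned above


@dataclass
class Product:
    """Hypothetical stand-in for a data-product's metadata."""
    signature: str         # MD5 signature as a hex string
    creation_time: float   # seconds since the epoch
    insertion_time: float  # when it entered the upstream product-queue


def build_request(last: Optional[Product], now: float,
                  backoff: float = DEFAULT_BACKOFF
                  ) -> Tuple[float, Optional[str]]:
    """Steps 1 and 2: choose the "from" time and the optional signature."""
    from_time = now - backoff
    signature = None
    if last is not None:
        # The more recent of the last creation-time and now - backoff.
        from_time = max(from_time, last.creation_time)
        signature = last.signature
    return from_time, signature


def first_index_to_send(queue: List[Product], from_time: float,
                        signature: Optional[str]) -> int:
    """Step 4: index of the first product the upstream would send.

    `queue` is assumed to be ordered by insertion time, oldest first.
    """
    if signature is not None:
        for i, prod in enumerate(queue):
            if prod.signature == signature:
                return i + 1  # start just after the signature product
    # Signature absent or purged from the queue: fall back to the "from" time.
    for i, prod in enumerate(queue):
        if prod.insertion_time > from_time:
            return i
    return len(queue)  # nothing in the queue is new enough
```

If the signature product has been purged, the fallback loop is the only
thing that determines where transmission resumes, which is where a gap
can appear.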

If the upstream LDM can't start from where it left off because products
have been purged from the queue, then a gap might result.

I note from the rtstats(1) web pages that the latency on Iddrs2 has been
as high as 33 minutes in recent days.  This is close to the "from" time
of 40 minutes ago in the request that's associated with the gap.

I suggest that you use the pqmon(1) utility in a crontab(1) script to
periodically sample the age of the oldest product in the queue.  I do
that here and plot the time-series using gnuplot.
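
Once such samples exist, one way to use them is to flag the times at
which the queue did not reach back as far as a reconnecting downstream
might ask.  The sketch below is a hypothetical Python helper, not part
of the LDM package; it assumes the samples have already been extracted
into (sample time, age of oldest product in seconds) pairs and does not
show how to obtain them from pqmon(1).

```python
from typing import Iterable, List, Tuple


def samples_at_risk(samples: Iterable[Tuple[float, float]],
                    lookback: float) -> List[float]:
    """Return the sample times at which the oldest product in the queue
    was younger than `lookback` seconds.

    A request whose "from" time is `lookback` seconds in the past could
    not have been fully satisfied from the queue at those times.
    """
    return [when for when, oldest_age in samples if oldest_age < lookback]
```

For example, with a lookback of 2400 seconds (40 minutes), any sample
whose oldest-product age is below 2400 marks a time when a gap was
possible.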

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: Closed