[LDM #UYH-624598]: LVS realserver switching loses data



Art,

> We observed the following situation this afternoon:
> 
> o A realserver in our LVS cluster was poky and dropped out of our
> LDM cluster service for ~ 20 seconds
> 
> o Our in-house downstream LDM saw the connection loss, dropped the
> connection and tried to reconnect:
> 
> Oct 23 17:09:43 ls2 ldm.meteo.psu.edu[7494] ERROR: Terminating due to LDM failure; Connection to upstream LDM closed
> Oct 23 17:09:43 ls2 ldm.meteo.psu.edu[7494] NOTE: LDM-6 desired product-class: 20061023170731.008 TS_ENDT {{CONDUIT,  ".*"},{NONE,  "SIG=e2c0f8838b2642a11c8b7943174ae825"}}
> Oct 23 17:09:43 ls2 ldm.meteo.psu.edu[7494] NOTE: Upstream LDM-6 on ldm.meteo.psu.edu is willing to be a primary feeder
> 
> It succeeded, and actually reconnected to the original realserver it
> was pulling data from which had apparently come back online by then:
> 
> Oct 23 17:09:43 iddrs2 ls2.meteo.psu.edu(feed)[8854] NOTE: feed or notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
> Oct 23 17:09:43 iddrs2 rpc.ldmd[12779] NOTE: child 8854 exited with status 7
> Oct 23 17:09:44 iddrs2 ls2.meteo.psu.edu(feed)[9486] NOTE: Starting Up(6.4.5/6): 20061023170904.844 TS_ENDT {{CONDUIT,  ".*"}}, Primary
> Oct 23 17:09:44 iddrs2 ls2.meteo.psu.edu(feed)[9486] NOTE: topo:  ls2.meteo.psu.edu {{CONDUIT, (.*)}}
> 
> o We observed the apparent loss of a number of ensemble fields coming in
> during the time of this transition.

Why do you say "We observed the apparent loss of a number of ensemble fields 
coming in during the time of this transition"?  What were the symptoms?  How 
did you judge that the fields were missing?
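
One way to make that judgment concrete is to compare the product
identifiers seen upstream with those received downstream over the same
interval. Below is a minimal sketch in Python, assuming you already have
both lists (for example, pulled from your own logs). The function name and
the sample identifiers are hypothetical and are not part of the LDM:

```python
def missing_products(upstream_ids, downstream_ids):
    """Return upstream identifiers never seen downstream, in upstream order.

    Both arguments are iterables of product-identifier strings.
    Each missing identifier is reported only once.
    """
    received = set(downstream_ids)
    reported = set()
    missing = []
    for ident in upstream_ids:
        if ident not in received and ident not in reported:
            missing.append(ident)
            reported.add(ident)
    return missing


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    upstream = ["field-001", "field-002", "field-003", "field-004"]
    downstream = ["field-001", "field-004"]
    for ident in missing_products(upstream, downstream):
        print("missing:", ident)
```

An empty result over the window around 17:09:43 would suggest that nothing
was actually lost; a non-empty one would give us specific products to trace.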

> Does this seem possible, and if so, is there any way to prevent it?  I'm
> running all my ldm instances now with "-m 21600 -o 21599" hoping this
> would eliminate any possibility of data loss on LDM restarts or LVS
> cluster switchovers, but something seems to be going wrong.  Any
> suggestions?
> 
> Thanks.
> 
> Art
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: On Hold