[LDM #JIG-686458]: ldm 6.4.7.1 restart problem



Art,

> I've been having occasional problems with our LDM (6.4.7.1) "resetting"
> and instead of resuming data collection at the time it left off, it goes
> back as far as the "-m" setting in the upstream queue.  Below is an
> example from yesterday with iddrs3 being the downstream node and
> idd-ingest being the upstream:
> 
> Jan 14 16:29:24 idd-ingest iddrs3.meteo.psu.edu(feed)[19268] ERROR:
> Couldn't flush connection; nullproc_6() failure to iddrs3.meteo.psu.edu:
> RPC: Timed out
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu[19885] WARN:
> findTimeEntryWithOffset(): Target data-product with given metadata not
> found in time-map near its creation-time (20070114162655.320)

The product-queue indexes data-products by their insertion-time, while
each data-product carries its creation-time as part of its metadata.
Consequently, it's very important that the clocks on the various LDM
machines be correct.  For example, if the last data-product that host
iddrs3 received from host idd-ingest had a creation-time later than
the time (according to the system clock on idd-ingest) at which it was
inserted into idd-ingest's product-queue, then the LDM on idd-ingest
will be unable to find that last successfully-transmitted data-product.
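
To illustrate the clock dependence, here is a toy Python model of a
lookup in a time-ordered index.  It is only a sketch and not the LDM's
actual code: the name find_product, the list of (insertion_time,
signature) pairs, the forward-only scan, and the 60-second window are
all assumptions made for illustration.

```python
from bisect import bisect_left

def find_product(time_map, signature, creation_time, window=60.0):
    """Toy lookup (hypothetical, not the LDM implementation).

    time_map is a list of (insertion_time, signature) pairs sorted by
    insertion_time.  The scan starts at the first entry inserted at or
    after creation_time and gives up once it is more than `window`
    seconds past creation_time.
    """
    start = bisect_left(time_map, (creation_time,))
    for insertion_time, sig in time_map[start:]:
        if insertion_time > creation_time + window:
            break
        if sig == signature:
            return insertion_time
    return None

# Three products, inserted at t = 100, 105, and 110 seconds.
time_map = [(100.0, "a"), (105.0, "b"), (110.0, "c")]

# Clocks agree: "b" was created at 103 and inserted at 105.
found_ok = find_product(time_map, "b", 103.0)

# Clock skew: "b" claims a creation-time of 108, after its insertion.
found_skewed = find_product(time_map, "b", 108.0)
```

In this model the first call returns 105.0, while the second returns
None because the scan begins after the entry it is looking for.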

Also, is the product-queue on idd-ingest large enough?  What's the
mean age of the oldest data-product?  What's the minimum age of the
oldest product?  (Use pqmon(1) to discover this.)

> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu[19885] NOTE: Data-product
> with signature e364b6103d1b4037e788f32c5516c86b wasn't found in
> product-queue
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu(feed)[19885] NOTE:
> Starting Up(6.4.7.1/6): 20070114102944.933 TS_ENDT {{ANY,  ".*"}},
> SIG=e364b6103d1b4037e788f32c5516c86b, Primary
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu(feed)[19885] NOTE: topo:
> iddrs3.meteo.psu.edu {{ANY, (.*)}}
> 
> 
> Jan 14 16:29:34 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR: readtcp():
> EOF on socket 4
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR:
> one_svc_run(): RPC layer closed connection
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR: Disconnecting
> due to LDM failure; Connection to upstream LDM closed
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] NOTE: LDM-6 desired
> product-class: 20070114102944.933 TS_ENDT {{ANY,  ".*"},{NONE,
> "SIG=e364b6103d1b4037e788f32c5516c86b"}}

The LDM on iddrs3 is asking for data that was created about 6 hours ago
(from 20070114102944.933, logged at 16:29:44).  Is the maximum
acceptable latency on iddrs3 really 6 hours?
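
The size of that backlog can be checked directly from the log line.
The short Python sketch below parses the "from" time of the requested
product-class and subtracts it from the syslog timestamp.  It assumes
that the syslog times on iddrs3 are in the same time zone (UTC) as the
LDM timestamps; the helper name parse_ldm_time is made up for this
example.

```python
from datetime import datetime

def parse_ldm_time(stamp):
    """Parse a timestamp of the form YYYYMMDDhhmmss.sss (as in the log)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")

# "From" time in the product-class that iddrs3 requested.
requested_from = parse_ldm_time("20070114102944.933")

# Syslog time of the request on iddrs3 (assumed to be UTC).
logged_at = datetime(2007, 1, 14, 16, 29, 44)

gap = logged_at - requested_from
gap_hours = gap.total_seconds() / 3600.0
print(f"Requested backlog: {gap_hours:.2f} hours")
```

Comparing this figure with the maximum latency configured for the
downstream LDM shows whether the request is going back further than
intended.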

> Jan 14 16:29:45 iddrs3 idd-ingest.meteo.psu.edu[3470] NOTE: Upstream LDM-6
> on idd-ingest.meteo.psu.edu is willing to be a primary feeder
> 
> Any ideas on what might be causing this?
> 
> Thanks.
> 
> Art
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: JIG-686458
Department: Support LDM
Priority: Normal
Status: On Hold

