
[LDM #JIG-686458]: ldm 6.4.7.1 restart problem



Art,

> Both systems are running ntp.  I just checked them, and they both have
> correctly set times.  Unless ntp allows the time to run out a large
> portion of a second, I don't think this is going to be the problem.
> 
> I don't have pqmon running on the system on a regular basis... maybe I
> will do that.  However, I've checked this before and it hasn't been an
> issue.  The queue size on both systems is 8 GB and the latencies coming
> into idd-ingest (for CONDUIT, our slowest feed) maxes out around 2000
> seconds.  I believe the queue holds much more than 2000 seconds of data.

The reason I asked about the age of the oldest product in the queue is
because the LDM on iddrs3 asked for data-products beginning just after
the last received one:

> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] NOTE: LDM-6 
> desired product-class: 20070114102944.933 TS_ENDT {{ANY, ".*"},{NONE,
> "SIG=e364b6103d1b4037e788f32c5516c86b"}}

(where the "SIG=..." encodes the MD5 checksum of the last received
product), but that product wasn't found in the product-queue of
idd-ingest's LDM:
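To make the request above concrete, here is a minimal Python sketch of how such a product-class string could be assembled. It assumes the signature is the hex MD5 digest of the product's data, and the helper names (`product_signature`, `signature_request_class`) are hypothetical: the output format simply mirrors the iddrs3 log line and is not the LDM's actual API.

```python
import hashlib


def product_signature(data: bytes) -> str:
    """Hex MD5 digest of a product's data (assumed to be the LDM 'signature')."""
    return hashlib.md5(data).hexdigest()


def signature_request_class(from_time: str, last_data: bytes) -> str:
    """Build a product-class string shaped like the one in the iddrs3 log.

    The {ANY, ".*"} clause asks for everything from `from_time` onward; the
    {NONE, "SIG=..."} clause carries the signature of the last received
    product so the upstream can resume just after it. Hypothetical helper.
    """
    sig = product_signature(last_data)
    # Doubled braces are literal braces in an f-string.
    return f'{from_time} TS_ENDT {{{{ANY, ".*"}},{{NONE, "SIG={sig}"}}}}'
```

For example, `signature_request_class("20070114102944.933", last_product_bytes)` yields a string in the same form as the NOTE line from iddrs3.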

> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu[19885] NOTE: Data-product
> with signature e364b6103d1b4037e788f32c5516c86b wasn't found in
> product-queue

As a consequence, idd-ingest's LDM had no option but to start sending
data-products based on the "from" time (six hours ago) in iddrs3's
request (the fallback position).
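The upstream's choice can be sketched roughly as follows. This is a simplified model, not the LDM's code: the queue is assumed to be a list ordered by insertion time, and the `Product` type and `start_index` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Product:
    signature: str   # hex MD5 of the product's data (assumed)
    created: float   # creation time, seconds since the epoch
    inserted: float  # time the product entered this queue


def start_index(queue: List[Product], signature: str, from_time: float) -> int:
    """Queue index at which the upstream would begin sending (simplified).

    If the signature product is still in the queue, resume just after it.
    Otherwise fall back to the first product created at or after the
    request's "from" time.
    """
    for i, prod in enumerate(queue):
        if prod.signature == signature:
            return i + 1
    # Fallback position: signature product not found.
    for i, prod in enumerate(queue):
        if prod.created >= from_time:
            return i
    return len(queue)
```

In the iddrs3 case, the first loop found nothing, so the second loop's "from"-time rule decided where sending began.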

If the system were operating correctly, then the "signature" data-product
would have been found in idd-ingest's product-queue.  The fact that
it wasn't indicates a problem.  One explanation: if the product had been
removed to free up space, then it wouldn't be found.  Hence my asking about
the age of the oldest product.
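A toy model shows why the age of the oldest product matters. The sketch below assumes a simple FIFO queue with a fixed byte budget that evicts the oldest products to make room; the real LDM product-queue is more elaborate, and `BoundedQueue` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BoundedQueue:
    """Toy product-queue that evicts the oldest products to free up space."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        # Each entry: (signature, insert_time, size_in_bytes)
        self.items: Deque[Tuple[str, float, int]] = deque()

    def insert(self, signature: str, insert_time: float, size: int) -> None:
        if size > self.capacity:
            raise ValueError("product larger than the whole queue")
        # Evict from the oldest end until the new product fits.
        while self.used + size > self.capacity:
            _, _, old_size = self.items.popleft()
            self.used -= old_size
        self.items.append((signature, insert_time, size))
        self.used += size

    def contains(self, signature: str) -> bool:
        return any(sig == signature for sig, _, _ in self.items)

    def oldest_age(self, now: float) -> Optional[float]:
        """Age in seconds of the oldest product still in the queue."""
        if not self.items:
            return None
        return now - self.items[0][1]
```

If `oldest_age()` is smaller than the time since the downstream's last product was inserted, that product has been evicted and its signature can no longer be found.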

> I don't recall seeing this kind of behaviour until recently.  Is it
> possible that something you fixed in the 6.4.7.1 version could be related
> to this behaviour?

Beta version 6.4.7.1 changes the way the "from" time is set.  Previously,
the "from" time was set based on the creation-time of the last
successfully-received data-product.  Now, the "from" time is set based on
the "offset" parameter (the "-o" option) or, if the offset isn't specified,
the maximum acceptable latency (the "-m" option).  This eliminates the
possibility of the upstream LDM skipping data-products that were created
earlier but inserted after the "signature" data-product.
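The difference between the two rules can be sketched as below. This is a simplified model under the assumption that the request's time window admits a product only if its creation time is at or after the "from" time; the function names are hypothetical, and no default values for "-o" or "-m" are implied.

```python
from typing import Optional


def from_time_old(last_created: float) -> float:
    """Pre-6.4.7.1 rule as described: the last received product's creation time."""
    return last_created


def from_time_new(now: float, max_latency: float, offset: Optional[float] = None) -> float:
    """6.4.7.1 rule as described: back off by the -o offset if given,
    otherwise by the -m maximum acceptable latency."""
    back = offset if offset is not None else max_latency
    return now - back


def would_send(created: float, from_time: float) -> bool:
    """Simplified time-window test: created at or after the "from" time."""
    return created >= from_time
```

For instance, if the signature product was created at t=9000 and another product created at t=8990 was inserted after it, the old rule (from=9000) excludes the later-inserted product, while the new rule with a 3600-second latency at t=10000 (from=6400) still admits it.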

The question is "Why wasn't the signature data-product found in idd-ingest's
queue?"

How often does this problem occur?

> Can you tell me what version of the LDM is running on idd.unidata.ucar.edu?

The idd.unidata.ucar.edu backend systems run version 6.4.5 of the LDM.  They're
so well-connected and so high up in the distribution system that they tend not
to have the problems of their less well-connected brethren further downstream
(which is also why observations like yours are so valuable).

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: JIG-686458
Department: Support LDM
Priority: Normal
Status: On Hold