
[LDM #TFK-703998]: Products not pulled into LDM queue



Justin,

> > Was the LDM on the backup computer, vm-lnx-ncodf1, running and receiving
> > data *before* it was made the main LDM?
> 
> No, it was not

I didn't think so but needed to be sure.

> > One question, though, how do you know they weren't received?
> 
> I didn't see a 'sending' entry in the upstream LDM for those products and I
> didn't see any reference to them in the log on our 'main' LDM or that they
> were sent to our downstream users.

That could be because the system logging daemon (which your LDM appears to be 
using) dropped the log messages. The system logging daemon uses UDP as the 
transport mechanism for log messages and UDP isn't as reliable as TCP. When the 
switch-over occurred, there would have been a massive number of log messages as 
the NOAAPort LDM worked its way through its queue and the backup LDM added the 
products to its queue.
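To illustrate how a burst of UDP messages can be lost without anyone being told, here is a generic Python sketch. It is not syslogd or LDM code. The function name `udp_burst`, the datagram count, the payload size and the deliberately tiny receive buffer are all illustrative assumptions, chosen so that the burst overflows the receiver the way a flood of log messages can overflow a busy logging daemon:

```python
from __future__ import annotations

import socket
import time


def udp_burst(count: int = 2000, payload_size: int = 200) -> tuple[int, int]:
    """Send a burst of UDP datagrams to a receiver that is not reading yet.

    Returns (attempted, received). Datagrams that the kernel discards
    because the receiver's buffer is full are lost silently.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Deliberately tiny receive buffer (an assumption) so the burst overflows it.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    address = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    for _ in range(count):
        try:
            sender.sendto(payload, address)  # no acknowledgement, no retransmission
        except OSError:
            pass  # some platforms report ENOBUFS; the datagram is simply gone

    time.sleep(0.2)  # give in-flight datagrams time to reach the receive buffer

    # Drain whatever actually arrived.
    receiver.setblocking(False)
    received = 0
    while True:
        try:
            receiver.recv(65535)
        except BlockingIOError:
            break
        received += 1

    sender.close()
    receiver.close()
    return count, received


if __name__ == "__main__":
    print(udp_burst())
```

The exact number received depends on the operating system and its buffer limits, but it will typically be far fewer than the number sent, and the sender gets no error for the missing ones. TCP would instead make the sender wait until the receiver caught up.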

This unreliability is one of the reasons the latest LDM uses its own logging 
mechanism by default. The full rationale is listed in the CHANGE_LOG file 
beginning with version 6.13.

> One of our downstream users was the first to
> report that these products were missing, which started this investigation.

How did they notice the products were missing?

> It seems that our main LDM did pull some products that were up to 30
> minutes old,

By "main LDM" do you mean the one I've been calling the "backup LDM"? When the 
backup is on-line, is it considered the main LDM?

> but it didn't pull nearly all the ones that should have been
> available.
> 
> I've been digging through our logs, here is what I mean.
> 
> For the window of 1910-1919Z (searching for products created between those
> times) our upstream LDM sent *29,480* products to our backup site in
> Boulder (this site was never failed over, none of its systems experienced
> any problems). But only *8,143* products were sent to our newly started LDM
> in College Park.
> 
> Then going forward and looking at products created between 1930 - 1939Z I
> see *27,594* products were sent to Boulder and *28,530* were sent to
> College Park, so by then things were back in sync.
> 
> Any ideas why we would only pull 1/3 of the products when it was starting
> up?

It could be due to the unreliability of the system logging daemon. As I said, a 
switch-over that makes the upstream LDM send a backlog of products produces a 
burst of log messages when logging is in verbose mode.

> For my knowledge, when we issue a 'pqmon' command and it gives the age
> value, I'm assuming that that is the age of the oldest product and it's the
> next one to be purged from the queue, is that accurate?

Spot on.

> Or is it possible
> that newer products are removed before that older one based on other
> factors, maybe their size?

When a product is accessed (e.g., to be transmitted), the product-queue library 
locks it. Only unlocked products can be deleted. When space is needed in order 
to add a new product, the library deletes unlocked products starting at the 
oldest end of the queue and working its way towards the newest end until 
sufficient space has been recovered.
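To make the deletion order concrete, here is a toy Python model of the policy just described. The names `Product`, `ProductQueue` and `insert` are hypothetical; this is not the LDM product-queue library's code or API, and it ignores how the real library manages space inside the queue file. It only shows which products get chosen for deletion: unlocked ones, starting at the oldest end, until enough space is freed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Product:
    ident: str
    size: int
    locked: bool = False  # True while the product is being accessed (e.g., sent)


class ProductQueue:
    """Toy model of the deletion policy only (hypothetical, not LDM's pq API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.products: list[Product] = []  # index 0 is the oldest end

    def used(self) -> int:
        return sum(p.size for p in self.products)

    def insert(self, product: Product) -> list[str]:
        """Append a product at the newest end, deleting unlocked products
        oldest-first until there is room. Returns the deleted identifiers."""
        excess = self.used() + product.size - self.capacity
        victims = []
        for p in self.products:  # walk from the oldest end towards the newest
            if excess <= 0:
                break
            if p.locked:
                continue  # locked products cannot be deleted
            victims.append(p)
            excess -= p.size
        if excess > 0:
            # Nothing has been deleted yet, so the queue is left unchanged.
            raise MemoryError("not enough unlocked products to free space")
        victim_ids = {id(p) for p in victims}
        self.products = [p for p in self.products if id(p) not in victim_ids]
        self.products.append(product)
        return [p.ident for p in victims]


q = ProductQueue(capacity=100)
for ident, size in [("A", 40), ("B", 30), ("C", 20)]:
    q.insert(Product(ident, size))
q.products[0].locked = True  # "A" (the oldest) is being transmitted
deleted = q.insert(Product("D", 25))
print(deleted)  # ['B']
print([p.ident for p in q.products])  # ['A', 'C', 'D']
```

In this example the oldest product, "A", survives because it is locked, so the next-oldest unlocked product, "B", is deleted instead. This is also why the age that `pqmon` reports normally corresponds to the next product to be purged: it will be deleted first unless it happens to be locked at that moment.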

> Thanks again for helping us with this.

Not a problem.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: TFK-703998
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.