[LDM #TFK-703998]: Products not pulled into LDM queue



Hi Justin,

I agree, it seems like your LDM system shouldn't have failed to transmit those 
three products. I have some questions, though.

- What was the duration from when your main LDM started experiencing a problem 
to when the secondary LDM started?

- Are the clocks on the computers for the SBN LDM, main LDM, secondary LDM, and 
downstream user correct?

- Can you send me the entries from the SBN LDM's log file that show when the 
three products were received and the "Starting Up" message from when the 
secondary LDM connected?

- Do you capture LDM metrics on the SBN computer via a crontab(1)-based 
"ldmadmin addmetrics" command? If so, can you send me the plot of the age of 
the oldest product from the command "ldmadmin plotmetrics" for the day of the 
event? Use the options "-b YYYYMMDD.hhmmss" and "-e YYYYMMDD.hhmmss" to 
bracket the time interval.
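
To illustrate why the first and last questions matter, here is a minimal Python 
sketch that reads the "age" column from pqmon output and compares it with the 
length of a feed gap. The function names are hypothetical, the 600-second gap 
is an assumed value (the real duration is what I'm asking for), and the parser 
assumes each pqmon line is unwrapped. The sample text is the pqmon output 
quoted below. Note that the age reported now may differ from the age at the 
time of the event, which is what the metrics plot would show.

```python
# Sample text: the pqmon output quoted in this thread, with wrapped lines rejoined.
PQMON_OUTPUT = """\
May 23 15:03:16 pqmon NOTE: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
May 23 15:03:16 pqmon NOTE: 242888     1 1466095  6932152136    298423      292   1410560  67850424 3237
May 23 15:03:16 pqmon NOTE: Exiting
"""


def parse_pqmon(text):
    """Return a dict mapping pqmon column names to integer values.

    Hypothetical helper: assumes one header line starting with "nprods"
    and one line of integer values, each prefixed by "NOTE:".
    """
    rows = [line.split("NOTE:", 1)[1].split()
            for line in text.splitlines() if "NOTE:" in line]
    header = next(r for r in rows if r and r[0] == "nprods")
    values = next(r for r in rows if r and r[0].isdigit())
    return dict(zip(header, map(int, values)))


def gap_exceeds_queue_age(gap_seconds, oldest_age_seconds):
    """True if a feed gap is longer than the age of the oldest queued product."""
    return gap_seconds > oldest_age_seconds


stats = parse_pqmon(PQMON_OUTPUT)
assumed_gap = 600  # seconds; an assumption for illustration only
print("oldest product age (s):", stats["age"])
print("gap exceeds queue age:", gap_exceeds_queue_age(assumed_gap, stats["age"]))
```

If the gap turns out to be shorter than the oldest-product age on the upstream 
LDM at the time of the event, the products should still have been in that 
queue, and the log entries requested above become the next thing to check.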

> We need some help understanding how the LDM manages products in the queue
> once it starts up.
> 
> We run a main LDM instance on a RHEL 6 system, LDM version 6.11.6 with a
> queue of 7GB.
> 
> This LDM pulls NEXRAD2 datasets and a feed of SBN data from a NOAAPort
> dish, as well as a handful of other smaller datasets from other NWS offices.
> 
> We have several downstream LDMs that then pull from this main LDM instance
> and usually we see no issues. Yesterday we had a failure of the virtual
> system running our main LDM and that required us to move processing to its
> backup, which is just a second equivalent virtual system in the same
> datacenter. The LDM started fine and was sending data to downstream users
> almost immediately after startup.
> 
> However, today we got a report that, between when we shut down the LDM on
> the problematic system and when we started the LDM on the backup, three
> products we get from the SBN feed were never received by the downstream
> LDMs. After some investigation, we found that two of the three products
> were never pulled into our main LDM from the upstream SBN/NOAAPort LDM, and
> the one that made it into the main LDM was never pulled by the downstream
> user.
> 
> When we set up these LDM systems we kept in mind that it's necessary to
> have a queue large enough to hold data to handle this type of failover.
> Using pqmon we see our upstream SBN/NOAAPort LDM currently has an age of
> 2070 seconds and our main LDM has an age value of 3147 seconds. It's my
> understanding that that is the age of the oldest product in the queue, so
> I'm getting confused as to why we appeared to have missed those three
> products when our LDM was starting up on the new system. They should have
> remained available in the upstream LDM, but it appears they were never
> pulled.
> 
> Here is the current full output from pqmon on our main LDM:
> 
> May 23 15:03:16 pqmon NOTE: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
> May 23 15:03:16 pqmon NOTE: 242888     1 1466095  6932152136    298423      292   1410560  67850424 3237
> May 23 15:03:16 pqmon NOTE: Exiting
> 
> 
> We do have verbose logging enabled on all our LDM systems, but there is
> nothing obvious to me why the two products were never pulled and the one
> was never sent. My theory is that we flooded our main LDM with NEXRAD2 and
> SBN/NOAAPort products and then some were purged before they were pulled or
> some were not pulled at all. But we have made our LDM queue, at least in
> queue size, large enough to keep at least 30 minutes of data.
> 
> Can you shed some light on why these products were never pulled from our
> upstream LDM?


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: TFK-703998
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.