Hi Justin,

I agree: it seems like your LDM system shouldn't have failed to transmit those three products. I have some questions, though:

- What was the duration from when your main LDM started experiencing a problem to when the secondary LDM started?

- Are the clocks on the computers for the SBN LDM, main LDM, secondary LDM, and downstream user correct?

- Can you send me the entries from the SBN LDM's log file that show when the three products were received and the "Starting Up" message from when the secondary LDM connected?

- Do you capture LDM metrics on the SBN computer via a crontab(1)-based "ldmadmin addmetrics" command? If so, can you send me the plot of the age of the oldest product from the command "ldmadmin plotmetrics" for the relevant day of the event? (Use the options "-b YYYYMMDD.hhmmss" and "-e YYYYMMDD.hhmmss" to bracket the time interval.)

> We need some help understanding how the LDM manages products in the queue
> once it starts up.
>
> We run a main LDM instance on a RHEL 6 system, LDM version 6.11.6 with a
> queue of 7GB.
>
> This LDM pulls NEXRAD2 datasets and a feed of SBN data from a NOAAPort
> dish, as well as a handful of other smaller datasets from other NWS offices.
>
> We have several downstream LDMs that then pull from this main LDM instance,
> and usually we see no issues. Yesterday we had a failure of the virtual
> system running our main LDM, and that required us to move processing to its
> backup, which is just a second equivalent virtual system in the same
> datacenter. The LDM started fine and was sending data to downstream users
> almost immediately after startup.
>
> However, today we got a report that, during the period between when we shut
> down the LDM on the problematic system and when we started the LDM on the
> backup, three products we get from the SBN feed were never received by
> the downstream LDMs.
> After some investigation, two of the three products
> were never pulled into our main LDM from the upstream SBN/NOAAPort LDM, and
> the one that made it into the main LDM was never pulled by the downstream
> user.
>
> When we set up these LDM systems, we kept in mind that it's necessary to
> have a queue large enough to hold data to handle this type of failover.
> Using pqmon, we see our upstream SBN/NOAAPort LDM currently has an age of
> 2070 seconds and our main LDM has an age value of 3147 seconds. It's my
> understanding that that is the age of the oldest product in the queue, so
> I'm confused as to why we appear to have missed those three
> products when our LDM was starting up on the new system. They should have
> remained available in the upstream LDM, but it appears they were never
> pulled.
>
> Here is the current full output from pqmon on our main LDM:
>
> May 23 15:03:16 pqmon NOTE: nprods nfree  nempty     nbytes maxprods maxfree minempty   maxext  age
> May 23 15:03:16 pqmon NOTE: 242888     1 1466095 6932152136   298423     292  1410560 67850424 3237
> May 23 15:03:16 pqmon NOTE: Exiting
>
> We do have verbose logging enabled on all our LDM systems, but nothing in
> the logs makes it obvious to me why the two products were never pulled and
> the one was never sent. My theory is that we flooded our main LDM with
> NEXRAD2 and SBN/NOAAPort products and then some were purged before they
> were pulled, or some were not pulled at all. But we have made our LDM
> queue, at least in queue size, large enough to keep at least 30 minutes
> of data.
>
> Can you shed some light on why these products were never pulled from our
> upstream LDM?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: TFK-703998
Department: Support LDM
Priority: Normal
Status: Closed
===================

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.
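As a rough cross-check of the pqmon output quoted above, the following Python sketch pairs pqmon's column names with their values and estimates how long data stays in the main LDM's queue. It assumes the queue is full and in a steady state, so that the age of the oldest product approximates how long any product remains available; the helper names `parse_pqmon` and `retention_summary` are hypothetical and are not part of the LDM distribution.

```python
from typing import Dict


def parse_pqmon(header_line: str, value_line: str) -> Dict[str, int]:
    """Pair the column names on a pqmon NOTE header line with the numbers
    on the following line. Assumes both lines carry the same
    'Mon DD HH:MM:SS pqmon NOTE:' prefix, as in the output quoted above."""
    names = header_line.split("NOTE:", 1)[1].split()
    values = [int(v) for v in value_line.split("NOTE:", 1)[1].split()]
    if len(names) != len(values):
        raise ValueError("header and value lines have different column counts")
    return dict(zip(names, values))


def retention_summary(stats: Dict[str, int]) -> Dict[str, float]:
    """Rough residence-time figures (assumption: full queue, steady state)."""
    age_s = stats["age"]  # age of the oldest product, in seconds
    return {
        "age_minutes": age_s / 60,
        # Average arrival rate needed to accumulate nbytes within age_s seconds.
        "mean_bytes_per_second": stats["nbytes"] / age_s,
    }


header = ("May 23 15:03:16 pqmon NOTE: nprods nfree nempty nbytes "
          "maxprods maxfree minempty maxext age")
values = ("May 23 15:03:16 pqmon NOTE: 242888 1 1466095 6932152136 "
          "298423 292 1410560 67850424 3237")

stats = parse_pqmon(header, values)
summary = retention_summary(stats)
print(f"oldest product: {summary['age_minutes']:.2f} min")
print(f"mean throughput: {summary['mean_bytes_per_second'] / 1e6:.2f} MB/s")
```

With the figures quoted above, this gives an oldest-product age of about 54 minutes at an average of roughly 2.1 MB/s, which is consistent with the 30-minute retention goal. Queue age alone cannot show whether a downstream request arrived while a product was still in the queue, however; that depends on timing details such as the log entries and clock settings asked about in the questions above.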