Art,

> I'm having trouble understanding this.  If the controlling factor is a
> combination of the local queue size and the requested offset "from" time,
> why do I only see the problem when switching to idd.cise-nsf.gov?
> Wouldn't you expect to see the jump switching to iddrs3 as well?  Also,
> the jump doesn't occur every time but rather seems to occur randomly...
> what would produce such an inconsistent result?  Does the remote queue
> size come into play here?

In a distributed, asynchronous system, things can get complicated very
quickly. Figuring out all the details can take time.

My hypothesis for the large latency spikes is that the upstream LDM at
Cise-Nsf that's feeding Iddrs3 is working on the oldest products in its
queue rather than the newest (the cause of this might be the poor
connection between PSU and NSF). As a consequence, when the corresponding
downstream LDM on Iddrs3 decides to switch to primary-mode, its request in
the new connection specifies a last-received data-product that doesn't
exist in the Cise-Nsf queue. The new Cise-Nsf upstream LDM therefore falls
back to the "from" time in the request. Because this time is older than
the oldest product in the Cise-Nsf queue, the upstream LDM starts sending
products beginning with the oldest -- and those products are no longer in
the Iddrs3 queue, so Iddrs3 can't recognize them as duplicates. Hence, the
large latency spikes.

The solution is to ensure that the "from" time in a data request is never
older than the oldest product in the local queue; otherwise, duplicate
product rejection can't occur.

> The queue size is 8 GB on both iddrs3 and idd-ingest.  I monitor the
> oldest products on our ls3 system which also has an 8 GB queue and it gets
> down to around 2000 seconds, so as you say, raising the queue size would
> be the only option.  However, I only have 8 GB of memory in iddrs3 and
> raising the queue size would make me concerned about the potential for a
> thrashing situation should requests be made for older products on disk.

We've always recommended that the LDM computer have sufficient physical
memory to hold the product-queue in memory. Recently, however, we've had
success with queues that are significantly larger than physical memory.
Your mileage may vary (there's no way to tell at present).

If you have gnuplot(1) installed, then the "ldmadmin addmetrics" and
"ldmadmin plotmetrics" commands are a good way of monitoring the age of
the oldest product in the queue.

> I've changed the offset time to 2000 seconds temporarily on iddrs3 as a
> test to see if the problem goes away.  Is there any way to fix this in the
> LDM so it's not sensitive to the oldest queue boundary, or is the
> algorithm too complex?

Other than the solution I've mentioned, there's no easy fix for this. The
downstream site must have a record of received data-products and, right
now, that record is the product-queue.

I'll think about some more complicated solutions, of course, but I can't
say when (or even if) they'll happen.

> Thanks...
>
>                  Art
>
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: VAP-368514
Department: Support LDM
Priority: Normal
Status: Closed
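
The constraint discussed in this exchange (a request's "from" time should
never be older than the oldest product in the local queue) can be sketched
roughly as follows. This is only an illustration in Python, not LDM code:
the function name clamp_from_time, its parameters, and the 3600-second
offset are hypothetical, and the date is arbitrary. The 2000-second age is
the figure quoted above for an 8 GB queue.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def clamp_from_time(
    now: datetime,
    requested_offset: timedelta,
    oldest_local_product: Optional[datetime],
) -> datetime:
    """Return the "from" time to place in a data request (hypothetical helper).

    The result is never older than the oldest product still held in the
    local queue, so anything the upstream sends from that point on can
    still be checked against the local queue for duplicates.
    """
    requested_from = now - requested_offset
    if oldest_local_product is None:
        # Assumption: with an empty local queue there is nothing to
        # compare against, so the configured offset is used unchanged.
        return requested_from
    # The later (more recent) of the two times wins.
    return max(requested_from, oldest_local_product)


if __name__ == "__main__":
    now = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary
    oldest = now - timedelta(seconds=2000)  # oldest product is 2000 s old
    from_time = clamp_from_time(now, timedelta(seconds=3600), oldest)
    # The effective offset shrinks from 3600 s to the age of the oldest product.
    print((now - from_time).total_seconds())
```

With these inputs the effective offset becomes 2000 seconds, which has the
same effect as the manual offset change tried on iddrs3, except that it
would follow the queue's actual residence time rather than a fixed value.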
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.