
[LDM #LFI-210024]: LDM ingesting some products ~30 minutes late and missing others.



Gregg,

A missed data-product means that products of the requested type were going into 
the upstream's product-queue faster than the upstream LDM process could relay 
them to its downstream counterpart. Thus, the controlling factor is the 
effective throughput from the upstream LDM to the downstream LDM. Because an 
upstream LDM process will send data as fast as the computer and TCP connection 
will allow, we must look elsewhere for limitations on throughput. Possibilities 
include the following:

    1. The upstream and/or downstream LDM process isn't getting the CPU
       resources that it needs. In particular, the LDM could be running in a
       resource-starved virtual machine.
    2. The upstream and/or downstream LDM process isn't getting the network
       resources that it needs. Possibilities include
        A. Contention for the network card from multiple processes and/or
           virtual machines
        B. Network congestion. The likelihood of packet loss is increased by
            i.  Increasing distance between the sender and receiver
            ii. Increasing the number of nodes on the network
        C. Insufficient bandwidth due to
            i.   A & B above (especially B coupled with TCP's
                 congestion-backoff strategy)
            ii.  Bandwidth throttling by a firewall or intrusion detection
                 system
            iii. Network cards, switches, and/or routers with insufficient
                 bandwidth
    3. Trying to push too much down a single TCP connection. (This is where
       multiple REQUEST entries come in.) Given TCP's transmission-backoff
       strategy when congestion is encountered, a single TCP connection often
       has lower effective throughput than the same data-stream broken into
       multiple, disjoint TCP connections.
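
To put rough numbers on point 3, here is a minimal sketch based on the Mathis
et al. approximation for steady-state TCP throughput under random loss:
rate <= C * MSS / (RTT * sqrt(p)), with C = sqrt(3/2). The MSS, round-trip
time, loss rate, and link speed are made-up illustration values, not
measurements from your systems, and the function names are mine. The sketch
also assumes that each connection sees the same loss rate no matter how many
connections share the path, which is optimistic.

```python
import math

# Constant from the Mathis et al. model of loss-limited TCP throughput.
MATHIS_C = math.sqrt(3.0 / 2.0)


def tcp_ceiling_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on one TCP connection's throughput (bits/s)."""
    return MATHIS_C * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


def aggregate_ceiling_bps(n_connections, mss_bytes, rtt_s, loss_rate, link_bps):
    """Ceiling for n parallel connections, capped by the link speed.

    Assumes (optimistically) that the per-connection loss rate does not
    change as connections are added.
    """
    per_conn = tcp_ceiling_bps(mss_bytes, rtt_s, loss_rate)
    return min(n_connections * per_conn, link_bps)


if __name__ == "__main__":
    # Hypothetical values: 1460-byte MSS, 40 ms RTT, 0.01% loss, 1 Gbps link.
    mss, rtt, loss, link = 1460, 0.040, 1e-4, 1e9
    for n in (1, 2, 4, 8):
        mbps = aggregate_ceiling_bps(n, mss, rtt, loss, link) / 1e6
        print(f"{n} connection(s): ~{mbps:.0f} Mbit/s ceiling")
```

Under those assumptions a single connection is capped well below the link
speed, so splitting a feed across several REQUEST entries with disjoint
patterns raises the aggregate ceiling until the link itself becomes the limit.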

If the LDMs in question are reporting rtstats(1) someplace, then you might be 
able to use that information to determine where latency is being introduced.

The network administrators should be able to tell you how loaded the network 
interfaces are.

Increasing the queue size of the upstream LDM *might* cause fewer products to 
be missed -- but if those products are being inserted faster than they can be 
relayed, then you're only postponing the inevitable.
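
As a back-of-the-envelope illustration of that point, the sketch below treats
the upstream product-queue as a simple byte buffer: if products are inserted
faster than they are relayed, the backlog grows at the difference of the two
rates, and the first product is lost once the backlog exceeds the queue size.
The rates and queue sizes are hypothetical, the function name is mine, and the
model ignores the details of how the real product-queue manages its space.

```python
import math


def seconds_until_first_miss(queue_bytes, insert_bytes_per_s, relay_bytes_per_s):
    """Time until the relay backlog exceeds the queue size.

    Returns math.inf when the relay keeps up with the insertion rate.
    """
    if insert_bytes_per_s <= relay_bytes_per_s:
        return math.inf
    return queue_bytes / (insert_bytes_per_s - relay_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical rates: 10 MB/s inserted, 8 MB/s relayed.
    insert_rate, relay_rate = 10_000_000, 8_000_000
    for queue in (500_000_000, 1_000_000_000):
        t = seconds_until_first_miss(queue, insert_rate, relay_rate)
        print(f"{queue / 1e6:.0f} MB queue: first miss after {t:.0f} s")
```

In this model, doubling the queue doubles the time before the first miss but
does not prevent it. A larger queue only helps if the burst of insertions ends
before that time is reached.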

I hope this helps.

> Thanks for the additional details.  I have a follow-up query regarding
> the NWS IDP MRMS product stream provided via LDM from
> mrms-ldmout.ncep.noaa.gov.
> 
> *Some background:*
> I have reviewed the following LDM Unidata pages:
> https://www.unidata.ucar.edu/software/ldm/ldm-current/troubleshooting/networkTrouble.html
> https://www.unidata.ucar.edu/software/ldm/ldm-current/troubleshooting/reclassDoc.html
> 
> I do NOT see any "RECLASS" or "skipped" entries in the SPC ldmd.log files.
> 
> XXX has been experiencing delayed/missing MRMS grib2 products from the LDM
> feed, so I inquired with XXX XXX support and provided the ldmd.conf and
> pqact.conf files (also attached).  XXX from XXX did notice there were
> multiple REQUEST lines to mrms.XXX.XXX.XXX but suggested I have a
> REQUEST line for each product type XXX wants to obtain; XXX does this on
> the XXX system.  So as a test, I added a REQUEST line to the ldmd.conf
> file for a product of interest that we were missing or ingesting greatly
> delayed.  After doing this, that product started coming in on time.
> 
> I went upstairs to talk to XXX (XXX) to discuss the multiple
> REQUEST lines.  XXX said ingesting level-2 radar data from the IRADS
> top-tier provider only required one REQUEST line.  However, to ingest
> level-2 radar data from the XXX server required multiple REQUEST line
> entries.  XXX has also had to have multiple REQUEST lines for the MRMS
> feed from XXX.  However, even with multiple REQUEST lines to the
> XXX she still has dropped products.  When she ingests the same
> product from the "developmental" (i.e. non-operational version) XXX
> XXX system, where she is the only LDM customer, she ingests all the products
> during a 24 hour period.  On the operational XXX with multiple
> LDM customers, both XXX and XXX are seeing dropped products and greatly
> delayed products, and at this point in time it seems XXX is suggesting that
> LDM customers just add a REQUEST line for each product type.
> 
> The XXX network path is on a XXX 1 Gbps connection back to XXX and there
> is no interaction with XXX.  Since XXX is at XXX she traverses a
> different path than I XXX does to reach the XXX MRMS system.
> 
> *QUESTION:*
> It would seem the top tier of Unidata LDM servers do NOT require downstream
> LDM customers (e.g., a university like Univ of Nebraska, Iowa State, etc.) to
> have multiple REQUEST lines in ldmd.conf.  My understanding is the UNIDATA
> feed of data is much larger than the XXX MRMS feed of data.
> 
> Do you have any suggestions on how the XXX and XXX can determine why
> multiple REQUEST lines are needed?  Should they be needed?  Do you have any
> suggestions on what the XXX, XXX and XXX can look at to try to eliminate
> the need for multiple REQUEST lines (e.g. network switching configuration,
> network protocol, firewall settings, etc.)?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: LFI-210024
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.