Re: IDD Delays: Latency vs. Bandwidth

Russ,

Thanks for the enlightening discussion.  If I understand things correctly,
there is an RPC call for _every_ product sent down the line.  While this
makes sense for large products, it seems like we are essentially doubling
the communication overhead for the smaller products.  Is there a way that
the data can be held at the upstream host (maybe even at the root host)
and disseminated in larger bundles to the downstream host, where the data
stream could be unbundled?  The bundles could be defined either in terms of
number of products or duration of time... where the latter would seem to
make more sense.

As the data line becomes more congested with the NNEXRAD feed, this is only
going to get worse.  Anyway, thanks again for the useful information, I
appreciate it.

-Tim


At 10:08 AM 11/16/00 -0700, Russ Rew wrote:
>Hi,
>
>Thanks to Jim Koermer, Tom McDermott, and Tim Doggett for raising some
>important issues and providing a clear analysis of some of the causes.
>A few additional observations about bandwidth versus latency might be
>useful for troubleshooting and configuring LDM sites for the IDD.
>
>First, a clarification: what we're calling the "FOS data stream" in
>this discussion comes from the NOAAPORT NWSTG channel.  When they are
>injected into the IDD, NOAAPORT text products are currently tagged
>with the "IDS|DDPLUS" feed type and binary products are tagged with
>the "HDS" feedtype.  There are lots more of these products than were
>on the old IDS|DDPLUS|HDS feeds in the Family of Services.
>
>The main point I want to make is that the number of products per time
>interval may be more important than the volume of the products as a
>cause for delays.  Another way of saying this is that the network
>latency may be more important than the aggregate bandwidth for a
>network connection in determining IDD delays.
>
>Sending each product always requires at least one remote procedure call
>(RPC, a round trip network transaction) from the upstream to the
>downstream site, so the rate at which even small products can be sent
>from one host to another is limited by the maximum number of RPCs per
>second over the network connection between the hosts.  The time for a
>single RPC call is what ldmping measures, and this is close to the
>time required to send a single small product.  So you can determine a
>maximum for how many products per second a downstream host can accept
>from your LDM by taking the reciprocal of the ldmping time to that
>host (ignoring the first few times ldmping reports, letting it settle
>down to a steady state).  Similarly, the reciprocal of the ldmping
>time from an upstream site is a good approximation for how many
>products per second you can get from that site.
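>
>As a concrete sketch (the host name here is a placeholder, the awk
>field positions assume the ldmping output format shown below, and the
>2>&1 is there in case ldmping logs to standard error), you could
>estimate this directly, skipping the first few samples while ldmping
>settles down:
>
> test$ ldmping -i 1 -h upstream.host.edu 2>&1 | \
>     awk '$4 == "RESPONDING" { if (++n > 3) printf "%.1f products/sec max\n", 1/$5 }'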
>
>During some hours the rate for FOS products can be as high as 5.4
>products/second, though the long-term average is about 3.1
>products/second.
>
>If we had been feeding FOS products to Jim Koermer's LDM during one of
>the times when it was experiencing high latency, ldmping indicates it
>would only have been able to handle about 5 products/second:
>
> test$ ldmping -h mammatus.plymouth.edu -i 1
> Nov 15 15:42:55      State    Elapsed Port   Remote_Host           rpc_stat
> Nov 15 15:42:56 RESPONDING   1.204837 4677   mammatus.plymouth.edu  
> Nov 15 15:42:57 RESPONDING   0.241447 4677   mammatus.plymouth.edu  
> Nov 15 15:42:58 RESPONDING   0.222650 4677   mammatus.plymouth.edu  
> Nov 15 15:42:59 RESPONDING   0.228247 4677   mammatus.plymouth.edu  
> Nov 15 15:43:01 RESPONDING   0.212776 4677   mammatus.plymouth.edu  
> Nov 15 15:43:02 RESPONDING   0.204985 4677   mammatus.plymouth.edu  
> ...
>
>This shows that each RPC call takes about 0.2 seconds, so only 1/.2 or
>about 5 products per second can be received.  Later in the same
>afternoon, the ldmping times climbed even higher, to about .35 seconds
>per RPC call, so at this point it could only keep up with about 3
>products per second.
>
>When the RPC rate is less than the rate at which products are injected
>into the data stream, products back up at the upstream sender
>process, until it ultimately gets a RECLASS (a message from the
>downstream host indicating the offered products are too old) and jumps
>to the start of the queue to send current data, dropping the
>intervening products.
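>
>To put a number on that (an illustrative calculation, using the peak
>FOS rate of 5.4 products/second from above and the 3 RPCs/second
>limit measured later that afternoon), an hour of sustained overload
>would back up about 8640 products:
>
> test$ echo "(5.4 - 3) * 3600" | bc
> 8640.0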
>
>Other sites that don't see such high product latencies typically have
>much smaller ldmping times, for example the upstream site from
>Plymouth:
>
> test$ ldmping -h squall.atmos.uiuc.edu -i 1
> Nov 15 17:07:41      State    Elapsed Port   Remote_Host           rpc_stat
> Nov 15 17:07:41 RESPONDING   0.099968  388   squall.atmos.uiuc.edu  
> Nov 15 17:07:42 RESPONDING   0.030012  388   squall.atmos.uiuc.edu  
> Nov 15 17:07:43 RESPONDING   0.029179  388   squall.atmos.uiuc.edu  
> Nov 15 17:07:44 RESPONDING   0.029559  388   squall.atmos.uiuc.edu  
> Nov 15 17:07:45 RESPONDING   0.030265  388   squall.atmos.uiuc.edu  
> ...
>
>which means an RPC call to this host takes about .03 secs, so it can
>accept about 33 products per second.
>
>These example network latencies are measured from here rather than
>from the upstream IDD host, but they are probably representative.  It
>would be instructive for most sites to get an ldmping log from their
>upstream site for a 24 hour period, to see how network latencies vary.
>Using "ldmping -i 5 -h hostname" would give latencies every 5 seconds,
>and a 24-hour log would take about 1 MB of disk.  Latencies vary
>widely, so if the current latency is low but was high during the
>previous hour, LDM products from an upstream host may still be
>arriving late because it takes time to catch up with a backlog of
>products.
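>
>A minimal way to collect such a log (the host name and log file name
>are placeholders; the redirection assumes ldmping writes its reports
>to standard output or standard error):
>
> test$ nohup ldmping -i 5 -h upstream.host.edu > ldmping.log 2>&1 &
>
>After 24 hours, stop the process and scan ldmping.log for stretches
>of elevated elapsed times.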
>
>Unfortunately, network latencies are not necessarily symmetric, so
>running ldmping from a downstream host to an upstream LDM won't always
>give a good approximation of the network latency in the other
>direction.
>
>This RPC latency as measured by ldmping may be the limiting factor for
>many sites, rather than the volume of the data.  Here are some recent
>maximum hourly product rates for common feed types:
>
> Feed type     prods/sec
>
> WSI           6.7 (all products, only distributed from WSI)
> NMC2          6.2 (CONDUIT model data, limited distribution)
> HDS           3.8
> IDS|DDPLUS    2.3 
> NNEXRAD       1.7 (NOAAPORT NEXRAD, available unencrypted in 2001)
>
>Some of these rates can vary significantly at different times of the
>day.  For example, the HDS rate varied from 0.3 to 3.8 products/second
>during different hours of this period.  This means that adding these
>rates overestimates the worst case, since the peaks for different
>feeds may occur at different times.  For example, you might think
>from the above that
>the worst case for FOS is obtained by adding HDS and IDS|DDPLUS rates
>to get 6.1 products/second, but the highest rate for the combined feed
>we have seen recently is only 5.4 products/second.  Nevertheless, if
>the ldmping time from your upstream site is greater than about 1/6.1
>or 0.16 seconds, you might not be able to keep up with the FOS data,
>even if that is the only data you are getting.  Over brief intervals
>data products can come in at much higher rates; we have occasionally
>seen over 180 products/second on motherlode.ucar.edu.
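>
>If you want to run the same arithmetic for other combinations (a
>sketch only, and remember that summing hourly maxima overstates the
>true combined peak), bc makes it easy.  For FOS plus NNEXRAD, using
>the table above:
>
> test$ echo "scale=3; 1 / (3.8 + 2.3 + 1.7)" | bc
> .128
>
>so an ldmping time much above 0.13 seconds would be a warning sign
>for a site planning to take both.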
>
>By comparison, the MCIDAS data stream sends a maximum of about .005
>products/second, so it is not a factor in these latency calculations
>even though it contains large products.
>
>So, what can a site do if its latency indicates it can't receive
>products as fast as they are injected into the IDD?  
>
>First, you can try to determine the cause of high latencies and
>correct them, using ldmping as a measuring tool to evaluate proposed
>improvements.
>
>Second, you can request less data from upstream nodes.  Eliminating a
>high-rate feed by not requesting that feed type is the best way to do
>this, but if you're relaying data to downstream sites, you can't
>eliminate a feed that downstream LDMs need.  If you can get agreement
>from your downstream sites and any sites that might failover to you to
>eliminate a feedtype, that might help you and your downstream sites.
>As Tom McDermott pointed out, if you're a leaf node you have the
>freedom to request just the subset of data you need.  And you can use
>patterns within feedtypes in your ldmd.conf configuration file to
>request a subset of the data products in a feed.
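>
>For example (the upstream host and the "^SA" pattern, which would
>match only products whose identifiers begin with SA, i.e. surface
>observations, are illustrative; substitute your own host and
>patterns), a leaf node might narrow an ldmd.conf request line from
>
> request IDS|DDPLUS ".*"  squall.atmos.uiuc.edu
>
>to
>
> request IDS|DDPLUS "^SA" squall.atmos.uiuc.edu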
>
>Finally, I should point out that the rate for NNEXRAD shown above (an
>additional 1.7 products/second) may increase as more products are
>added to the space made available by compressing products after
>January.  We're currently trying to evaluate the effect the imminent
>introduction of the NNEXRAD feed will have on relay sites and the IDD.
>
>--Russ
>
>_____________________________________________________________________
>
>Russ Rew                                         UCAR Unidata Program
>russ@xxxxxxxxxxxxxxxx                     http://www.unidata.ucar.edu
>

---------------------------------------------------------------------------
Tim Doggett
Assistant Professor
Texas Tech University
Atmospheric Science Group, Department of Geosciences
Phone: (806)742-3477
Fax: (806)742-1738
---------------------------------------------------------------------------
