
[IDD #EEC-117639]: Missing and slow satellite data



Hi,

re:
> For about a week now we have been missing multiple channels and times
> of Level 1 GOES-16 and GOES-17 data. Has something changed?

Over a week ago, we moved idd.unidata.ucar.edu back to a cluster that is
housed in the NCAR-Wyoming Supercomputer Center (NWSC) in Cheyenne, WY.
This email is the first we've heard from any site about high latencies.

I see that freshair1 is redundantly feeding from idd.unidata.ucar.edu
and iddb.unidata.ucar.edu:

https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/siteindex?freshair1.atmos.washington.edu

topology list for DIFAX:

https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_topo_nc?DIFAX+freshair1.atmos.washington.edu

We did not move or change the setup for iddb.unidata.ucar.edu, so the
feed latencies from it should not have changed recently.  The fact
that your latencies to it are also very high suggests that something
changed closer to you (campus, department, ?).

re:
> I see that the latency
> for both DIFAX, and NIMAGE are way up, and all over the place, but not so
> for other feeds.

The FNEXRAD latency is very high right now too, and so are the
latencies in the NEXRAD2 feed:

FNEXRAD latencies for freshair1:
https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?FNEXRAD+freshair1.atmos.washington.edu

NEXRAD2 latencies for freshair1:
https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD2+freshair1.atmos.washington.edu

This is at the same time that the CONDUIT latencies have been climbing.
These plots suggest that the latencies are correlated with the volume
of data in a feed.  Consider the following snapshot of volumes taken
from a randomly chosen real-server backend of the idd.unidata.ucar.edu
cluster:

https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/rtstats_summary_volume?node7.unidata.ucar.edu

Data Volume Summary for node7.unidata.ucar.edu

Maximum hourly volume 132304.840 M bytes/hour
Average hourly volume  79591.983 M bytes/hour

Average products per hour     550130 prods/hour

Feed                    Average                   Maximum      Products
                    (M byte/hour)             (M byte/hour)  (number/hour)
CONDUIT               15876.831    [ 19.948%]    52229.672    93306.256
SATELLITE             14278.176    [ 17.939%]    20442.923     6289.372
NEXRAD2               12585.120    [ 15.812%]    15088.628   128220.628
NIMAGE                 7652.056    [  9.614%]    12035.722     5817.698
NGRID                  7477.583    [  9.395%]    12546.217    66062.860
NOTHER                 6373.055    [  8.007%]    10156.915    10751.023
FNEXRAD                5063.719    [  6.362%]     5398.921     9911.023
HDS                    3735.609    [  4.693%]     9333.490    29897.767
NEXRAD3                3427.554    [  4.306%]     3985.472   134177.465
FNMOC                  1500.523    [  1.885%]     4975.851     5484.279
UNIWISC                 897.155    [  1.127%]     1118.399      849.605
GEM                     636.047    [  0.799%]     4471.495     3674.023
IDS|DDPLUS               84.774    [  0.107%]       98.752    55307.302
LIGHTNING                 3.619    [  0.005%]        8.096      347.163
GPS                       0.098    [  0.000%]        0.968        1.047
FSL2                      0.065    [  0.000%]        0.558       32.116

As you can see, the SATELLITE (aka DIFAX), NEXRAD2 and NIMAGE feeds are
some of the most voluminous in the IDD.  While CONDUIT, on average, is
more voluminous, you are likely using split REQUESTs to get CONDUIT data
and, as shown in your ldmd.conf excerpt below, single REQUESTs for each
of the other feeds where the latencies are now very high.  This again
suggests that the latencies you are experiencing are a function of the
volume of data in a feed.
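For reference, the way sites typically split CONDUIT is on the sequence
number at the end of each product ID.  A minimal sketch of a five-way
split (the bracket patterns are the ones commonly used in our CONDUIT
examples; verify them against your own product IDs before relying on
them):

  # five-way CONDUIT split keyed on the trailing digit of the
  # sequence number; each REQUEST carries roughly 1/5 of the volume
  request CONDUIT "[09]$" idd.unidata.ucar.edu
  request CONDUIT "[18]$" idd.unidata.ucar.edu
  request CONDUIT "[27]$" idd.unidata.ucar.edu
  request CONDUIT "[36]$" idd.unidata.ucar.edu
  request CONDUIT "[45]$" idd.unidata.ucar.edu

Each REQUEST gets its own TCP connection, so any per-connection
bandwidth cap applies to each piece separately rather than to the
whole feed.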

Question:

- is it possible that per-connection volume limiting was imposed somewhere
  on the UWashington campus?
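If you have a cooperative host outside the campus network, one quick
way to test for per-connection shaping is to compare a single TCP
stream against several parallel streams with iperf3
(remote.example.edu below is a placeholder; any off-campus host
running an iperf3 server will do):

  # on the off-campus host:
  iperf3 -s

  # on freshair1: first one stream, then four in parallel
  iperf3 -c remote.example.edu
  iperf3 -c remote.example.edu -P 4

If four parallel streams together move roughly four times what a
single stream does, per-connection limiting is the likely culprit.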

re:
> https://a.atmos.washington.edu/~ovens/ldmstats.html
> 
> I have this in my ldmd.conf for them:
> request DIFAX     ".*"    idd.unidata.ucar.edu
> request DIFAX     ".*"    iddb.unidata.ucar.edu
> request NIMAGE    ".*"    idd.unidata.ucar.edu
> request NIMAGE    ".*"    iddb.unidata.ucar.edu
> 
> Do you have any suggestions? 

I suggest two things:

- contact your campus network folks to see if they have implemented
  any per-connection limitations, or if they have installed something
  like a Palo Alto firewall that is doing packet inspection and thus
  slowing things down

- try splitting your feed REQUEST for each of the feeds that are
  showing high latencies (see the notifyme sketch after this list
  for a way to scout useful patterns)

  Another site, Embry-Riddle Aeronautical University in FL, was forced
  to split their SATELLITE (aka DIFAX) feed REQUEST into multiple,
  disjoint REQUESTs about two years ago.  Since doing the split, their
  latencies have remained consistently low.
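To scout for patterns that would make a clean split, you can sample
the product IDs in a feed with notifyme before committing to REQUEST
patterns; for example (the -o 3600 looks back one hour):

  notifyme -vl- -f SATELLITE -h idd.unidata.ucar.edu -o 3600

The product IDs it prints will show which fields (satellite, channel,
tile, etc.) are available for keying disjoint extended regular
expressions.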

re:
> The only thing on our end that has changed is
> that I have a replacement ldm server on our network feeding off our
> existing ones to sync up before using it to replace our current primary.
> Could that have pushed it over the edge?

I don't think so, unless the feeds to the new machine have saturated
your local network.

Questions:

- do you know what speed your local area network runs at?

- do you know what speed the Ethernet interfaces on freshair1 and
  freshair2 run at?

  We found that we had to bond two 1 Gbps Ethernet interfaces together
  to increase the throughput for the machines that used to make up the
  idd.unidata.ucar.edu cluster.  Our newer machines, however, all have
  10 Gbps interfaces, so we no longer need to bond Ethernet interfaces
  together.
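A quick way to check the negotiated link speed on Linux (assuming the
interface is named eth0; substitute your actual interface name):

  ethtool eth0 | grep -i speed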

re:
> However, it's been running for
> more than a week, and no other feeds seem to be having problems.

I think that the latency plots for the NEXRAD2 and FNEXRAD feeds
tell a different story.

re:
> Do you have any suggestions? I have not been able to find any obvious
> system issues.

Please try splitting the NIMAGE feed as a first test to see if
that reduces the NIMAGE latencies.  I suggest a two-way split
where one REQUEST is for GOES-East (GOES16) and the other is
for GOES-West (GOES17) products.  If this change helps,
do the same thing for your SATELLITE (aka DIFAX) feed REQUEST.
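A sketch of what that two-way split could look like in ldmd.conf.  The
GOES16/GOES17 patterns are assumptions about what appears in the
NIMAGE product IDs; please confirm against real IDs (e.g., with
notifyme, as above) before deploying, and mirror the REQUESTs to
iddb.unidata.ucar.edu if you want to keep the redundant feed:

  # hypothetical two-way NIMAGE split by satellite; verify the
  # patterns against actual product IDs before using them
  request NIMAGE "GOES16" idd.unidata.ucar.edu
  request NIMAGE "GOES17" idd.unidata.ucar.edu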

Aside:

The current release of the LDM is v6.13.15.  I see that freshair1
is still running v6.13.6, which is why your stats show DIFAX
instead of SATELLITE for the GOES-R/S GRB data.  Upgrading to
the latest LDM will likely _not_ solve your latency problem,
but it may help in other areas.
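For reference, the general shape of an in-place LDM upgrade (the
authoritative steps are in the LDM installation instructions; the
paths here are illustrative):

  # as the 'ldm' user
  ldmadmin stop
  tar xzf ldm-6.13.15.tar.gz
  cd ldm-6.13.15 && ./configure && make install
  sudo make root-actions   # setuid bits, etc., need root
  ldmadmin start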

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: EEC-117639
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.