
[IDD #ZAJ-969398]: IDD Data is too old



Hi Elliott,

re:
> The hostname in the registry is "irads-ingest0.net.ou.edu".

OK, that is what we thought.

re:
> In the registry, we have 'time-offset' lowered to 600 seconds. Even with that
> low of a period to backfill, it never catches up. In fact, it gets further
> behind.

OK.
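
For reference, if you ever want to double check the value the LDM is actually
using, it can be printed with the regutil(1) utility.  I believe the parameter
lives at /server/time-offset in the registry of a standard LDM installation
(~ldm/etc/registry.xml), so something like:

regutil /server/time-offset

That value is the offset, in seconds, of the oldest products that the LDM will
ask an upstream to backfill when a connection is (re)established.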

re:
> Yesterday evening, I tested idd.meteo.psu.edu. The same issue occurred. We
> were able to get a bit better performance by splitting the request into two,
> but it still doesn't perform well.

Can you let us know when you split the feed REQUEST into two?
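
Knowing exactly how the split was done would also help us interpret the
latency traces.  A two-way split is normally done by giving the same upstream
two REQUEST lines whose extended regular expressions partition the products
between them.  The patterns below are only an illustration (splitting on the
NEXRAD3 WMO heading), not necessarily what you used:

REQUEST NEXRAD3 "^SDUS[2345]" idd.meteo.psu.edu
REQUEST NEXRAD3 "^SDUS[678]"  idd.meteo.psu.edu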

I'm trying to understand the latency trace that we see in:

http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD3+irads-ingest0.net.ou.edu

This time series shows that the NEXRAD3 latency grew to over 3600 seconds
sometime between 0 and 1 UTC yesterday, November 5.  It dropped to near zero
around 2 UTC today, stayed low until around 14 UTC today, and then started
getting worse.  For some reason, there doesn't seem to be any latency
information for the period of approximately 15 to 19 UTC today; after that,
latency values returned, but they are bouncing between high and low values.
This last behavior makes me think that your LDM configuration file has the
same feed REQUEST to both PSU and to somewhere else, and that the somewhere
else has content that is totally different from what is available from PSU.
What I am looking at is the light pink (or some such color) near-zero latency
line that can be seen in the period from 15 to 19 UTC.

Can you send us your LDM configuration file so we can take a look?  If you'd
rather not send us the entire file (as an attachment), please send the output
of:

grep -i ^request ~ldm/etc/ldmd.conf

A potential problem I am trying to figure out is one caused by replicated feed
REQUESTs to different upstreams that have different contents for the
datastream(s) being REQUESTed.  When a situation like this is present, the LDM
will preferentially get products from one server over the other, which, in
turn, means that you would not receive the products you want from one of the
upstreams.  The suspicious latencies are the ones from kfs-mini-01.mesonet.
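
The kind of configuration I have in mind looks something like the following
(the upstream names are just placeholders; I am guessing at what is in your
ldmd.conf):

REQUEST NEXRAD3 ".*" idd.meteo.psu.edu
REQUEST NEXRAD3 ".*" kfs-mini-01.mesonet

With identical REQUESTs like these, each product is accepted from whichever
upstream delivers it first, so if the two upstreams do not carry the same set
of products, the latency plot ends up mixing two very different traces.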

re:
> While we've been focused on one host, irads-ingest0, at 156.110.246.56, we
> are seeing the same issue from echo-ingestA.services.ou.edu (129.15.2.32).
> It has been tested against idd.unidata.ucar.edu and idd.aos.wisc.edu and
> behaves the same as irads-ingest0.

The latency plot for NEXRAD3 on echo-ingesta.services.ou.edu is showing the
exact same kind of thing as irads-ingest0, and the suspicious latency in the
plot is also the one from the connection to kfs-mini-01.mesonet.

Can you send us the LDM configuration file from echo-ingesta?

re:
> To add to the issue, both hosts have started having issues receiving the
> NL2 TDS feed as well.
> 
> http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD3+echo-ingesta.services.ou.edu
> http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD2+echo-ingesta.services.ou.edu

The latency plot for NEXRAD2 for echo-ingesta is even stranger since some of the
sites are coming in with very low latencies while a slug of others are showing
very high latencies.

re:
> The two hosts are in separate networks on separate campuses, managed by
> different parts of the IT department. With that, we have put in a ticket
> with our ISP, OneNet. They have requested traceroutes from our upstream
> sources to our hosts, if possible, to trace the data path.

Another clue: two machines in the NWC (ldmingest01.nwc.ou.edu and
ldmingest02.nwc.ou.edu) are also showing higher than previous latencies for
CONDUIT starting sometime on Sunday morning, November 4:

http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+ldmingest01.nwc.ou.edu

Is the NWC in the same building as you?

Do you know who to contact as the LDM administrator of the NWC machines?  We
really want to find that contact since we are seeing LDM connections from
their machines breaking and being reestablished on more than one backend
real-server machine of our idd.unidata.ucar.edu cluster.
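
For what it's worth, we spot those connections on our side with nothing
fancier than something like:

netstat -tn | grep ':388 '

run on each backend real-server, since the LDM listens on TCP port 388.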

re:
> Would it be possible to get traceroutes from a host behind
> idd.unidata.ucar.edu to 129.15.2.32 and 156.110.246.56?

Here are traceroutes from the real-server backend machine that has connections
from each of these IPs:

From uni19.unidata.ucar.edu, which is servicing a split NEXRAD3 feed and an
NGRID feed to echo-ingesta.services.ou.edu/129.15.2.32:

traceroute to 129.15.2.32 (129.15.2.32), 30 hops max, 60 byte packets
 1  flr-n140.unidata.ucar.edu (128.117.140.251)  0.724 ms  0.715 ms  0.854 ms
 2  ml2core-fl2core.unet.ucar.edu (128.117.243.194)  0.958 ms  1.449 ms  1.442 ms
 3  corel3-ml2core-i2.unet.ucar.edu (128.117.243.141)  1.696 ms  1.692 ms  1.680 ms
 4  v3454.rtr-chic.frgp.net (192.43.217.222)  23.159 ms  23.166 ms  23.160 ms
 5  et-2-1-0.4079.rtsw.chic.net.internet2.edu (162.252.70.116)  23.742 ms  23.911 ms  23.729 ms
 6  ae-3.4079.rtsw.kans.net.internet2.edu (162.252.70.141)  34.575 ms  34.530 ms  34.511 ms
 7  et-7-0-0.4079.rtsw.tuls.net.internet2.edu (162.252.70.35)  38.497 ms  38.478 ms  38.876 ms
 8  198.71.46.45 (198.71.46.45)  38.520 ms  38.519 ms  38.505 ms
 9  164.58.244.44 (164.58.244.44)  38.564 ms  38.827 ms  38.793 ms
10  164.58.244.15 (164.58.244.15)  40.757 ms  40.768 ms  40.755 ms
11  164.58.245.55 (164.58.245.55)  40.836 ms 164.58.245.53 (164.58.245.53)  40.924 ms  40.707 ms
12  164.58.245.58 (164.58.245.58)  40.909 ms  40.907 ms 164.58.245.56 (164.58.245.56)  40.851 ms
13  164.58.244.33 (164.58.244.33)  41.414 ms  41.347 ms  41.386 ms
14  164.58.10.98 (164.58.10.98)  41.467 ms  41.463 ms  41.512 ms
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

From uni19.unidata.ucar.edu to delta-ingest0.irads.ou.edu/156.110.246.56:

traceroute to 156.110.246.56 (156.110.246.56), 30 hops max, 60 byte packets
 1  flr-n140.unidata.ucar.edu (128.117.140.251)  0.743 ms  1.081 ms  1.155 ms
 2  ml1core-flacore.unet.ucar.edu (128.117.243.78)  1.249 ms fl2core-flacore.unet.ucar.edu (128.117.243.106)  0.839 ms ml1core-flacore.unet.ucar.edu (128.117.243.78)  1.230 ms
 3  ml2core-ml1core.unet.ucar.edu (128.117.243.99)  1.094 ms ml2core-fl2core.unet.ucar.edu (128.117.243.194)  1.374 ms ml2core-ml1core.unet.ucar.edu (128.117.243.99)  1.353 ms
 4  corel3-ml2core-i2.unet.ucar.edu (128.117.243.141)  1.504 ms  1.499 ms  1.487 ms
 5  v3454.rtr-chic.frgp.net (192.43.217.222)  23.160 ms  23.134 ms  23.136 ms
 6  et-2-1-0.4079.rtsw.chic.net.internet2.edu (162.252.70.116)  23.471 ms  23.499 ms  23.592 ms
 7  ae-3.4079.rtsw.kans.net.internet2.edu (162.252.70.141)  34.663 ms  34.482 ms  34.481 ms
 8  et-7-0-0.4079.rtsw.tuls.net.internet2.edu (162.252.70.35)  38.407 ms  38.386 ms  38.270 ms
 9  198.71.46.45 (198.71.46.45)  38.521 ms  38.490 ms  38.478 ms
10  164.58.244.44 (164.58.244.44)  38.839 ms 164.58.244.46 (164.58.244.46)  38.513 ms 164.58.244.44 (164.58.244.44)  38.588 ms
11  164.58.244.239 (164.58.244.239)  38.676 ms  38.644 ms  38.643 ms
12  164.58.16.26 (164.58.16.26)  38.481 ms  38.444 ms  38.428 ms
13  156.110.254.97 (156.110.254.97)  41.410 ms  41.195 ms  41.521 ms
14  156.110.254.62 (156.110.254.62)  41.003 ms  40.885 ms 156.110.254.50 (156.110.254.50)  45.609 ms
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

NOTE:

- While on uni19.unidata.ucar.edu, I saw that it was feeding lion.caps.ou.edu
  some CONDUIT data, and all of UNIWISC, NOTHER and NGRID.

- the latency plots for NGRID and NOTHER for lion look much the same as
  for your machine and for the NWC machines:

  NOTHER
  http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NOTHER+lion.caps.ou.edu

  NGRID
  http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NGRID+lion.caps.ou.edu

- also, it is a bit strange that there are two different names associated with
  the 156.110.246.56 IP address:

  ~: nslookup 156.110.246.56
  Server:       208.67.222.222
  Address:      208.67.222.222#53

  Non-authoritative answer:
  56.246.110.156.in-addr.arpa   name = delta-ingest0.irads.ou.edu.
  56.246.110.156.in-addr.arpa   name = delta-ingest0.ou.edu.

  This is a bit strange, but it should have no bearing on the latency issues
  being investigated.

- lastly, I don't see any connections from delta-ingest0.irads.ou.edu/156.110.246.56
  on any of the real-server backend machines that comprise the
  idd.unidata.ucar.edu cluster.
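
  If you want to verify that delta-ingest0 can reach our cluster at all, a
  quick test from that machine would be something like:

  notifyme -vl- -h idd.unidata.ucar.edu -f NEXRAD3 -o 3600

  notifyme only asks the upstream for notifications of matching products
  (here, NEXRAD3 products up to an hour old), so it is a lightweight way to
  confirm that the LDM connection itself can be established.  The feedtype
  and offset shown are just examples.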

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: ZAJ-969398
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.