
[CONDUIT #PNI-665795]: 20150331: high CONDUIT latencies



Hi Carissa,

I apologize for the slow reply to your questions below; we have had a day
chock full of meetings today (sigh)...

re: chatter on the address@hidden email list regarding high
product latencies

> I have seen the threads. So let me make sure I understand everything --
> 
> First, what is the difference between these machines?
> 
> vm-lnx-conduit1.ncep.noaa.gov_v_conduit.ncep.noaa.gov
> vm-lnx-conduit1.ncep.noaa.gov_v_idd.unidata.ucar.edu
> vm-lnx-conduit1.ncep.noaa.gov_v_idd.aos.wisc.edu

The labels at the top of the real-time stats plots show the end points
for a data flow path.  For instance:

vm-lnx-conduit1.ncep.noaa.gov_v_idd.unidata.ucar.edu

indicates that products created on vm-lnx-conduit1.ncep.noaa.gov eventually
made their way to idd.unidata.ucar.edu.  This label does not show the
path by which the products went from the machine on which they were
created to the machine on which they were received.
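
As a small illustration of the naming convention, here is a minimal
Python sketch that splits such a label into its two end points.  The
function name split_flow_label is hypothetical (it is not part of the
LDM or of the real-time stats software), and the sketch assumes that
the string "_v_" always separates the originating host from the
receiving host:

```python
def split_flow_label(label: str) -> tuple[str, str]:
    """Split an 'origin_v_receiver' label into (origin, receiver).

    Assumption: '_v_' separates the two host names. Underscores are not
    valid in host names, so the first '_v_' is taken as the separator.
    """
    origin, sep, receiver = label.partition("_v_")
    if not sep or not origin or not receiver:
        raise ValueError(f"not an origin_v_receiver label: {label!r}")
    return origin, receiver


origin, receiver = split_flow_label(
    "vm-lnx-conduit1.ncep.noaa.gov_v_idd.unidata.ucar.edu"
)
print("created on: ", origin)
print("received on:", receiver)
```

The result only names the two end points; it says nothing about the
intermediate relays, just like the label itself.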

NB: I am using the LDM meaning for the term 'created'.  The latency
referred to in the email list chatter is the time difference between
when a product was received and when it was created.  The creation
time of a product is the time at which it was inserted into an LDM
queue.
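
To make the definition concrete, here is a minimal Python sketch of
the arithmetic.  The two timestamps are made up for illustration, and
the function name latency_seconds is hypothetical; this is not how the
LDM itself computes or reports latency:

```python
from datetime import datetime, timezone


def latency_seconds(created: datetime, received: datetime) -> float:
    """Latency = time of receipt minus creation (queue insertion) time."""
    return (received - created).total_seconds()


# Hypothetical times: inserted into the originating LDM queue at 12:00:00 UTC,
# received by the downstream machine 45 seconds later.
created = datetime(2015, 3, 31, 12, 0, 0, tzinfo=timezone.utc)
received = datetime(2015, 3, 31, 12, 0, 45, tzinfo=timezone.utc)
print(latency_seconds(created, received))  # 45.0
```

Because the creation time is set once, at the originating machine, the
latency accumulates over every relay the product passes through.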

re:
> Is the first one directly from us?

They all indicate products from you.  Whether or not they are directly
from you would require knowing if the receiving machine was REQUESTing
from you or if it was REQUESTing from one or more machines that were
REQUESTing from one or more machines, etc., one or more of which was
REQUESTing directly from you.  Here are two different cases that should
illustrate the potential data flow:

vm-lnx-conduit1.ncep.noaa.gov_v_idd.aos.wisc.edu

idd.aos.wisc.edu is REQUESTing CONDUIT data redundantly from at least
two top level CONDUIT relays, conduit.ncep.noaa.gov and ncepldm4.woc.noaa.gov
(I say at least, because Pete Pokrandt, the LDM/IDD admin of idd.aos.wisc.edu,
told us that he was redundantly REQUESTing from the two top level NCEP
sites originally, but then he added a redundant REQUEST to idd.unidata.ucar.edu
after he started experiencing very high latencies for his CONDUIT data).

vm-lnx-conduit1.ncep.noaa.gov_v_idd.unidata.ucar.edu

This is an example of being two steps removed:  the two accumulators
for the idd.unidata.ucar.edu top level relay cluster REQUEST data redundantly
from our primary CONDUIT ingest machine, daffy.unidata.ucar.edu, and
from two other top level IDD relays, idd.aos.wisc.edu and idd.meteo.psu.edu.

re:
> And the others are relayed through you?

idd.unidata.ucar.edu relays CONDUIT data to a variety of downstream sites
as do idd.aos.wisc.edu and idd.meteo.psu.edu.

re:
> I'm not sure yet that I am able to tell who is pulling from who off these
> graphs. Clearing up the naming convention will help.

The easiest way to figure out who is making direct REQUESTs from NCEP
top level sites is to grep for 'topo' statements in the LDM log files
on each of the machines that comprise conduit.ncep.noaa.gov and
ncepldm4.woc.noaa.gov.  The other way to figure out where products are
being received from is to look at the 'topology' link in the real-time
stats page for a particular machine.  For instance, to see where CONDUIT
data came from that is making it to idd.aos.wisc.edu, you would look
in:

Unidata HomePage
http://www.unidata.ucar.edu

  Data -> IDD Operational Status
  http://rtstats.unidata.ucar.edu/rtstats/

    Statistics by Host
    http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/siteindex

      idd.aos.wisc.edu [6.12.6]
      http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/siteindex?idd.aos.wisc.edu

Clicking on the CONDUIT topology link in the last page should show all of
the paths taken by data being received by idd.aos.wisc.edu.

When I did this, I found:

0 idd.aos.wisc.edu [source]

    1 idd.uni [no stats] 

    1 ncepldm [no stats] 

Comments:

- there is a problem in that the full name of the upstream is not being
  shown; I will investigate this problem

- idd.aos.wisc.edu is REQUESTing data from idd.unidata.ucar.edu and
  ncepldm4.woc.noaa.gov

  I told Pete yesterday that he should be REQUESTing directly from
  conduit.ncep.noaa.gov, so he should re-instate the REQUEST(s) that
  he commented out when he started seeing latency problems.

re:
> Looking at PSUs graph I see the jump of blue color is very clearly defined,
> what time/day was that?

The plot represents about 2 days of time.  I will need to investigate
why the labels are not being plotted.

re:
> Was that when they turned that feed on?

No.  What is going on is that products are being received from more
than one path.  The ones with the _very_ high latencies (the ones being
plotted in blue today) seem to be ones that were likely received from
a different upstream LDM quite a bit before they were received again
from the upstream showing the high latencies (the upstream for the
high latencies today is ncep-ldm0.ncep.boulder).

re:
> I can't tell if there is a blue line previous to that jump.

We have observed that when a site has more than one upstream, and
the latencies for one upstream are "low" (a non-specific value, but
not very high) and the latencies for the other feed are "high", then
there may be a recirculation of products OR something in the network
connection to the slow upstream may be amiss (e.g., a bad router,
packet shaping, etc.).  The extreme difference in latencies between
the two different data paths shown in the PSU graph, when coupled with
similar latencies for other sites that are feeding directly from
ncepldm4.woc.noaa.gov, strongly indicates that there is a problem at
or close to ncepldm4.woc.noaa.gov.
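
The comparison described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the
upstream names and latency values are invented, the function names are
hypothetical, and the factor of 10 used to call an upstream "slow" is
an arbitrary choice, not a threshold used by the LDM or by the
real-time stats plots:

```python
from collections import defaultdict
from statistics import median


def median_latency_by_upstream(samples):
    """samples: iterable of (upstream_name, latency_seconds) pairs."""
    grouped = defaultdict(list)
    for upstream, latency in samples:
        grouped[upstream].append(latency)
    return {upstream: median(values) for upstream, values in grouped.items()}


def slow_upstreams(samples, ratio=10.0):
    """Upstreams whose median latency exceeds `ratio` times the lowest median.

    The lowest median is floored at 1 second so that a near-zero baseline
    does not make every other upstream look slow.
    """
    medians = median_latency_by_upstream(samples)
    if not medians:
        return []
    baseline = max(min(medians.values()), 1.0)
    return sorted(u for u, m in medians.items() if m > ratio * baseline)


# Invented example values (seconds); not measurements from any real host.
samples = [
    ("upstream-a", 2.0), ("upstream-a", 3.0), ("upstream-a", 4.0),
    ("upstream-b", 900.0), ("upstream-b", 1200.0),
]
print(median_latency_by_upstream(samples))
print(slow_upstreams(samples))
```

A large gap like this for one upstream only says where to look; it
does not by itself distinguish recirculated products from a network
problem on the path to that upstream.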

Please let us know if you need more information on the above.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: PNI-665795
Department: Support CONDUIT
Priority: Normal
Status: Closed