
[CONDUIT #GSZ-336115]: Large conduit lags have reappeared



Hi Pete and Carissa,

We are seeing consistent 200-800 second latencies on our primary
CONDUIT downstream machine here in Unidata:

http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+daffy.unidata.ucar.edu

Since this machine is not doing any other data ingest, we have been
interpreting the latencies to mean that there is some sort of a
bottleneck in the way that the CONDUIT products are being REQUESTed
** even though we are already doing a 5-way split for the data **:

request CONDUIT "[09]$" ncepldm4.woc.noaa.gov
request CONDUIT "[18]$" ncepldm4.woc.noaa.gov
request CONDUIT "[27]$" ncepldm4.woc.noaa.gov
request CONDUIT "[36]$" ncepldm4.woc.noaa.gov
request CONDUIT "[45]$" ncepldm4.woc.noaa.gov

request CONDUIT "[09]$" conduit.ncep.noaa.gov
request CONDUIT "[18]$" conduit.ncep.noaa.gov
request CONDUIT "[27]$" conduit.ncep.noaa.gov
request CONDUIT "[36]$" conduit.ncep.noaa.gov
request CONDUIT "[45]$" conduit.ncep.noaa.gov
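As a sanity check on the split itself, here is a minimal Python sketch
that confirms the five patterns above cover every trailing digit 0-9
exactly once, so no product is requested twice or missed.  It assumes
that CONDUIT product IDs end in a sequence-number digit; the sample
product ID and the function names are hypothetical, and Python's "re"
module stands in for the LDM's own regular expression matching.

```python
import re

# The five patterns from the request lines above; each one matches
# product IDs whose final character is one of two digits.
PATTERNS = ["[09]$", "[18]$", "[27]$", "[36]$", "[45]$"]


def matching_requests(product_id):
    """Return the indexes of the patterns that match this product ID."""
    return [i for i, p in enumerate(PATTERNS) if re.search(p, product_id)]


def split_is_complete():
    """True if every trailing digit 0-9 is matched by exactly one pattern."""
    # "x" is a stand-in for the rest of a product ID (an assumption).
    return all(len(matching_requests("x" + str(d))) == 1 for d in range(10))


if __name__ == "__main__":
    # Hypothetical product ID, used only to show which request would carry it.
    print(matching_requests("hypothetical/product !000123"))
    print(split_is_complete())
```

If split_is_complete() returns True, the split is balanced by trailing
digit, and each product should arrive over exactly one of the five
connections to a given upstream host.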

Also, we are unable to get any CONDUIT data from ncepldm4.woc.noaa.gov.
Here is a representative LDM log file output line that illustrates the
problem:

Sep  9 10:52:11 daffy ncepldm4.woc.noaa.gov[2278] ERROR: Disconnecting due to LDM failure; Couldn't connect to LDM on ncepldm4.woc.noaa.gov using either port 388 or portmapper; : RPC: Remote system error - Connection timed out

I interpret this to mean that the LDM on ncepldm4 is not running, or,
if the situation is like it was some time ago, that the LDM has been
re-installed incorrectly (the 'make root-actions' step was omitted
the last time this happened).
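For anyone who wants to check basic reachability independently of the
LDM, here is a minimal Python sketch that tries to open a TCP
connection to port 388, the port named in the log line.  The function
name and the default timeout are my own choices.  Note that this only
tests whether something accepts connections on that port; it does not
prove that an LDM is answering, and it does not exercise the
portmapper fallback.

```python
import socket


def can_connect(host, port=388, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


if __name__ == "__main__":
    print(can_connect("ncepldm4.woc.noaa.gov"))
```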

Also, the X axis labeling on the rtstats latency plots is produced by
a GEMPAK routine that Steve Chiswell (Chiz) wrote a number of years
ago, and it sorely needs some fixing up.

Finally, the email exchanges on the address@hidden email list
yesterday prompted us to take a hard look at point-to-point latencies
that we are experiencing.  We found that the front end machines to
the idd.unidata.ucar.edu cluster are _unexpectedly_ introducing
latencies in addition to what we are seeing in the flow from
conduit.ncep.noaa.gov.  We are really scratching our heads over this
one, mainly because the added latencies showed up only a few days ago
and nothing has been changed on our side (with the LDM, that is).

Investigation of all of these problems is underway here at the UPC...

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: GSZ-336115
Department: Support CONDUIT
Priority: Normal
Status: Closed