
[IDD #PHO-979790]: Different latencies for different ncep-ldm backends?



Hi Pete,

re:
> I'm not sure if this is a problem, it doesn't seem to affect me getting
> data, but it really blows up the idd stats.
> 
> Frequently when I stop and restart the ldm on idd.aos.wisc.edu (to add a
> new ALLOW statement, for example) when I reconnect, I end up on (I
> think) a different back-end on a cluster and the latencies get much much
> larger.

I can see how this might happen sometimes, but not frequently.  The real servers
(data backends) in our top-level IDD cluster, idd.unidata.ucar.edu, are configured
to receive the exact same data from two upstream "collector" machines.  The
products in the LDM queues on the individual cluster nodes should, therefore,
be pretty much the same.  The exception to this rule, however, is when one
or more products get "stuck" in an LDM queue.  This typically happens when
that node is feeding the "stuck" data to a downstream host over a slow
connection.  This situation should clear itself when the backlogged products
are eventually sent, or when the connection to the slow downstream breaks.

The other thing that seems strange right off is that the director of our IDD
relay cluster is configured so that all downstream connections are sent to the
same real server as long as one or more connections from the downstream to the
cluster are active, or if the (re)connection attempt comes within one minute
of all connections being broken.  The only way I can see your (re)connection
attempts being sent to a different real server is if you shut down your LDM,
made the ALLOW changes in ~ldm/etc/ldmd.conf, and then restarted your LDM.
Editing ldmd.conf and then doing a restart should take on the order of a few
seconds, so your (re)connections should be sent to the same real server.
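To make the director's persistence rule above concrete, here is a minimal
Python sketch of the scheduling logic as I describe it (the class, method
names, and server names are my own invention for illustration, not the
actual director code):

```python
GRACE_SECONDS = 60  # reconnects within one minute stay on the same real server

class Director:
    """Toy model of the cluster director's connection-persistence rule."""

    def __init__(self, real_servers):
        self.real_servers = real_servers
        self.next_server = 0
        # per-downstream state: (assigned server, active count, last-close time)
        self.state = {}

    def connect(self, downstream, now):
        server, active, last_close = self.state.get(downstream, (None, 0, None))
        # sticky while any connection is active, or within the grace period
        sticky = active > 0 or (
            last_close is not None and now - last_close < GRACE_SECONDS)
        if not sticky:
            # all connections gone for over a minute: pick the next server
            server = self.real_servers[self.next_server % len(self.real_servers)]
            self.next_server += 1
        self.state[downstream] = (server, active + 1, last_close)
        return server

    def disconnect(self, downstream, now):
        server, active, _ = self.state[downstream]
        self.state[downstream] = (server, active - 1, now)

d = Director(["node1", "node2", "node3"])
s1 = d.connect("idd.aos.wisc.edu", now=0)
d.disconnect("idd.aos.wisc.edu", now=10)
s2 = d.connect("idd.aos.wisc.edu", now=15)   # 5 s later: same real server
d.disconnect("idd.aos.wisc.edu", now=20)
s3 = d.connect("idd.aos.wisc.edu", now=200)  # 3 min later: may be moved
```

In this sketch a quick edit-and-restart (s2) lands on the same real server,
while a long shutdown (s3) can be scheduled onto a different one.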

re:
> On the attached gif image, I had shut down the ldm yesterday afternoon
> about 21:45 UTC to add an allow [then I got distracted and forgot to
> restart until I got home at 23:15 UTC. Oops.. That's a different story
> though!] Up until that point, it appears that most of my data is coming
> via ncep-ldm1_v_ncepldm1.woc.noaa.gov with very low latencies.

OK, interesting.  My question to you is: why not make your changes to
ldmd.conf first and then restart the LDM to make them active?  This
is what we do...

re:
> When I restarted at 23:15 UTC, some of my data seems to be coming in via
> ncep-ldm0.ncep.boulder_v_ncepldm4.woc.noaa.gov with latencies up to an
> hour at times, while other data coming in via other routes has low
> latencies.

Are you referring to data in the same IDD feedtype, or to data in different
feedtypes?  If the latter, then I could easily understand how some data might
have high latencies and some low latencies.  The other possibility is that the
products with high latencies are somehow being recirculated in the IDD...
meaning that you received a product once, it was scoured out of your LDM
queue, and then you received it again.  The latency for the first reception
would be small; the latency for the second would be high.
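A quick sketch of how recirculation inflates reported latency (the times here
are fabricated for illustration; the latency is simply the receipt time minus
the product's creation time):

```python
# LDM-style latency: time of receipt minus the product's creation time.
def latency(creation_time, receipt_time):
    return receipt_time - creation_time

creation = 1000.0        # product created upstream at t = 1000 s

first_receipt = 1012.0   # first pass through the IDD
second_receipt = 4300.0  # recirculated copy, long after being scoured

print(latency(creation, first_receipt))   # 12 s: small
print(latency(creation, second_receipt))  # 3300 s (55 min): blows up the plot
```

The same product thus contributes one small data point and one huge one to
the latency statistics.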

re:
> Any idea what is causing this, or whether it is something to worry about
> more than just blowing the scales on the plot?

It would be hard to say whether there is anything to worry about without
knowing how many products are causing the spike in latencies.  My guess is
that the effect you noticed had nothing to do with your LDM restart; rather,
it had to do with high latencies from a particular NCEP server in Boulder
starting around the time you restarted your LDM:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+uni16.unidata.ucar.edu
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+uni18.unidata.ucar.edu
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+uni20.unidata.ucar.edu

Why these machines were getting the high-latency data from ncep-ldm0 is a
mystery at the moment...

re:
> A side note - there are no times in the title of the plots or scales or
> labels on the X axis.. Seemed like there used to be. You can pretty much
> tell the 6 hourly frequency where more data comes in and the lag
> increases, but the labels were useful :)

We have not been able to figure out why the X-axis labels sometimes get
plotted and sometimes do not.  The code that is running has not changed
in quite a while, so it must be some sort of timing issue (e.g., the
GIF being created before the GEMPAK routine doing the plotting is 100%
finished).

Wrap-up: I don't think you have anything to worry about.  We, on the other
hand, do -- why is ncep-ldm0 having problems?


Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: PHO-979790
Department: Support IDD
Priority: Normal
Status: Closed