20030905: LDM latencies at FSU of late



>From: Paul Ruscher <address@hidden>
>Organization: FSU
>Keywords: 200309051228.h85CSJLd021596 IDD latency

Hi Paul,

>Hi, all - we are trying to get a handle on what appears to be a local 
>problem, but I'd appreciate your perspective.  Overnight several nights 
>recently and into the morning, we are losing HDS and IDS|DDPLUS 
>products, with latencies yesterday morning exceeding 2 hours, and this 
>morning approaching one hour.

Immediately after seeing your note first thing this morning, I plotted
the latencies for IDS|DDPLUS and HDS for pluto:

IDS|DDPLUS:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+pluto.met.fsu.edu

HDS:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?HDS+pluto.met.fsu.edu

Since the traces for the IDS|DDPLUS and HDS streams are identical, I
will have to assume that the request you have for this data looks
like:

request WMO     ".*"    upstream_host
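
(You can double check what pluto is actually requesting with something
like the following -- this assumes the stock ~ldm/etc/ldmd.conf
location for the LDM configuration file:

grep -i '^request' ~ldm/etc/ldmd.conf

If the output differs from the line above, please send it along.)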

If this is true, the first thing you might try is splitting IDS|DDPLUS
off of the HDS request:

replace:

request WMO     ".*"    upstream_host

with:

request IDS|DDPLUS      ".*"    upstream_host
request HDS     ".*"    upstream_host
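
Note that after any edit to the LDM configuration file, the LDM has to
be restarted before new request lines take effect.  Assuming 'ldmadmin'
is in the LDM user's PATH, that is just:

ldmadmin restart

which stops and then restarts the LDM server processes.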

Before jumping to this, please read on.

>NNEXRAD and imagery are not affected.  

The HDS stream contains over 400 MB of data per hour:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?HDS+pluto.met.fsu.edu

Your NNEXRAD feed contains significantly less than this amount:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?NNEXRAD+pluto.met.fsu.edu

The other thing is that you are getting your NNEXRAD feed from OU,
not UIUC.

>UIUC is feeding us, and their other downstream nodes are fine.

I just took a look at the sites that squall.atmos.uiuc.edu is
feeding, and I see that they are:

aeolus.valpo.edu
zelgadis.geol.iastate.edu
pluto.met.fsu.edu

Now, both aeolus and zelgadis are redundantly feeding from other LDMs:

IDS|DDPLUS topology for aeolus.valpo.edu:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_topo_nc?IDS|DDPLUS+aeolus.valpo.edu

IDS|DDPLUS topology for zelgadis.geol.iastate.edu:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_topo_nc?IDS|DDPLUS+zelgadis.geol.iastate.edu

Pluto, on the other hand, is currently only feeding IDS|DDPLUS from squall:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?IDS|DDPLUS+pluto.met.fsu.edu
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_topo_nc?IDS|DDPLUS+pluto.met.fsu.edu

(the volume plot shows that the data is only coming from squall even
though the topology list shows both squall and idd.nrcc.cornell.edu).

So, the point is that the slowdown you are seeing may be originating
on squall, or it may be closer to home.  In order to see which is the
case, please add a new request line for WMO to atm.geo.nsf.gov:

if your request is currently:

request WMO     ".*"    squall.atmos.uiuc.edu

change it to:

request WMO     ".*"    squall.atmos.uiuc.edu PRIMARY
request WMO     ".*"    atm.geo.nsf.gov ALTERNATE
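
Before (or right after) adding the new request, you can verify that
pluto can actually reach atm.geo.nsf.gov on port 388 by running a
quick 'notifyme' as the LDM user (the feed and host here are simply
the ones from the example above):

notifyme -vl- -f "IDS|DDPLUS" -h atm.geo.nsf.gov

If product notifications scroll by, the connection is good, and the
new request should start filling in as soon as your LDM is restarted.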

The unfortunate thing is that squall is not reporting real-time
statistics, so we cannot pinpoint where your problem is originating.

>We did 
>find a rather large core file this morning on pluto.met.fsu.edu which 
>has been removed.

Interesting.  What process was the core file from?

>I fear what is happening is something on a router 
>setting here is somehow not letting these things through on the 
>Internet2 side of our network.

If the problem is close to home and not at squall, this is a possibility.

>Is there something from your side that 
>we can tell our network engineers here to "open the throttle" or 
>whatever it is we need to tell them?

In case the FSU network folks are using some sort of packet shaping
software, I would ask them to allow port 388 traffic to flow with
no restrictions.
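
A quick way for you (or them) to check that nothing along the path is
blocking port 388 is to run 'ldmping' from pluto against an upstream
host -- the host below is just an example:

ldmping squall.atmos.uiuc.edu

Timeouts or steadily growing elapsed times in that output would point
at the network path rather than at the upstream LDM itself.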

>We've had numerous outages in the 
>past three weeks due to a variety of upstream host problems, and other 
>issues, but the latest all seem to be related to this kind of 
>thing...the loss of HDS and IDS|DDPLUS with graphics products remain ok.

First we really need to understand exactly where the IDS|DDPLUS & HDS
slowdowns are being caused.  Adding a redundant request to
atm.geo.nsf.gov should help us understand this better.  I will also
blast off a note to UIUC asking them to add real-time statistics
reporting on squall.
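
(For reference, turning on statistics reporting at an upstream site is
typically just one EXEC entry in its LDM configuration file, something
along the lines of:

exec    "rtstats -h rtstats.unidata.ucar.edu"

followed by a restart of that site's LDM.)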

>I'm suspecting folks here have tweaked what they are letting in to 
>favor large file sizes of lots of smaller stuff...but I don't know 
>networking...Thanks if you can help!  Paul

Firewalls and packet shaping can be configured in a wide variety of
ways.  One of the most common is for campus network groups to limit
flow on individual connections, to specific ports, or from specific
machines.  It has been our experience that most network folks are
willing to remove constraints on LDM traffic after they are made aware
of its strictly scientific use.

To wrap up, please try adding the redundant request to atm, and let's
see what happens.

Tom

>From address@hidden Mon Sep  8 20:44:29 2003

Thanks for the response, Tom - I'm passing this along to Bret and Bill 
on our side so that perhaps we can try some new things on pluto.  I 
appreciate the advice.  Paul