20031001: upgrade of LDM on graupl (cont.)



>From: "Frank Colby" <address@hidden>
>Organization: UMass Lowell
>Keywords: 200309301348.h8UDmhk1023988 IDD LDM upgrade

Hi Frank,

>Thank you for making these changes on graupl.

Glad to help out.

>The clock thing has
>bothered me for some time, but for some reason, I had trouble getting
>that started again after an update a couple of years ago.  The imagery
>stuff is something I want to get going, but again, haven't felt the need
>to put it high on the list.  Having the ingest lines ready to go will
>help a lot.

OK, I was hoping so.  I think that realtime satellite and radar
imagery is important for education, and it certainly is nice to look
at when things like hurricanes come through :-)

>As for the bandwidth problem, yes we had to "instruct" our network folks
>last fall, using the Unidata help screen for packetshaper, and that
>worked for the rest of the year.  I had a discussion with a network
>person last week about our trouble this fall, and he insisted that the
>packetshaper fixes were still in place.

OK.  This comment is a little inconsistent with the latency plots
we are seeing on graupl.  For instance, since the 6.0.14 installation
and ldmd.conf request "tuning", the latency on all IDS|DDPLUS
(observations) data has been very close to zero:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+graupl.uml.edu

The latency for low volume datastreams like NLDN lightning has also been
pretty low:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NLDN+graupl.uml.edu

The latency for the HDS data (model output) was very high until 9
or 10 Z this morning when it dropped to low values:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?HDS+graupl.uml.edu

We need to keep an eye on these plots to see if things stay on the
good side, or if a volume-related slowness persists.

>This actually makes sense,
>since the errors we got last fall were different from our trouble this
>fall.

The real time statistics plots are very useful in keeping an eye on
how well you are doing.  I recommend visiting the following page
frequently to get familiar with what the plots are telling you:

http://www.unidata.ucar.edu/staff/chiz/rtstats/siteindex.shtml

then click on the graupl.uml.edu link to get

http://www.unidata.ucar.edu/staff/chiz/rtstats/siteindex.shtml?graupl.uml.edu

and look at traces of latency, volume and products (number of products)
for graupl.  To get an idea of how well the data is coming from your
upstream feeder, click on the topology link and then click on the name
of your immediate upstream feed site(s).  This will produce a
differential latency plot, the time it took for the product to get from
that upstream feed site to you (i.e., their latency removed).  All of
these plots represent samples of the latency/volume/#products, so
don't be surprised when you see differential latencies that are a little
less than zero.
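
As a rough illustration of why a differential latency can dip below
zero, here is a minimal Python sketch.  The function name and the sample
values are made up for illustration; they are not taken from the rtstats
code or from graupl's actual data.  It assumes that the upstream and
local latencies are sampled separately, so the two numbers being
subtracted do not necessarily describe the same product:

```python
def differential_latency(local_latency_s, upstream_latency_s):
    """Return local minus upstream latency, in seconds, for each sample pair."""
    return [
        round(local - upstream, 3)
        for local, upstream in zip(local_latency_s, upstream_latency_s)
    ]


# Hypothetical sampled latencies in seconds (not real graupl data).
upstream_samples = [1.2, 0.8, 1.5, 0.9]
local_samples = [1.6, 0.7, 1.9, 1.0]

diffs = differential_latency(local_samples, upstream_samples)
print(diffs)
```

In this made-up data the second pair gives a small negative value
because the local sample happened to be lower than the upstream one.
That kind of result is a sampling artifact, not a sign that data reached
you before it reached your feeder.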

>Anyway, the network guy said the real problem is that many (100s)
>of machines had been infected with the latest viruses and worms in
>mid-August, (due in part to an inexplicable stupidity on the part of
>users to continue to open attachments they weren't expecting, and not
>updating their operating systems).

If it is any consolation, this happened at a number of institutions.

>The result is that the infected pcs
>start pinging to look for new exploits, and they run through the entire
>gamut of ip addresses, starting at 1.1.1.1, then 1.1.1.2, etc.  The
>router in our building has to wait 2 seconds for a "no machine at that
>address" response, and in the meantime, the router memory fills up and
>overflows, with the resulting loss of packets.  He indicated that the
>routers in our building are running so close to 100% of capacity that he
>cannot even do diagnostics, but instead restart the router and quickly
>log in before it becomes saturated again.

This does sound consistent with what we are seeing.

>They are pursuing two avenues
>-- 1) replacing the routers with newer, quicker ones with more memory,
>and 2) cleaning up the various pcs.

Cleaning up the PCs should be a number one priority.

>So far, we haven't seen any
>improvement, but he indicated he would try to move our building to the
>top of the "new router" list.  Does this explanation make sense to you?

Yes, it does.

>Thanks again for your help!

No worries.  I am glad that graupl is now reporting real time statistics
so we can keep an eye on you.

Cheers,

Tom