
20060105: reboot of yakov leads to high latencies from ECMWF (cont.)



>From:  Mike Schmidt <address@hidden>
>Organization:  UCAR/Unidata
>Keywords:  200601051054.k05AsGP1017802 TIGGE IDD latencies FC4 system tuning

Hi Mike,

re:
>Sorry for the delay getting back to you.

No worries.

(Just so you know, a number of my comments below are "for the files".)

yakov is currently running Fedora Core 4 64-bit.

uname -a
Linux yakov.unidata.ucar.edu 2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13 21:55:55 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

It is a dual 3.2 GHz Intel Xeon EM64T platform with 4 GB of RAM.  FC4
recognizes the Xeon hyperthreading capabilities and configures itself
as if there are 4 CPUs.

>Here are the TCP tuning parameters for yakov;
>
># echo 2500000 > /proc/sys/net/core/wmem_max
># echo 2500000 > /proc/sys/net/core/rmem_max 
># echo "4096 5000000 5000000" > /proc/sys/net/ipv4/tcp_rmem
># echo "4096 65536 5000000" > /proc/sys/net/ipv4/tcp_wmem
>
>in addition, I've been starting an iperf server for testing with;
>
># iperf -s -m -w1m >> /iperf.server 2>&1 &

Thanks.  I performed all of the above as 'root' on yakov as soon as I
saw your note this morning.  I immediately did an 'ldmadmin watch' to
see if tuning would affect existing rpc.ldmd connections; it did
_NOT_.  Because of this, I restarted the LDM:

ldmadmin restart

After the restart, the latencies started dropping immediately.
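
(For the files: a minimal sketch of how one might confirm that the
tuning values are actually in place before restarting the LDM.  The
expected values are the ones from your note; the function names and
the PROC_ROOT override are made up for illustration and are not part
of the LDM or of FC4.)

```shell
#!/bin/sh
# Compare the live kernel TCP buffer settings with the values from Mike's note.
# PROC_ROOT is a hypothetical override (handy for testing against a copy of
# the tree); by default the real /proc/sys is read.
PROC_ROOT="${PROC_ROOT:-/proc/sys}"

# check_param <path relative to PROC_ROOT> <expected value>
# Prints OK, MISMATCH or MISSING; returns 0 only for OK.
check_param() {
    file="$PROC_ROOT/$1"
    if [ ! -r "$file" ]; then
        echo "MISSING  $1"
        return 1
    fi
    # Collapse whatever whitespace separates the values to single spaces.
    actual=$(awk '{ $1 = $1; print }' "$file")
    if [ "$actual" = "$2" ]; then
        echo "OK       $1 = $actual"
        return 0
    fi
    echo "MISMATCH $1 = $actual (expected $2)"
    return 1
}

# Check all four settings; return non-zero if any of them is off.
check_all() {
    status=0
    check_param net/core/wmem_max "2500000"              || status=1
    check_param net/core/rmem_max "2500000"              || status=1
    check_param net/ipv4/tcp_rmem "4096 5000000 5000000" || status=1
    check_param net/ipv4/tcp_wmem "4096 65536 5000000"   || status=1
    return $status
}

check_all || echo "tuning not fully applied; apply it, then 'ldmadmin restart'"
```

This only reads the settings, so it does not need to be run as 'root'.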

>I'll add these to a startup script in the next day or so.

I just added the sequence of 'echos' to /etc/rc.local.  I am not sure
if this is the appropriate place to make the change because of the
following comment in rc.local:

----- /etc/rc.local -----
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
  ...
----- /etc/rc.local -----

If this means that an autostart of the LDM would precede the mods, then
rc.local is _not_ the place to make the change.  The reason I say this
is I didn't see the latencies fall in existing LDM feeds from
ensemble.ecmwf.int until I restarted the LDM.  It might be the case
that the tuning steps would best be put into the LDM autostart script
(which does not yet exist on yakov).
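
(For the files: a rough sketch of what such an autostart script could
look like, with the tuning applied strictly before the LDM is started.
It uses 'sysctl -w', which sets the same kernel values as the 'echo'
lines.  The 'ldm' user name, the run() helper and the DRY_RUN switch
are assumptions for illustration; nothing like this exists on yakov
yet.)

```shell
#!/bin/sh
# Sketch of a boot-time wrapper: apply the TCP tuning first, then start the
# LDM, so rpc.ldmd opens its connections with the larger buffers in place.
# Assumptions: the LDM runs as user 'ldm'; DRY_RUN is a hypothetical switch
# that prints each command instead of running it (the default here).
DRY_RUN="${DRY_RUN:-1}"

# run <command> [args...]: execute the command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Same values as the 'echo' lines in Mike's note; stops at the first failure.
apply_tuning() {
    run sysctl -w net.core.wmem_max=2500000 &&
    run sysctl -w net.core.rmem_max=2500000 &&
    run sysctl -w net.ipv4.tcp_rmem="4096 5000000 5000000" &&
    run sysctl -w net.ipv4.tcp_wmem="4096 65536 5000000"
}

# Start the LDM as its own user, only after the tuning is in place.
start_ldm() {
    run su - ldm -c "ldmadmin start"
}

apply_tuning && start_ldm
```

With DRY_RUN=0 the script would have to be run as 'root'.  The point
of the sketch is only the ordering: tuning first, LDM second.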

It is _very_ interesting to note:

- without the tuning mods, yakov would only receive 2 GB/hr from
  ensemble -- lots of data was being lost.  This occurred even though
  the feed request was split 4 ways (one request each for 10, 20, 30,
  and 60 MB products).

- with the tuning mods AND a restart of the LDM, the latencies
  dropped fairly quickly

Comment: I am not sure why the volume received before tuning
was pegged at 2 GB/hr.  This bears further thought/investigation.
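
(For the files: one possible starting point for that investigation,
not a diagnosis.  A TCP connection can have at most one window of data
in flight per round trip, so window size divided by round-trip time
bounds its throughput.  The sketch below converts the observed 2 GB/hr
to bytes per second and computes that bound.  The 64 KB window and the
150 ms round-trip time are placeholders, NOT measurements from yakov
or ensemble, and 1 GB is taken as 10^9 bytes.)

```shell
#!/bin/sh
# rate_bytes_per_sec <bytes per hour>
# Convert an hourly volume to an average rate in bytes per second.
rate_bytes_per_sec() {
    awk -v v="$1" 'BEGIN { printf "%.0f\n", v / 3600 }'
}

# window_limit <window bytes> <round-trip time in ms> <connections>
# Upper bound on aggregate throughput (bytes/s) when each connection can
# have at most one window of data in flight per round trip.
window_limit() {
    awk -v w="$1" -v r="$2" -v n="$3" \
        'BEGIN { printf "%.0f\n", n * w * 1000 / r }'
}

# Observed ceiling before tuning, taking 1 GB = 10^9 bytes.
rate_bytes_per_sec 2000000000

# Placeholder inputs: 64 KB window, 150 ms round trip, 4 feed requests.
window_limit 65536 150 4
```

Putting a measured round-trip time to ensemble and the pre-tuning
buffer sizes into window_limit would show whether the window alone
could explain the ceiling.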

Given the dramatic results, we should consider:

- recommending that Manuel and Waldenio do similar things on their
  TIGGE test machines

- making the same modifications on the idd.unidata.ucar.edu cluster
  data servers (uni1, uni2, and uni4) and on the cluster collector
  frontends oliver and emo

Thanks for the tuning instructions!

Cheers,

Tom

>> From: Tom Yoksas <address@hidden>
>> Subject: 20060104: reboot of yakov leads to high latencies from ECMWF
>> 
>> >From: Unidata User Support <address@hidden>
>> >Organization: Unidata Program Center/UCAR
>> >Keywords: TIGGE IDD latencies FC4 system tuning
>> 
>> Hi Mike,
>> 
>> I don't know if you are reading email, but I rebooted yakov yesterday
>> afternoon because of some desktop weirdness I was seeing AND because a
>> new kernel had been put in /boot but was not yet being used.
>> 
>> After the reboot, the latencies for the data coming from ECMWF went
>> from about 15 seconds to an hour.  I seem to remember that you did some
>> tweaking on yakov after the last reboot, but I can't remember exactly
>> what was needed.  Can you tell me what tuning needs to be done after a
>> reboot of yakov?
>> 
>> Thanks in advance...
>> 
>> Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+

