
[IDD #TIG-389071]: iperf server on yakov refusing connections



Hi Jeff,

> Hey folks,
> 
> Just short circuiting things a little bit, since Art won't be in until the
> morning and this all appears to be my fault.. :)

No problem -- I wasn't working late into the night as you were last night!

> The folks at PSC (John Heffner) discovered that our firewall (a Cisco ASA) was
> mangling the sequence numbers of TCP packets that passed through it.
> Unfortunately, it was present in sniffer traces that were right in front of me
> on several previous occasions but I missed it. When there was packet loss, the
> rewritten sequence numbers were playing havoc with the linux SACK
> implementation. We didn't see it between PSU and PSC because there was never
> any packet loss on the segment, but on the longer path between PSU and NCAR,
> the packet loss caused problems.

I'm glad you found the problem.

> Anyway, according to Cisco, this is known (and default!) behavior for the
> PIX/ASA OS, since they're "randomizing" the TCP sequence number to prevent
> attackers from hijacking a TCP session. It's a simple thing to turn off the
> sequence number rewriting, so I did that and now we're seeing a lot of
> improvement on transfer to Art's machines.

I'm glad to see we can squeeze more than 3Mbps through a 1Gbps end-to-end
connection.
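
For the ticket record, my understanding is that on the ASA the randomization is
turned off per traffic class through the Modular Policy Framework; something along
these lines (the ACL/class names are placeholders and the exact syntax can vary by
release, so treat this as a sketch rather than a recipe):

    access-list TCP-NO-RANDOM extended permit tcp any any
    class-map TCP-NO-RANDOM-CLASS
     match access-list TCP-NO-RANDOM
    policy-map global_policy
     class TCP-NO-RANDOM-CLASS
      set connection random-sequence-number disable

(On the older PIX 6.x software I believe the equivalent was the norandomseq keyword
on the static/nat statements, but don't quote me on that.)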

> I'm still not sure what's up with iperf to yakov, since that hasn't improved
> very much after the changes, however we seem to be able to get more than
> 100Mb/s to goeswest.

yakov is having other problems which I still haven't figured out.  As I mentioned
to Art yesterday, we used yakov for an LDM throughput test to Europe several months
back and pulled in 70+ Mbps for a week, and that put quite a load on the system.
That was when yakov was FC4 (it's now FC5).  Even then, it didn't perform nearly as
well as I would expect.  I don't know if the problem is the hardware (Dell 670), the
Intel Pro 1000 chipset, or what, but we have several of these systems with differing
OSes loaded and the throughput is poor on all of them.
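
When we take another run at yakov I'll use the same iperf invocation on both ends so
the numbers are directly comparable; roughly like the following, with yakov's full
hostname filled in (the window size, duration, and stream count are just starting
points, not a recommendation):

    # on yakov -- run the server with a large socket buffer
    iperf -s -w 4M

    # on the remote host -- 60-second test, 4 parallel streams, 5-second reports
    iperf -c yakov -w 4M -t 60 -P 4 -i 5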

> On the TCP tuning issue, in my travels I found this site from LBL that was 
> very
> helpful. It touches on some of the issues that were mentioned in some of the
> email that Art forwarded to me, so I thought I'd pass it back along in case
> it's helpful.
> 
> http://www-didc.lbl.gov/TCP-tuning/linux.html
>
> The bits on the 2.6 kernel are particularly interesting, since Redhat is still
> shipping 2.6.9 in RHEL 4. I haven't done any digging to see if they have
> backported any of the listed patches or not.

I'll look at the information closely.  The IDD cluster director and real servers
are currently all FC3 (2.6.11) systems, and all seem to work really well.
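
The knobs I'll be checking against that page are the usual 2.6 buffer limits; as a
sketch, something like the following in /etc/sysctl.conf, with the sizes scaled to
the path's bandwidth-delay product (the numbers below are examples only, not values
I'm recommending for these hosts):

    # maximum socket buffer sizes
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    # min / default / max TCP autotuning buffers
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    # make sure window scaling and SACK stay on
    net.ipv4.tcp_window_scaling = 1
    net.ipv4.tcp_sack = 1

    # apply with:  sysctl -p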

> One other silver lining of this whole event is that we seem to have built some
> steam behind finishing PSU's NLR connection, so that may happen sooner rather
> than later.

That's the best news of all!  We're already sending a lot of IDD traffic on NLR.

> Thanks for the help. We'll be keeping an eye on things to make sure that this
> is truly the fix, but things look good so far..
> 
> -Jeff

Glad to help...

mike

Ticket Details
===================
Ticket ID: TIG-389071
Department: Support IDD
Priority: Normal
Status: Closed