
20030618: troubleshooting slow HDS link from seistan to tornado@ulm (cont.)



>From: Robert Leche <address@hidden>
>Organization: LSU
>Keywords: 200306161954.h5GJs2Ld016710 LDM IDD HDS

Bob,

>Something odd is going on with networking. Note the following:
>
>[ldm@seistan ~]$ ping tornado.geos.ulm.edu
>PING tornado.geos.ulm.edu (198.202.242.22) from 130.39.188.204 : 56(84) bytes of data.
>64 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=0 ttl=58 time=13.879 msec
>64 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=1 ttl=58 time=11.651 msec
>64 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=2 ttl=58 time=11.629 msec
>64 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=3 ttl=58 time=11.037 msec
>64 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=4 ttl=58 time=8.452 msec

This looks reasonable: the 8-13 millisecond ping round-trip times match
the continuous 15 millisecond times we were seeing using mtr (Matt's
TraceRoute) from tornado to seistan.

>--- tornado.geos.ulm.edu ping statistics ---
>6 packets transmitted, 6 packets received, 0% packet loss
>round-trip min/avg/max/mdev = 8.452/11.415/13.879/1.596 ms
>[ldm@seistan ~]$ /usr/sbin/traceroute tornado.geos.ulm.edu
>traceroute to tornado.geos.ulm.edu (198.202.242.22), 30 hops max, 38 byte packets
> 1  130.39.188.1 (130.39.188.1)  14.610 ms  1.496 ms  3.754 ms
> 2  lsubr1-118-6509-dsw-1.g1.lsu.edu (130.39.1.20)  58.715 ms  2.253 ms  1.747 ms
> 3  laNoc-lsubr.LEARN.la.net (162.75.0.9)  3.745 ms  2.068 ms  3.017 ms
> 4  ulm-laNoc.LEARN.la.net (162.75.0.38)  14.859 ms  14.947 ms  14.650 ms
> 5  * * *
> 6  *
>[ldm@seistan ~]$ ping -s 100 tornado.geos.ulm.edu
>PING tornado.geos.ulm.edu (198.202.242.22) from 130.39.188.204 : 100(128) bytes of data.

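Hops 5 and 6 show no replies to the traceroute probes (each '*' is a
probe that timed out), so the probes are apparently being dropped or
filtered somewhere beyond ulm-laNoc.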

>--- tornado.geos.ulm.edu ping statistics ---
>9 packets transmitted, 0 packets received, 100% packet loss
>[ldm@seistan ~]$ ping -s 50  tornado.geos.ulm.edu
>PING tornado.geos.ulm.edu (198.202.242.22) from 130.39.188.204 : 50(78) bytes of data.
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=0 ttl=58 time=4.920 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=1 ttl=58 time=40.963 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=2 ttl=58 time=7.429 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=3 ttl=58 time=5.977 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=4 ttl=58 time=5.472 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=5 ttl=58 time=17.991 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=6 ttl=58 time=13.234 msec
>58 bytes from tornado.geos.ulm.edu (198.202.242.22): icmp_seq=7 ttl=58 time=13.329 msec

Right!  Our observation was that there was difficulty in sending LDM products
that were not small.
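
One way to narrow down where "small" ends is to bisect between a payload
size known to get through (50 bytes) and one known to fail (100 bytes).
The Python sketch below is only an illustration, not something we ran.
The names ping_ok and largest_passing_size are made up for this example.
It assumes the Linux iputils ping used above (its -c and -s options set
the packet count and payload size), and it assumes that every size up to
some cutoff passes and every size above it fails.

```python
import subprocess
from typing import Callable


def ping_ok(host: str, size: int, count: int = 3) -> bool:
    """Return True if ping reports at least one reply for this payload size.

    Assumes iputils ping on Linux, which exits with status 0 when at
    least one echo reply is received.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-s", str(size), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_passing_size(probe: Callable[[int], bool], good: int, bad: int) -> int:
    """Bisect between a size known to pass (good) and one known to fail (bad).

    Returns the largest size for which probe() succeeded, assuming a
    single cutoff lies somewhere between good and bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if probe(mid):
            good = mid   # mid got through, so the cutoff is at or above mid
        else:
            bad = mid    # mid was lost, so the cutoff is below mid
    return good
```

From seistan this could be called as
largest_passing_size(lambda s: ping_ok("tornado.geos.ulm.edu", s), 50, 100).
A cutoff just under a round number would point toward a size-based filter
rather than random loss.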

>--- tornado.geos.ulm.edu ping statistics ---
>8 packets transmitted, 8 packets received, 0% packet loss
>round-trip min/avg/max/mdev = 4.920/13.664/40.963/11.213 ms
>[ldm@seistan ~]$ ldmping tornado.geos.ulm.edu
>Jun 18 22:06:32      State    Elapsed Port   Remote_Host           rpc_stat
>Jun 18 22:06:32 RESPONDING   0.115693  388   tornado.geos.ulm.edu  
>
>[ldm@seistan ~]$ ping -s 100 tornado.geos.ulm.edu
>PING tornado.geos.ulm.edu (198.202.242.22) from 130.39.188.204 : 100(128) bytes of data.
>
>--- tornado.geos.ulm.edu ping statistics ---
>7 packets transmitted, 0 packets received, 100% packet loss
> 
>I can ping the host with a 50 byte payload (or less) but can't do large
>packets.  Can't traceroute either. This still could be a firewall issue. 

That is what we were going to try and see by configuring seistan to
allow LDM feeds to Unidata machines.

>Are you aware of a client server type of test software that will perform
>bit error rate analysis? If so ulm and lsu could do some testing.

I will touch base with my system administrator tomorrow to see what can
be done to pinpoint exactly what is causing the slowdown/stoppage of large
packets out of LSU or into ULM.  My thinking was that if we could not
get a reliable, fast LDM HDS feed from seistan to one of our machines,
then the problem is local to LSU.  If we could and ULM couldn't, then
the problem is local to ULM.  In order to test feeds from seistan, I
wanted it set up to allow feeds from a variety of sites around the
country:  Northern Virginia (atm.geo.nsf.gov), Tampa Bay Florida
(metlab.cas.usf.edu), Rio de Janeiro (brisa.meteoro.ufrj.br), and
possibly others.  This is why I asked if you would allow all access on
port 388.  Giving us 'root' access to seistan (through 'root' login
or sudo from the 'ldm' account) would allow us to rapidly test a
variety of possibilities and, hopefully, trace down the problem.
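
The reasoning above can be written down as a small decision function.
This is just a sketch of the logic in this message. The name
locate_problem and its two flags are invented for illustration, and what
counts as a "reliable, fast" feed is left to whoever runs the test.

```python
def locate_problem(unidata_feed_ok: bool, ulm_feed_ok: bool) -> str:
    """Guess where the trouble is from two HDS feed tests out of seistan.

    unidata_feed_ok: a Unidata machine got a reliable, fast feed from seistan.
    ulm_feed_ok:     tornado at ULM got a reliable, fast feed from seistan.
    """
    if not unidata_feed_ok:
        # seistan cannot feed an outside test site either
        return "LSU"
    if not ulm_feed_ok:
        # seistan feeds other sites fine, only ULM has trouble
        return "ULM"
    return "no problem seen"
```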

Tom