
[LDM #GUQ-669331]: Both Radar and Satellite feeds stopped downloading abruptly



Hi John,

I logged onto rime this afternoon to do some more poking around.
Here is what I observed, in the order in which I observed it:

1) the first thing I noted was the lack of any products being
   received by the LDM on rime

   I verified this with both 'ldmadmin watch' and 'notifyme -vl- -o 3600'.

2) I verified that the LDM REQUESTs to port 80 on node4.unidata.ucar.edu
   were still active

   I did this on both the rime and node4 sides.

3) I added a REQUEST for the full IDS|DDPLUS feed to the list of REQUESTs
   already in place on rime, and then restarted rime's LDM

   The result of this test was dramatic:  the IDS|DDPLUS products were
   immediately received, and their latencies soon dropped to essentially
   zero and have stayed there:

   
   http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+rime.ttu.edu

   At the same time, NO products from the other feed REQUESTs have been inserted
   into the LDM queue.  The reason that they have not been put in rime's LDM
   queue is that their latencies all exceed the maximum latency parameter
   specified in the LDM registry on rime.
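
   As a rough illustration of why those products never reach the queue,
   the age check can be sketched as below.  This is not LDM code: the
   function name is made up, and the 3600 second limit is an assumed
   value standing in for whatever is actually set in rime's LDM registry.

```python
from datetime import datetime, timedelta, timezone

# Assumed value for illustration; the actual limit is whatever is set in
# the LDM registry on the receiving host.
MAX_LATENCY = timedelta(seconds=3600)


def is_too_old(created, received, max_latency=MAX_LATENCY):
    """Return True if a product's latency exceeds the allowed maximum."""
    return (received - created) > max_latency


# Arbitrary reference time, used only to make the example reproducible.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
small_product = now - timedelta(seconds=2)     # e.g. an IDS|DDPLUS bulletin
large_product = now - timedelta(seconds=5400)  # e.g. a badly delayed radar product

print(is_too_old(small_product, now))  # False: accepted into the queue
print(is_too_old(large_product, now))  # True: rejected as too old
```

   A product that arrives 90 minutes after it was created is discarded
   under this assumed limit even though the connection itself is up.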

4) Because IDS|DDPLUS products can be received on rime, and because
   IDS|DDPLUS products are very small in comparison to the products
   in the other feeds being REQUESTed (NEXRAD3, NGRID, NIMAGE and
   NOTHER), I immediately started to suspect that there was some kind
   of "packet shaping" going on

   The reason for thinking this is that receiving low-volume feeds while
   high-volume feeds are not received is a "classic" symptom of some sort
   of bandwidth limiting.

5) Because you stated categorically that everyone there says that the
   problem is not in their systems (e.g., Learn, Internet2, and local network),
   I decided to take a look at the output from 'ifconfig' on rime

   Bingo!  The 'ifconfig' output shows that there is a problem with the
   Ethernet interface that is being used on rime:

ldm@rime:~$ /sbin/ifconfig -a
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 129.118.105.28  netmask 255.255.255.0  broadcast 129.118.105.255
        ether d0:94:66:63:ea:79  txqueuelen 1000  (Ethernet)
        RX packets 407915454  bytes 611274831415 (569.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 213318707  bytes 15095867996 (14.0 GiB)
        TX errors 211896  dropped 0 overruns 0  carrier 211896  collisions 56797704
        device memory 0x91b00000-91bfffff  

   Note the number of collisions that are being reported in this snapshot.

6) Because of the unexpected number of collisions being reported in 'ifconfig'
   output, I grabbed my system administrator and asked him to take a look

   Mike's comment was:  If the Ethernet interface on rime is Gbps and is
   running at Gbps with full duplex, there should be no collisions reported.

7) This prompted us to look through the output from 'dmesg | less'

   Here is the/a smoking gun:

   [   44.904382] igb 0000:01:00.0 em1: igb: em1 NIC Link is Up 100 Mbps Half Duplex, Flow Control: RX/TX
   [   44.904389] igb 0000:01:00.0: EEE Disabled: unsupported at half duplex. Re-enable using ethtool when at full duplex.
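
The checks in steps 5 through 7 could be scripted roughly as follows.
This is only a sketch: the function names and regular expressions are
our own, the sample strings are copied from the output quoted above, and
the message format is assumed to match what the igb driver printed on rime.

```python
import re

# Sample text copied from the 'ifconfig' and 'dmesg' output quoted above.
IFCONFIG_TX_LINE = (
    "TX errors 211896  dropped 0 overruns 0  carrier 211896  "
    "collisions 56797704"
)
DMESG_LINK_LINE = (
    "[   44.904382] igb 0000:01:00.0 em1: igb: em1 NIC Link is Up "
    "100 Mbps Half Duplex, Flow Control: RX/TX"
)


def collision_count(ifconfig_text):
    """Return the collision counter from 'ifconfig' output, or None."""
    match = re.search(r"collisions\s+(\d+)", ifconfig_text)
    return int(match.group(1)) if match else None


def link_state(dmesg_text):
    """Return (speed_mbps, duplex) from a 'NIC Link is Up' message, or None."""
    match = re.search(
        r"NIC Link is Up (\d+) Mbps (Half|Full) Duplex", dmesg_text
    )
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


def looks_misnegotiated(ifconfig_text, dmesg_text):
    """Flag a link that reports collisions or came up at half duplex."""
    collisions = collision_count(ifconfig_text) or 0
    state = link_state(dmesg_text)
    half_duplex = state is not None and state[1] == "half"
    return collisions > 0 or half_duplex


print(collision_count(IFCONFIG_TX_LINE))   # 56797704
print(link_state(DMESG_LINK_LINE))         # (100, 'half')
print(looks_misnegotiated(IFCONFIG_TX_LINE, DMESG_LINK_LINE))  # True
```

Either symptom alone (a non-zero collision count, or a link that came up
at half duplex) is enough to make the interface worth a closer look.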

From the above, our conclusion is that one of the following is true:

- the Ethernet interface is running at 100 Mbps half duplex

  This is BAD and should be corrected using 'ethtool' (run as 'root') as soon
  as possible

- there is something wrong with the Ethernet port being used on rime

- there is something wrong with the Ethernet cable that is connecting rime
  to the switch

- there is something wrong with the port on the switch that rime is connected
  to

Since we do not have 'root' access on rime, we cannot use 'ethtool' to
reset your Ethernet interface to what we think it can/should be.

Question:

Can you use 'ethtool' to reset the em1 Ethernet interface on rime and let
us know when the job is done?

If this does not correct the problem, can you check the Ethernet cable
that is being used to connect rime to the switch?

If this does not correct the problem, can you connect rime to a different
port on the switch?

If this doesn't work, can you switch to using the em2 Ethernet interface
on rime?
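
For reference, the 'ethtool' invocations we have in mind are sketched
below.  We have not been able to try them on rime, so treat this as a
starting point and not a verified procedure.  The helper names are
made up; the sketch only prints the commands and does not run them.
Re-enabling auto-negotiation is suggested because Gbps links over copper
normally rely on it, but whether that is the right fix for rime is an
assumption on our part.

```python
import shlex


def ethtool_show(interface):
    """Command that prints the current link settings for an interface."""
    return ["ethtool", interface]


def ethtool_autoneg_on(interface):
    """Command that re-enables auto-negotiation on an interface."""
    return ["ethtool", "-s", interface, "autoneg", "on"]


def ethtool_renegotiate(interface):
    """Command that restarts auto-negotiation on an interface."""
    return ["ethtool", "-r", interface]


# Print the commands rather than running them: they must be run as 'root'
# on rime, and changing link settings can briefly drop the connection.
steps = (ethtool_show, ethtool_autoneg_on, ethtool_renegotiate, ethtool_show)
for build in steps:
    print(shlex.join(build("em1")))
```

The first and last commands only report the link settings, so comparing
their output shows whether the speed and duplex actually changed.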





Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: GUQ-669331
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.