
[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester



Hi Gregg,

re:
> No apologies required and I'm glad you are asking detailed questions.

OK.

re:
> Correct, there are two splitters in line.  The output from the dish goes
> into an amplifier, and from the amplifier there is a connection to an
> 8-port splitter; on one of the ports of the 8-port splitter is a 4-port
> splitter.  7 ports on the 8-port splitter have NOVRA boxes, and 3 ports of
> the 4-port splitter have NOVRA boxes (i.e. one port is free).

So, my next question is: is the Novra S300N your machine is connected
to fed from one of the ports on the 8-way splitter, or from one of the
ports on the 4-way splitter that is itself connected to the 8-way
splitter?

Given that the number of Gap messages and associated missed frames is
small on all channels except the one being logged to the polarsat log
file, the double split is most likely a red herring.  The setup is,
however, odd, so it does raise questions in my mind!
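For a rough sense of what the double split costs: an ideal N-way splitter divides the input power N ways, so each output port is down by 10 log10(N) dB. Real splitters add some insertion loss on top of that, so the data sheets should be checked for actual figures. Under this idealized assumption, a receiver on the 4-way splitter sees about 6 dB less signal than one fed directly from the 8-way splitter:

```latex
\begin{align*}
L_{\text{split}}(N) &= 10\log_{10} N \ \text{dB} \\
L_{\text{split}}(8) &\approx 9.03\ \text{dB} \\
L_{\text{split}}(8) + L_{\text{split}}(4) &= 10\log_{10} 32 \approx 15.05\ \text{dB}
\end{align*}
```

Every channel reaches the Novra over the same RF path. A shortfall in signal level would therefore be expected to hurt all channels, not just one. This is why the double split looks unlikely to explain errors confined to a single channel.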

re:
> I suspect the AWIPS Program Office and Raytheon have a plan on upgrading
> the LDM version, this software is considered baselined.

We know that the folks who produce AWIPS distributions do consider
upgrading the bundled LDM, but they seem to be slow to move to new
versions, based on the thinking "if it ain't broke, don't fix it".  The
funny part of this (to me, at least) is that Raytheon first approached
us (Steve, actually) about adopting a new logging package in order to
get around limitations of the system logging daemon.  Why they
wouldn't push for the LDM bundled with AWIPS to be updated to a newer
version is beyond me.  (I am not sure which LDM release first included
the new logging subsystem, but I can say that Steve has found and
fixed some bugs in that code in releases more recent than v6.12.14.)

re:
> It does appear EDEX Bridge is running on cpsb1:
> 
> -bash-4.2$ hostname
> cpsbn1-spcn
> -bash-4.2$ ps -ef |grep -i edex
> ldm      19799 19535  0 21:37 pts/0    00:00:00 grep -i edex
> ldm      19883 19880  0 Jun10 ?        00:35:04 edexBridge -vxl /usr/local/ldm/logs/*edex*Bridge.log -s cp1f
> -bash-4.2$

Is cpsbn1-spcn your machine, or some other machine?  If it is your
machine, the question is whether EDEX is running.  If EDEX is not
running, why is edexBridge being run?

re:
> The listing of log files was from the AWIPS system, sorry I wasn't more
> clear.  I included it and the Gap errors to show the AWIPS system had
> essentially zero.   I won't reference this system anymore.

I was getting confused since I don't know the setup there :-)

re:
> Back to the server in question and your next question, the following has
> been set since yesterday:
> 
> [ldmcp@sbn1 ~/etc]$ sysctl -ar 'ipfrag_max_dist'
> 
> net.ipv4.ipfrag_max_dist = 0

Excellent!

re:
> I've attached two tar files, logs from the new build of LDM, and logs from
> ~ldmcp/logs directory after starting the newly built LDM with the changed
> noaaportIngester arguments.

Got 'em, thanks!

re:
> The following is from ldmd.conf:
> 
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.1  -n -u 3 -s NMC"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.2  -n -u 4 -s GOES -f"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.3  -n -u 5 -s NMC2"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.4  -n -u 6 -s NOAAPORT_OPT"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.5  -n -u 7 -s NMC3"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.6  -n -u 4 -s ADD"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.7  -n -u 7 -s ENC"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.8  -n -u 7 -s EXP"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.9  -n -u 4 -s GRW"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.10 -n -u 4 -s GRE"

One comment and one question:

Comment:

There is nothing on multicast addresses 224.0.1.6 or 224.0.1.7,
so running 'noaaportIngester' invocations to look for data on
them is a waste of CPU (which shouldn't be much, but...).  I
recommend commenting those lines out and restarting the LDM.
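If you would rather script this change than edit by hand, here is a minimal Python sketch. It prefixes '#' to the noaaportIngester EXEC lines whose '-m' group is in a given set. It assumes each entry names its multicast group with '-m <address>', as in the configuration above. The function name comment_out_groups is hypothetical, and editing the file by hand works just as well.

```python
import re
from typing import Iterable

# Sample entries copied from the ldmd.conf excerpt above.
SAMPLE_CONF = """\
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.5  -n -u 7 -s NMC3"
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.6  -n -u 4 -s ADD"
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.7  -n -u 7 -s ENC"
"""

# Captures the multicast group that follows "-m".
GROUP = re.compile(r"-m\s+(\S+)")


def comment_out_groups(conf_text: str, unused_groups: Iterable[str]) -> str:
    """Prefix '#' to active noaaportIngester EXEC lines for unused groups."""
    unused = set(unused_groups)
    out = []
    for line in conf_text.splitlines():
        match = GROUP.search(line)
        # Lines already starting with '#' are left alone, so reruns are safe.
        is_active_exec = line.lstrip().startswith("EXEC") and "noaaportIngester" in line
        if is_active_exec and match and match.group(1) in unused:
            line = "#" + line
        out.append(line)
    return "\n".join(out)


print(comment_out_groups(SAMPLE_CONF, {"224.0.1.6", "224.0.1.7"}))
```

Run it against a copy of ldmd.conf, then restart the LDM ('ldmadmin restart') for the change to take effect.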

Question:

Which 'noaaportIngester' invocation is logging to the polarsat.log file?

I think it is:

EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.5  -n -u 7 -s NMC3"

but I want to be sure.
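One way to narrow this down is to tabulate which invocations share a logging facility. The Python sketch below assumes that '-u N' selects the syslog facility localN. That is my reading of the noaaportIngester usage and should be checked against its man page. Under that assumption, if polarsat.log is fed by syslog, whichever local facility syslog routes to it identifies the candidate invocations. The names Ingester and parse_ingesters are hypothetical.

```python
import re
from typing import List, NamedTuple


class Ingester(NamedTuple):
    group: str     # multicast group given with -m
    facility: int  # N from "-u N"; assumed to mean syslog facility localN
    feedtype: str  # LDM feedtype given with -s


# Assumes -m, -u and -s appear in that order, as in the ldmd.conf above.
ARGS = re.compile(r"-m\s+(\S+).*?-u\s+(\d+).*?-s\s+(\w+)")


def parse_ingesters(conf_text: str) -> List[Ingester]:
    """Collect the active noaaportIngester EXEC entries from ldmd.conf text."""
    found = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("EXEC") or "noaaportIngester" not in stripped:
            continue  # skip comments and other EXEC entries
        match = ARGS.search(stripped)
        if match:
            group, facility, feedtype = match.groups()
            found.append(Ingester(group, int(facility), feedtype))
    return found


# A few entries copied from the ldmd.conf excerpt above.
CONF = """\
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.5  -n -u 7 -s NMC3"
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.6  -n -u 4 -s ADD"
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.7  -n -u 7 -s ENC"
EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.8  -n -u 7 -s EXP"
"""

for ingester in parse_ingesters(CONF):
    if ingester.facility == 7:
        print(ingester.group, ingester.feedtype)
```

In the full configuration, '-u 7' is shared by the NMC3, ENC, and EXP entries. The facility alone therefore does not settle the question, which is why I am asking.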

re:
> Sorry for the delayed response and I appreciate all of your help.

No worries.

The next thing to try in this investigation is to run 'tcpdump'
on the ingest machine to look at the traffic coming from the
Novra S300N receiver.  Unfortunately, one needs to have 'root'
or 'sudo' privilege to run 'tcpdump', so you will need to get
one of your system administrators involved.

Since the only channel showing persistent and large errors
(meaning lots of Gap messages and large numbers of missed
frames for each Gap message) is the one being logged in the
polarsat.log file, it would be instructive to see what that
traffic looks like on your machine.

Here is what I would ask to be run and logged:

<as 'root'>
tcpdump -i em2 -n -v port 1205

I say 'logged' because the traffic on port 1205 (which carries multicast
group 224.0.1.5) is not that frequent, so it may take some time to see
products being received.
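The text above pairs port 1205 with multicast address 224.0.1.5. If that pattern holds for every channel (UDP port = 1200 + last octet of the group, an assumption generalized from this one pairing that should be verified), a small helper can build the matching capture command for any channel. The interface name 'em2' comes from the command above. The functions port_for_group and tcpdump_argv are hypothetical.

```python
import ipaddress
from typing import List


def port_for_group(group: str) -> int:
    """UDP port for a NOAAPort multicast group.

    Assumption: port = 1200 + last octet, generalized from the single
    pairing in the text (224.0.1.5 -> 1205); verify before relying on it.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return 1200 + (int(addr) & 0xFF)


def tcpdump_argv(interface: str, group: str) -> List[str]:
    """Arguments for the capture suggested above; tcpdump must run as root."""
    return ["tcpdump", "-i", interface, "-n", "-v", "port", str(port_for_group(group))]


print(" ".join(tcpdump_argv("em2", "224.0.1.5")))
```

For 224.0.1.5 on em2, this reproduces the command above. An administrator could pass the list form to subprocess.run and stop the capture with Ctrl-C.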

The other thing I would appreciate seeing is the output from:

netstat -rn

I want to see how routing is set up on your machine.  Unlike
'tcpdump', this can be run as any user; 'root' privilege is not needed.

Why?

The newest log files you sent show a lot more traffic on port 1205
than I would expect most of the time (I am running the suggested
'tcpdump' invocation on one of our NOAAPort ingest machines now, and
very little traffic is being listed).  It could be that the log file
you sent covers a period when a number of products were being sent
in that channel, but that feels too coincidental to me at the moment.
Seeing how routing is set up should either pinpoint a problem or
eliminate routing as a cause.

For reference:

Here is the output from running the 'tcpdump' command I am recommending
(without the -v flag) on one of our Linux (CentOS 6.10) ingest machines
for the 10-minute period from 20:43:00 to 20:53:00 UTC:

[root@leno ~]# tcpdump -i eth1 -n port 1205 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
22:43:04.275070 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:43:34.272202 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:44:04.263184 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:44:34.271386 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:45:04.272061 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:45:34.273460 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:46:04.271132 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:46:34.341305 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:47:04.315110 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:47:34.279231 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:48:04.263486 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:48:34.271459 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:49:04.268545 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:49:34.274241 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:50:04.292562 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:50:34.269693 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:51:04.275040 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:51:34.270878 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:52:04.291461 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:52:34.281295 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
^C
20 packets captured
66 packets received by filter
12 packets dropped by kernel
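To compare a capture from your machine with this one, it helps to summarize the packet count, total bytes, and spacing between packets. Here is a minimal Python sketch. It assumes tcpdump's default one-line UDP format shown above (no -v) and a capture that does not cross midnight. The function name summarize is hypothetical.

```python
import re
from datetime import datetime

# tcpdump's default one-line UDP format (no -v), as in the capture above.
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) IP (\S+) > (\S+): UDP, length (\d+)$")


def summarize(capture: str) -> dict:
    """Count packets and bytes and compute the mean spacing between packets.

    Assumes the capture does not span midnight (timestamps carry no date).
    Non-matching lines (tcpdump banners, counters) are ignored.
    """
    times, lengths = [], []
    for line in capture.splitlines():
        match = LINE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
            lengths.append(int(match.group(4)))
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(times),
        "bytes": sum(lengths),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,
    }


# The first four packets of the capture above.
SAMPLE = """\
22:43:04.275070 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:43:34.272202 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:44:04.263184 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
22:44:34.271386 IP 10.0.9.51.55920 > 224.0.1.5.accord-mgc: UDP, length 48
"""

print(summarize(SAMPLE))
```

On our machine, the channel carried one 48-byte packet roughly every 30 seconds. A capture from your machine showing much denser traffic on port 1205 would support the suspicion raised above.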

Also "for the files", here is the 'netstat -rn' output on our system:

[root@leno ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
128.117.156.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
224.0.0.0       192.168.1.7     240.0.0.0       UG        0 0          0 eth1
0.0.0.0         128.117.156.251 0.0.0.0         UG        0 0          0 eth0
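Reading a table like this comes down to a longest-prefix match: for a destination such as 224.0.1.5, the most specific matching row wins. The Python sketch below applies that rule to the rows shown above. It parses only the Destination, Gateway, Genmask, and Iface columns, and it ignores metrics and policy routing. The function name route_for is hypothetical.

```python
import ipaddress
from typing import Optional, Tuple

# 'netstat -rn' output copied from above.
TABLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
128.117.156.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
224.0.0.0       192.168.1.7     240.0.0.0       UG        0 0          0 eth1
0.0.0.0         128.117.156.251 0.0.0.0         UG        0 0          0 eth0
"""


def route_for(table: str, destination: str) -> Optional[Tuple[str, str]]:
    """Return (gateway, iface) of the most specific route covering destination."""
    dest = ipaddress.IPv4Address(destination)
    best = None  # (network, gateway, iface)
    for line in table.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a dotted-quad destination.
        if len(fields) != 8 or fields[0].count(".") != 3:
            continue
        net = ipaddress.IPv4Network(f"{fields[0]}/{fields[2]}")
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, fields[1], fields[7])
    return None if best is None else (best[1], best[2])


print(route_for(TABLE, "224.0.1.5"))
```

Here the 224.0.0.0/240.0.0.0 row sends NOAAPort multicast out eth1 rather than via the default route on eth0. On your machine, the interesting check is whether the row covering 224.0.1.5 points at the interface connected to the Novra (presumably em2).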

Also, if there are any files named route-<iface> (e.g., route-em1, route-em2)
in /etc/sysconfig/network-scripts on your ingest machine, can you please send
them as attachments?  Thanks in advance...

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.