
[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester



Hi Gregg,

re:
> Hi Tom (sent this to your email address as well since attachments exceed 2
> MB, thus your logging software won't note this per the email size),

User support gets the large emails and files them.  When they get forwarded
to our inquiry tracking system, however, they get dropped when their size
exceeds 2MB.  As long as I know to be on the lookout for an email that
is larger than 2MB because of attachments, I will know to look in the place
where we stash the original emails, so there is typically no need to 
CC them to my personal UCAR email address.

re:
> Did you get the email with the attachment of the splitter picture (i.e with
> two splitters)?  The output of the 8-port splitter, port 8 becomes the
> input for the 4-port splitter.

Yes, and I could see nothing obviously wrong with the setup.  As I have said
in other posts, the fact that there are problems with only one of the ports
being read strongly suggests that most things are working correctly.

re:
> Jay rebooted the server a little while ago so LDM was restarted.

OK, thanks.  I was going to ask when the machine on which the LDM is
running was last rebooted.

What is the OS of the machine that you have been providing information
for?  I reread the exchanges on the noaaport email list, and I noted
that you mentioned that you are running both RHEL 7 and 6:

        "We are replacing legacy SBN ingest software and spinning up the Unidata
        noaaportIngester (i.e. LDM version 6.13.11) on RHEL7 / RHEL6."

What I did not see (although I may have overlooked it) was a mention of which
OS is running on the machine we have been exchanging information on.

re:
> Attached
> is the contents of the tcpdump and immediately below is the output of the
> command:
> 
> gregg@sbn1:  sudo tcpdump -i em2 -n -v port 1208 >
> tcpdump_output_port_1208.txt
> 
> tcpdump: listening on em2, link-type EN10MB (Ethernet), capture size 262144 
> bytes

OK.  The capture size is plenty big enough for the UDP traffic from the Novra
S300N.

re:
> ^C14830 packets captured
> 14979 packets received by filter
> 0 packets dropped by kernel
> 
> gregg@sbn1:
> 
> gregg@sbn1: ifconfig
> 
> em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
> inet 140.90.173.123  netmask 255.255.255.0  broadcast 140.90.173.255
> ether 84:2b:2b:4e:0d:0f  txqueuelen 1000  (Ethernet)
> RX packets 252930  bytes 340667211 (324.8 MiB)
> RX errors 0  dropped 379  overruns 0  frame 0
> TX packets 8498  bytes 4022061 (3.8 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> em2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> inet 10.0.5.50  netmask 255.255.0.0  broadcast 10.0.255.255
> ether 84:2b:2b:4e:0d:10  txqueuelen 1000  (Ethernet)
> RX packets 4509552  bytes 6194276937 (5.7 GiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 50  bytes 6171 (6.0 KiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
> inet 127.0.0.1  netmask 255.0.0.0
> loop  txqueuelen 1000  (Local Loopback)
> RX packets 854  bytes 56910 (55.5 KiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 854  bytes 56910 (55.5 KiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

OK, thanks.  Since you have said that the Novra S300N is directly connected
to the Ethernet interface 'em2', I am a bit surprised by the netmask
and broadcast values shown for 'em2'.  Given the direct connection,
I would not expect this to have _any_ bearing on the problem at hand,
but I am prompted to ask for the following output from your Novra:

cmcs: show lan
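
For reference, the netmask and broadcast values shown for 'em2' are at least
consistent with each other, which a few lines of Python can confirm.  This is
only a sketch using the standard 'ipaddress' module; the function name
'describe' is my own, and the address and netmask are copied from the 'em2'
stanza of your ifconfig output.

```python
import ipaddress


def describe(address, netmask):
    """Return (network in CIDR form, broadcast address) for an interface."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network), str(iface.network.broadcast_address)


# Address and netmask copied from the 'em2' stanza of the ifconfig output.
network, broadcast = describe("10.0.5.50", "255.255.0.0")
print(network)    # 10.0.0.0/16
print(broadcast)  # 10.0.255.255
```

A /16 is a much larger network than a direct connection to a single device
needs, which is the only reason the values caught my eye.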

re:
> During the tcpdump capture, the following entries were logged in the
> 8.exp.log file:
> 
> [ldmcp@sbn1 ~/logs]$ tail -f 8.exp.log
> 
> ...
> 
> 20200623T170127.020879Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51848447 to 51849619 [skipped 1171]
> 20200623T170127.278180Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51849619 to 51849717 [skipped 97]
> 20200623T170128.305480Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51849717 to 51850020 [skipped 302]
> 20200623T170130.768017Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51850020 to 51850807 [skipped 786]
> 20200623T170134.303885Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51850807 to 51851889 [skipped 1081]
> 20200623T170134.537434Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51851889 to 51851921 [skipped 31]
> 20200623T170134.987324Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51851921 to 51852072 [skipped 150]
> 20200623T170153.274978Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51852072 to 51857758 [skipped 5685]
> 20200623T170155.285758Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51857758 to 51858397 [skipped 638]
> 20200623T170202.152546Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51858397 to 51860565 [skipped 2167]
> 20200623T170202.615410Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51860565 to 51860695 [skipped 129]
> 20200623T170203.343016Z noaaportIngester[3443] productMaker.c:pmStart:439     
>      WARN  Gap in packet sequence: 51860695 to 51860954 [skipped 258]
> 
> ^C

This is consistent with the other log files/log file snippets that you have
sent previously.  The fact that every entry is a Gap message is telling
us something, but I can't for the life of me figure out what yet!
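
To put numbers on the Gap entries, something like the following could be run
against a log file.  It is a minimal sketch, not part of the LDM: the
function name 'summarize_gaps' is hypothetical, and the regular expression
assumes the message format exactly as it appears in the entries quoted above.

```python
import re

# Pattern for the WARN messages, assuming the format quoted above.
GAP_RE = re.compile(r"Gap in packet sequence: (\d+) to (\d+) \[skipped (\d+)\]")


def summarize_gaps(log_text):
    """Count Gap entries, total the skipped packets, and measure the span."""
    gaps = [tuple(map(int, m)) for m in GAP_RE.findall(log_text)]
    if not gaps:
        return {"entries": 0, "skipped": 0, "span": 0, "consistent": True}
    return {
        "entries": len(gaps),
        "skipped": sum(n for _, _, n in gaps),
        # Sequence numbers from the first gap's start to the last gap's end.
        "span": gaps[-1][1] - gaps[0][0],
        # Each logged count should equal (end - start - 1).
        "consistent": all(end - start - 1 == n for start, end, n in gaps),
    }


# First three entries from the 8.exp.log excerpt quoted above.
sample = """
WARN  Gap in packet sequence: 51848447 to 51849619 [skipped 1171]
WARN  Gap in packet sequence: 51849619 to 51849717 [skipped 97]
WARN  Gap in packet sequence: 51849717 to 51850020 [skipped 302]
"""
print(summarize_gaps(sample))
```

By my count, the twelve entries quoted above add up to 12495 skipped packets
over a span of 12507 sequence numbers, and each gap starts exactly where the
previous one ended.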

re:
> Current contents of ldmd.conf for executing noaaportIngester:
> 
> grep noaaport ldmd.conf | grep ^EXEC
> 
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.1  -n -s NMC -l 
> /home/ldmcp/logs/1.nmc.log"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.2  -n -s GOES -f -l 
> /home/ldmcp/logs/2.goes.log"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.3  -n -s NMC2 -l 
> /home/ldmcp/logs/3.nmc2.log"
> EXEC    "noaaportIngester -I 10.0.5.50 -m 224.0.1.4  -n -s NOAAPORT_OPT -l 
> /home/ldmcp/logs/4.nopt.log"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.5  -n -s NMC3 -l 
> /home/ldmcp/logs/5.nmc3.log"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.8  -n -s EXP -l 
> /home/ldmcp/logs/8.exp.log"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.9  -n -s GRW -l 
> /home/ldmcp/logs/9.grw.log"
> EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.10 -n -s GRE -l 
> /home/ldmcp/logs/10.gre.log"

OK.  These look OK to me.

Out of curiosity, what is the character(s) between 'EXEC' and
'"noaaportIngester'?  This shouldn't make a difference, but the fact that
the columns don't align has been bugging me since message #1.  Given my
anal nature, I always separate 'EXEC' from the opening quote mark with a tab.
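
One way to see exactly which whitespace characters separate 'EXEC' from the
opening quote is to print them with repr().  The following is only a sketch:
the function name 'exec_separators' and the sample lines are hypothetical,
and in practice the lines would be read from your ldmd.conf.

```python
import re

# Matches the whitespace between the EXEC keyword and the opening quote.
EXEC_RE = re.compile(r'^EXEC(\s+)"')


def exec_separators(lines):
    """Return the repr() of the separator on each EXEC line, in order."""
    found = []
    for line in lines:
        match = EXEC_RE.match(line)
        if match:
            found.append(repr(match.group(1)))
    return found


# Hypothetical sample lines: one tab-separated, one space-separated.
sample = [
    'EXEC\t"noaaportIngester -I 10.0.5.50 -m 224.0.1.1 -n -s NMC"',
    'EXEC "noaaportIngester -I 10.0.5.50 -m 224.0.1.8 -n -s EXP"',
    "# a comment line, which is ignored",
]
for separator in exec_separators(sample):
    print(separator)
```

A tab shows up as '\t' and spaces show up literally, so any mix of the two
is easy to spot.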

re:
> I didn't see any entries in the tcpdump for 48 when no data is received.
> Should I have run the tcpdump longer?

It took me a while to see a lull long enough for a "heartbeat" packet to
be seen.  I don't think that it is important that you didn't see one in
the time you were looking...

re:
> We have tried multiple NOVRAs on this same server (hostname sbn1).
> Each time Gap errors were seen right away.  One time we briefly tried the
> NOVRA box feeding the "operational" AWIPS system (i.e. cpsbn1 server)
> and Gap errors were seen right away, HOWEVER the often Gap errors were
> NOT seen in the AWIPS noaaportIngester log files.

I saw this comment in a previous email (to noaaport?).  It supports the
firm notion that the problem is somewhere in the machine running the
LDM and not the satellite dish, signal splitters, coax cabling, or
Ethernet cable connecting the Novra S300N to the computer.

re:
> Not sure why the PID was removed/changed, if it was done it was likely
> accidental.  I did check the pids again after reading your email and
> they are all correct now.

OK.

re:
> Appreciate all of your help on this.

This one has me flummoxed!  It makes _NO_ sense to me that 1 of 8 channels
would be having problems when the others are working OK.

So, the things I am interested in seeing now are:

- what is the OS on the machine running the problematic ingest?

- what is the result of the 'ethtool -k em2' invocation I sent
  previously?

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.