
[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester



Hi Gregg,

re:
> Here is some info from the 8.exp.log file and the tcpdump around 17:01:34
> seconds:
> 
> 20200623T170134.303885Z noaaportIngester[3443] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 51850807 to 51851889 [skipped 1081]
> 20200623T170134.537434Z noaaportIngester[3443] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 51851889 to 51851921 [skipped 31]
> 20200623T170134.987324Z noaaportIngester[3443] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 51851921 to 51852072 [skipped 150]

The most striking feature of the log files that contain (or consist solely
of) entries for the traffic on port 1208 is that every entry other than the
startup notice is a Gap message.  Here is the beginning of the most recent
8.exp.log file you included in the ldm_logs.tar.Z file you sent:

20200619T213340.847455Z noaaportIngester[20939] noaaportIngester.c:main:1323 NOTE  Starting up 6.13.11
20200619T213340.847602Z noaaportIngester[20939] noaaportIngester.c:main:1324 NOTE  Copyright (C) 2019 University Corporation for Atmospheric Research
20200619T213358.532502Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24554851 to 24555172 [skipped 320]
20200619T213402.041819Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24555172 to 24555241 [skipped 68]
20200619T213416.262745Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24555241 to 24555291 [skipped 49]
20200619T213418.099626Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24555291 to 24555333 [skipped 41]
 ...
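The "skipped" count in each of these entries appears to be the number of
sequence numbers strictly between the two values shown, e.g.
24555172 - 24554851 - 1 = 320.  A minimal Python sketch, assuming only the
message text shown above, that pulls the three numbers out of an entry and
checks that arithmetic:

```python
import re

# Matches the text of a "Gap in packet sequence" warning, e.g.
# "Gap in packet sequence: 24554851 to 24555172 [skipped 320]".
GAP_RE = re.compile(r"Gap in packet sequence: (\d+) to (\d+) \[skipped (\d+)\]")


def parse_gap(line):
    """Return (last_seen, next_seen, skipped), or None if not a Gap entry."""
    m = GAP_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def expected_skipped(last_seen, next_seen):
    # Sequence numbers strictly between the two endpoints were never received.
    return next_seen - last_seen - 1


entries = [
    "WARN  Gap in packet sequence: 24554851 to 24555172 [skipped 320]",
    "WARN  Gap in packet sequence: 24555172 to 24555241 [skipped 68]",
    "WARN  Gap in packet sequence: 24555241 to 24555291 [skipped 49]",
    "WARN  Gap in packet sequence: 24555291 to 24555333 [skipped 41]",
]

for entry in entries:
    last_seen, next_seen, skipped = parse_gap(entry)
    print(last_seen, next_seen, skipped,
          expected_skipped(last_seen, next_seen) == skipped)
```

Each entry's second number is also the next entry's first number, so these
gaps are back to back rather than overlapping.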

This feature is not new, so nothing that has been done recently caused
it.

The fact that all traffic on port 1208 looks bad, while the other log files
contain few or no Gap messages, focuses my attention on the port 1208
traffic.
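One way to make that comparison concrete is to count, for each log file, how
many entries are Gap messages.  A minimal Python sketch follows; the file
names and excerpts are hypothetical stand-ins for real log files, and it
assumes each log entry occupies a single line:

```python
def summarize_log(lines):
    """Return (total_entries, gap_entries), treating each non-blank line as one entry."""
    total = gaps = 0
    for line in lines:
        if not line.strip():
            continue
        total += 1
        if "Gap in packet sequence" in line:
            gaps += 1
    return total, gaps


# Hypothetical excerpts standing in for two channels' log files.
logs = {
    "port-1208.log": [
        "... NOTE  Starting up 6.13.11",
        "... WARN  Gap in packet sequence: 24554851 to 24555172 [skipped 320]",
        "... WARN  Gap in packet sequence: 24555172 to 24555241 [skipped 68]",
    ],
    "port-1201.log": [
        "... NOTE  Starting up 6.13.11",
    ],
}

for name, lines in logs.items():
    total, gaps = summarize_log(lines)
    print(f"{name}: {gaps} of {total} entries are Gap messages")
```

With real files, an open file object can be passed directly, e.g.
`with open(path) as f: summarize_log(f)`.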

re:
> tcpdump info from around the first entry above, nothing really to note in 
> differences:
> 
> 17:01:34.300057 IP (tos 0x0, ttl 3, id 30160, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4032 > 1472
> 
> 17:01:34.300486 IP (tos 0x0, ttl 3, id 30161, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4032 > 1472
> 
> 17:01:34.303549 IP (tos 0x0, ttl 3, id 30165, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4018 > 1472
> 
> 17:01:34.303996 IP (tos 0x0, ttl 3, id 30166, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4068 > 1472
> 
> 
> tcpdump info from around the second entry above, NOTE length is much shorter:
> 
> 17:01:34.531323 IP (tos 0x0, ttl 3, id 30293, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4032 > 1472
> 
> 17:01:34.537355 IP (tos 0x0, ttl 3, id 30298, offset 0, flags [DF], proto UDP (17), length 273)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, *length 245*
> 
> 17:01:34.542440 IP (tos 0x0, ttl 3, id 30303, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4068 > 1472

The snippet I sent you from a 'tcpdump' invocation on one of our NOAAPort
ingest systems (which, by the way, is showing no Gap messages) looks much the
same.  Packets at the end of the sequence of packets that represents a
product can be shorter than the rest.

re:
> tcpdump info from around the third entry above, NOTE length is much shorter:
> 
> 17:01:34.982842 IP (tos 0x0, ttl 3, id 30615, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4032 > 1472
> 
> 17:01:34.987238 IP (tos 0x0, ttl 3, id 30619, offset 0, flags [DF], proto UDP (17), length 1197)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, *length 1169*
> 
> 17:01:34.987380 IP (tos 0x0, ttl 3, id 30620, offset 0, flags [+], proto UDP (17), length 1500)
>     10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, bad length 4068 > 1472
> 
> So it appears in 2 of the 3 Gap errors, tcpdump indicated the length is
> less than normal.

Same comment as above.
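The short packets in these captures are also internally consistent.  In
tcpdump's verbose output the IP 'length' covers the whole IPv4 packet, while
the number after 'UDP,' appears to be the UDP payload alone, so the two
differ by the 20-byte IPv4 header plus the 8-byte UDP header:
273 - 28 = 245 and 1197 - 28 = 1169.  A tiny check, assuming the IPv4
headers carry no options:

```python
IPV4_HEADER_LEN = 20  # bytes; assumes no IP options
UDP_HEADER_LEN = 8    # bytes; fixed size of a UDP header


def udp_payload_len(ip_total_len):
    """UDP payload size of an unfragmented IPv4/UDP packet with this IP total length."""
    return ip_total_len - IPV4_HEADER_LEN - UDP_HEADER_LEN


# (IP length, UDP length) pairs taken from the short packets quoted above.
observed = [(273, 245), (1197, 1169)]
for ip_len, udp_len in observed:
    print(ip_len, udp_len, udp_payload_len(ip_len) == udp_len)
```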

You could do the same kind of 'tcpdump' for other ports (e.g., 1201 (NWSTG)
and 1203 (NWSTG2)) and see the same kind of output.

One of the things that is bothering me is why the 'tcpdump' output from your
system has 'bad length' messages, while we don't see this on our systems.

Example from one of our ingest systems:

 ...
18:37:40.824195 IP (tos 0x0, ttl 3, id 16970, offset 0, flags [+], proto UDP (17), length 1500)
    10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, length 4032
18:37:40.824576 IP (tos 0x0, ttl 3, id 16971, offset 0, flags [+], proto UDP (17), length 1500)
    10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, length 4032
18:37:40.825052 IP (tos 0x0, ttl 3, id 16972, offset 0, flags [+], proto UDP (17), length 1500)
    10.0.9.51.35637 > 224.0.1.8.seagull-ais: UDP, length 4032
 ...

I have tried Googling this, but none of the pages I have looked at so
far have been useful.
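The numbers themselves can at least be reconstructed.  The '+' in 'flags [+]'
is the IP "more fragments" flag, so each captured 1500-byte packet is the
first fragment of a larger UDP datagram: the UDP header announces 4032 (or
4018, 4068) payload bytes, but a 1500-byte fragment only has room for
1500 - 20 - 8 = 1472 of them after the IPv4 and UDP headers.  The sketch
below, assuming a 1500-byte MTU and IPv4 headers without options, shows how
such a datagram would be split.  It does not explain why one system's
tcpdump labels these packets 'bad' and the other's does not.

```python
MTU = 1500
IPV4_HEADER_LEN = 20  # bytes; assumes no IP options
UDP_HEADER_LEN = 8    # bytes


def fragment_sizes(udp_payload_len, mtu=MTU):
    """IP-payload byte counts of the fragments carrying one UDP datagram.

    Every fragment except the last must carry a multiple of 8 bytes,
    because the IP fragment offset is expressed in 8-byte units.
    """
    remaining = udp_payload_len + UDP_HEADER_LEN   # UDP header rides in fragment 1
    per_fragment = (mtu - IPV4_HEADER_LEN) // 8 * 8  # 1480 for a 1500-byte MTU
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


for payload in (4018, 4032, 4068):
    sizes = fragment_sizes(payload)
    # UDP payload bytes that actually fit in the first fragment.
    first_fragment_payload = sizes[0] - UDP_HEADER_LEN
    print(payload, sizes, first_fragment_payload)
```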

Some other information-gathering requests:

- can you do a 'tcpdump' for port 1201 and send a small snippet of its output?

- the same request for port 1203

- what is the output from:

  ethtool -k em2

  NB: 'ethtool' can be run from a non-root account.

For reference, here is the 'ethtool' output for the Ethernet interface
to which one of our Novra S300Ns is attached on the same ingest machine
that I have been sending information from in this and previous emails:

~: ethtool -k eth1
Features for eth1:
Cannot get device generic-receive-offload settings: Operation not permitted
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-unneeded: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
loopback: off [fixed]
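Comparing two of these listings by eye is error-prone given how many features
there are.  A small Python sketch that parses two saved 'ethtool -k' outputs
(e.g. captured with 'ethtool -k em2 > em2.txt') and reports the features
whose on/off state differs.  The em2 text below is made up purely to show the
shape of the result:

```python
def parse_ethtool_k(text):
    """Map each feature name in saved `ethtool -k` output to "on" or "off".

    Header lines ("Features for eth1:") and error lines ("Cannot get ...")
    are skipped; a trailing "[fixed]" marker is ignored.
    """
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(": ")
        words = rest.split()
        if sep and words and words[0] in ("on", "off"):
            features[name] = words[0]
    return features


def diff_features(a, b):
    """Return {feature: (state_in_a, state_in_b)} for features whose state differs.

    A feature present in only one listing shows up with None for the other.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Excerpt of the eth1 listing above.
eth1 = """Features for eth1:
rx-checksumming: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""
# Made-up em2 listing, for illustration only.
em2 = """Features for em2:
rx-checksumming: on
generic-receive-offload: off
large-receive-offload: off
"""
print(diff_features(parse_ethtool_k(eth1), parse_ethtool_k(em2)))
# {'generic-receive-offload': ('on', 'off')}
```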

The problem with thinking that there might be some sort of NIC
configuration "problem" is that the UDP streams coming through
other ports are good.

I can't remember if I mentioned the following before, so excuse me
if this is a repetition:

Errors in satellite-based datastreams are the norm, not the exception.
This is not to say that a LOT of errors is normal/OK; it isn't.
Experiencing "reasonable" numbers of Gap messages a day, in aggregate,
across all of the NOAAPort channels being ingested is OK.  It was only
after the lockdown that we started to experience days on which we logged
no Gap messages.  I have attributed this improvement in data reception
quality to fewer vehicular and grounds maintenance activities near the
UCAR NOAAPort dish.  I think that the same can be said for the NOAA/GSL
ingestion, but that case is a bit harder to make given their previous
problem with errors being introduced in the data path from one of their
Novra S300N receivers to the ingest machines that it was feeding.
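To judge whether a day's Gap messages are "reasonable", it helps to total
them per day across every channel's log rather than per file.  A minimal
Python sketch, assuming each log entry sits on a single line that starts
with a UTC timestamp in the format shown in the excerpts above:

```python
import re
from collections import defaultdict

# An entry starts with a UTC timestamp such as 20200619T213358.532502Z;
# group 1 is the YYYYMMDD date, group 2 the number of packets skipped.
GAP_ENTRY_RE = re.compile(
    r"^(\d{8})T\S+ .*Gap in packet sequence: \d+ to \d+ \[skipped (\d+)\]"
)


def daily_gap_totals(lines):
    """Return {YYYYMMDD: [gap_messages, packets_skipped]} over all given lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = GAP_ENTRY_RE.match(line)
        if m:
            day, skipped = m.group(1), int(m.group(2))
            totals[day][0] += 1
            totals[day][1] += skipped
    return dict(totals)


# Entries taken from the logs quoted earlier; in practice the lines of every
# channel's log file would be chained together (e.g. with itertools.chain).
lines = [
    "20200619T213358.532502Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24554851 to 24555172 [skipped 320]",
    "20200619T213402.041819Z noaaportIngester[20939] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 24555172 to 24555241 [skipped 68]",
    "20200623T170134.303885Z noaaportIngester[3443] productMaker.c:pmStart:439 WARN  Gap in packet sequence: 51850807 to 51851889 [skipped 1081]",
    "20200619T213340.847455Z noaaportIngester[20939] noaaportIngester.c:main:1323 NOTE  Starting up 6.13.11",
]
for day, (count, skipped) in sorted(daily_gap_totals(lines).items()):
    print(day, count, skipped)
```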

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.