[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester
- Subject: [Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester
- Date: Wed, 17 Jun 2020 19:17:13 -0600
Hi Gregg,
I waited to send you a response to the email you CCed to Steve
and me until users running NOAAPort ingest systems had a chance
to share their knowledge/experience on the ldm-users and noaaport
email lists. Quite frankly, our experience has been that lively
discussions on the lists tend to get quiet after we respond, and
we do not want to inhibit information sharing.
Now, some comments prompted by points made on the list(s):
1) we use slightly different settings in the /etc/sysctl.conf
files on our NOAAPort ingest machines
Here are our settings:
#
# Enable DVBS reception
#
net.ipv4.conf.default.rp_filter = 2
#
# DVBS multicast fragment reassembly
#
net.ipv4.ipfrag_max_dist = 0
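To confirm that the running kernel actually has the two values listed above (and not only /etc/sysctl.conf), something like the following could be used. This is a minimal sketch, not part of the LDM package: the function names are made up for illustration, and it assumes a Linux host where sysctl keys map directly to files under /proc/sys.

```python
from pathlib import Path

# Expected values copied from the settings listed above.
EXPECTED = {
    "net.ipv4.conf.default.rp_filter": "2",
    "net.ipv4.ipfrag_max_dist": "0",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a sysctl key such as 'net.ipv4.ipfrag_max_dist' to its /proc/sys file.

    Assumption: no component of the key contains a literal dot
    (true for the keys used here, not for VLAN interfaces like eth0.100).
    """
    return Path(root).joinpath(*key.split("."))


def check_sysctls(expected=EXPECTED, root="/proc/sys"):
    """Return {key: (expected, actual)} for every key whose running value differs.

    'actual' is None when the key cannot be read on this system.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            have = sysctl_path(key, root).read_text().strip()
        except OSError:
            have = None
        if have != want:
            mismatches[key] = (want, have)
    return mismatches
```

Calling check_sysctls() with no arguments on the ingest machine returns an empty dictionary when both values match.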
2) we are also setting other parameters in /etc/sysctl.conf
for FISMA compliance:
#
# FISMA testing
#
kernel.exec-shield = 1
kernel.randomize_va_space = 2
#net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
3) we do not believe that there is anything in the beta LDM
that will have any impact on your NOAAPort ingestion quality
I say this because we are running both LDM-6.13.11 and
LDM-6.13.12.43 and we are seeing essentially the same
ingest performance (which, by the way, has been spectacularly
good during the mandatory work from home/lockdown period) on
all of our ingest machines.
4) we too split the signal from our single, NWS standard NOAAPort
3.8m satellite dish using a ChannelMaster 8-port splitter
We also use a signal/line amplifier since our quad shielded RG-6
cable from the UCAR dish is over 100' long. The info for the line
amplifier we are using is:
Norsat LA 30
950-2150 MHz
75 ohms
Made in Canada
We are _not_ using a Novra S300N receiver to power the Norsat 3000
on our dish. Our LNB is powered by a power block that has two
inputs, so power continues to be supplied if one of the inputs dies
for any reason. The info that I have for this unit is:
Dual Power Tee
Model: DPT-V1-F-F-B-B
Made in Canada
5) after using a cron-initiated job that ran the Novra utility 'cmcs'
to get status information from our two Novra S300Ns for several
years, I recently switched to using an application written by Stonie
Cooper of Planetary Data, Inc
Stonie/PDI makes this routine freely available on the PDI website.
I modified the source code for the routine he wrote (named novramonitor)
to add a 'unidata' option that will output pretty much all of the
status information in a single line. This makes parsing the file
for listing and plotting much easier for me, at least.
The difference in the two approaches (using 'cmcs' and PDI's
novramonitor) is that Stonie's code cracks the status packets
that are sent by the Novra S300N, so there is no need to
have the S300N interrupt its processing to reply to an inquiry.
Quite some time ago, Stonie and I discussed (by email)
the problems we had observed when using 'cmcs' to get status
information - interacting with the Novra can actually cause
it to miss data, not a LOT, but we had both observed that this
was happening. Another very useful feature of Stonie's
'novramonitor' routine is that it can list out all of the
status information that is being sent as UDP from the Novra
S300N.
Another difference is Stonie's 'novramonitor' continues to run
once started (unless told otherwise), so it can run as a daemon.
I chose to kick off 'novramonitor' using a BASH script that is
run from cron each minute. The first time the BASH script is run,
it will start 'novramonitor' and tell it to write log messages
every 30 seconds. High-rate status information is important when
trying to diagnose ingest problems that are indicated by 'Gap'
messages!
6) recently, we have interacted with folks both at Raytheon and at
NOAA/GSL on their NOAAPort ingest problems
I mention both in the same bullet even though the causes of their
problems were completely different:
- the problem that Raytheon was having was being caused by running
NOAAPort ingesters in VMs run on systems being controlled by
RedHat's hypervisor
It turns out that the RedHat Hypervisor was changing the ordering
of bytes in the UDP stream from their Novra S300N, and no tweaking
of the net.ipv4.ipfrag_max_dist value would result in numbers of
'Gap' messages that would match those seen on a physical machine getting
its feed from the same Novra S300N. The solution to the Raytheon
problem was RedHat fixing their hypervisor (!).
- the NOAA/GSL problem, which was resulting in over 1.1M 'Gap' messages
a day on two brand new, very powerful machines that were doing nothing
but NOAAPort ingest, was found to be caused by the data path that the
UDP stream was taking from the Novra S300N to the ingest machines. When
the router that the UDP stream passed through was replaced by a single-purpose
switch, the 'Gap' messages being logged dropped from 1.1-1.4M/day down
to a handful a day on most days and zero on quite a few days. The
lesson learned was that paying attention to the data path that the
UDP stream from the Novra S300N goes through is critically important.
Quick comment: in troubleshooting the NOAA/GSL data path problem, we
observed that there was a characteristic that jumped out at us: there
was typically 1 missed frame in each 'Gap' message. I note this because
the number of missed frames in the 'Gap' messages you included in your
first email did _NOT_ match this pattern. The 'Gap' messages that you
reported showed large numbers of missed frames, and this is a telltale
sign of a noisy feed or malfunctioning S300N receiver.
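The "missed frames per 'Gap' message" characteristic can be checked with simple arithmetic on daily totals. The sketch below is only an illustration: the function name is hypothetical, and the sample numbers are taken from the daily summaries later in this message.

```python
def frames_per_gap(n_gap, n_frame):
    """Average number of missed frames per 'Gap' message.

    Returns 0.0 for a day with no 'Gap' messages.
    """
    return n_frame / n_gap if n_gap else 0.0


# (host, UTC day) -> (nGap, nFrame), copied from the summaries in this email.
SAMPLES = {
    ("cpsbn1-a2d7", "20200611"): (64, 64),
    ("mistral", "20200615"): (1855, 18848),
}


def report(samples=SAMPLES):
    """Return printable lines showing the ratio for each sample day."""
    lines = []
    for (host, day), (n_gap, n_frame) in sorted(samples.items()):
        ratio = frames_per_gap(n_gap, n_frame)
        lines.append(f"{host} {day}: {ratio:.2f} missed frames per Gap message")
    return lines
```

A ratio close to 1 matches the data-path pattern described above, while a ratio well above 1 points toward many frames lost per event. Where to draw the line between the two is a judgment call, so no threshold is built in.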
7) I agree with the gist of a comment that Gilbert made but want to urge
caution in interpreting what he said
It is true (!) that Carrier to Noise (C/N) is _everything_ in NOAAPort
ingest. It is not necessarily true that C/N values in the 15s are an
indication of poor ingest quality. A number of years ago, I worked with
folks at Northrop Grumman on their NOAAPort ingest setup, and I observed
that they got very good data ingest with a C/N of only 11.7. Not only that,
their coax cable run from their dish to their receiver was something like
435'. Serious investigation revealed that not all Novra S300Ns are
created equal - some are great, and some are not.
Signal strength: Gilbert's comment about signal strength is also a bit
misleading (referencing College of DuPage's -37 dBm). Since we amplify
our signal, we can set the signal strength to whatever we want. We
found that a value in the mid -30s (dBm) was best in our setup, and we also
found that driving the LNB too hard by increasing the signal strength
to something like -21 dBm was very bad. Whether or not the -51 dBm
values you are reporting are a problem is TBD; there is not enough
information to say.
8) we set up our LDM configuration file EXECs of 'noaaportIngester'
differently than you
In particular, we set up multicast routing on the Ethernet interface
which has the connection from the Novra S300N. This makes the
'noaaportIngester' EXECs a bit simpler.
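One way to check that multicast traffic from the receiver is arriving on the intended Ethernet interface is to join one of the NOAAPort groups directly and count datagrams for a few seconds. This is a rough diagnostic sketch, not something from the LDM distribution: the function names are invented, and the interface address you pass in must be the local IP of the interface connected to the Novra S300N. The group/port pair 224.0.1.1/1201 is the NMC channel from the table further down.

```python
import socket
import time


def make_mreq(group, iface_ip):
    """Build the 8-byte ip_mreq value: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def count_packets(group, port, iface_ip, seconds=5.0):
    """Join `group` via the interface owning `iface_ip`; count datagrams for `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow sharing the port with another listener on the same host.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        make_mreq(group, iface_ip))
        sock.settimeout(0.5)
        deadline = time.monotonic() + seconds
        count = 0
        while time.monotonic() < deadline:
            try:
                sock.recvfrom(65535)
                count += 1
            except socket.timeout:
                pass  # nothing arrived in this interval; keep waiting
        return count
    finally:
        sock.close()
```

For example, count_packets("224.0.1.1", 1201, "192.168.1.10") with a made-up interface address would return the number of datagrams seen in five seconds. A count of zero suggests looking at the multicast route or the cabling before looking at the LDM.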
We also moved away from using the system logging daemon for logging
LDM messages. Instead, we are using the logging package that is
included in newer LDM releases like v6.13.11.
For reference, here are the EXEC lines from the LDM configuration
files on our two primary NOAAPort ingest machines:
# 20170313 - changed set of noaaportIngester instances to match:
#
# http://www.nws.noaa.gov/noaaport/document/Multicast%20Addresses%201.0.pdf
# CHANNEL  PID  MULTICAST ADDRESS  Port  DETAILS
# NMC      101  224.0.1.1          1201  NCEP / NWSTG
# GOES     102  224.0.1.2          1202  GOES / NESDIS
# NMC2     103  224.0.1.3          1203  NCEP / NWSTG2
# NOPT     104  224.0.1.4          1204  Optional Data - OCONUS Imagery / Model
# NPP      105  224.0.1.5          1205  National Polar-Orbiting Partnership / POLARSAT
# EXP      106  224.0.1.8          1208  Experimental
# GRW      107  224.0.1.9          1209  GOES-R Series West
# GRE      108  224.0.1.10         1210  GOES-R Series East
# NWWS     201  224.1.1.1          1201  Weather Wire
#
exec "keep_running noaaportIngester -n -m 224.0.1.1 -l /data/tmp/nwstg.log"
exec "keep_running noaaportIngester -n -m 224.0.1.2 -l /data/tmp/goes.log"
exec "keep_running noaaportIngester -n -m 224.0.1.3 -l /data/tmp/nwstg2.log"
exec "keep_running noaaportIngester -n -m 224.0.1.4 -l /data/tmp/oconus.log"
exec "keep_running noaaportIngester -n -m 224.0.1.5 -l /data/tmp/nother.log"
exec "keep_running noaaportIngester -n -m 224.0.1.8 -l /data/tmp/nother.log"
exec "keep_running noaaportIngester -n -m 224.0.1.9 -l /data/tmp/nother.log"
exec "keep_running noaaportIngester -n -m 224.0.1.10 -l /data/tmp/nother.log"
NB: 'keep_running' is a simple BASH script that runs its first argument from
within a 'while' loop. If the program being run exits, it is automatically
restarted.
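The restart-on-exit idea behind 'keep_running' can be sketched as follows. This is a Python illustration of the same pattern, not the BASH script itself (whose contents are not reproduced here). The max_restarts and delay parameters are additions for the sketch so that the loop can be bounded and throttled.

```python
import subprocess
import time


def keep_running(argv, max_restarts=None, delay=1.0):
    """Run `argv`; whenever it exits, start it again.

    With max_restarts=None the loop never ends, which mirrors a plain
    'while' loop around the program. Returns the number of launches.
    """
    launches = 0
    while max_restarts is None or launches <= max_restarts:
        subprocess.call(argv)  # blocks until the program exits
        launches += 1
        time.sleep(delay)      # brief pause so a crashing program cannot spin
    return launches
```

For instance, keep_running(["noaaportIngester", "-n", "-m", "224.0.1.1", "-l", "/data/tmp/nwstg.log"]) would relaunch the ingester each time it exits, using the same arguments as the first EXEC line above.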
If you are interested:
I have made the various routines that I mentioned above available by
FTP from our FTP server:
machine: ftp.unidata.ucar.edu
<user>: anonymous
<pass>: your_email_address
directory: pub/ldm/noaaport
files: cmcs_1.9.10
crontab.leno
gapcount
gapstat
keep_running
novramon.sh
novramonitor.tar.gz
route.cmds
static-routes
Here is a short description of the various files:
cmcs_1.9.10 - the most recent Novra CMCS executable (Linux)
novramon.sh - BASH script run from cron to run our version of
Stonie's 'novramonitor'
novramonitor.tar.gz - source code for our version of Stonie's 'novramonitor'
crontab.leno - has the cron entries we are using to run 'novramon.sh'
gapstat - BASH script that we run every hour and once per day
to extract 'Gap' messages out of NOAAPort ingest log
files and write the result to a log file in the same
directory where the LDM log file is written
gapcount - reads the file created by 'gapstat' and then lists
out summary 'Gap' stats
keep_running - BASH script that is run from an LDM EXEC line and
starts/restarts whatever program that is named in
its first passed parameter (e.g., noaaportIngester)
route.cmds - this was written to help the NOAA/GSL contact adjust
multicast routing on his RedHat 7 ingest machines
static-routes - this is an example of the file we use on our CentOS
6.x NOAAPort ingest machines. The file is put in
the /etc/sysconfig directory, and its entries need
to match the local setup (e.g., Ethernet interface,
IP address, etc.)
OK, the above was a LOT of information, some of which may or may not be
useful for your situation. I am happy to work with you on troubleshooting
your setup by doing Google Hangouts (now known as Meets) while you are
logged onto your NOAAPort ingest machine.
I hope that the above was reasonably coherent. If it was not, please let
me know and I will try to clarify the things that need clarification.
One last thing that I want to throw in. Here are the most recent 7
days of summary 'Gap' stats that are produced by the 'gapstat' and
'gapcount' scripts that I mentioned above. The sites that I have
summary 'Gap' information for are:
UCAR/Unidata - the machines 'uni14' and 'leno'
LSU/SRCC - the machine 'mistral.srcc.lsu.edu'
UW/SSEC - the machines 'np1.ssec.wisc.edu' and 'np2.ssec.wisc.edu'
NOAA/GSL - the three machines 'awips-ldmcp1.gsl.noaa.gov',
'awips-ldmcp2.gsl.noaa.gov' and 'cpsbn1.gsl.noaa.gov'
mistral.srcc.lsu.edu
mistral:: 20200611.232102: nGap: 11 nFrame: 28 nG1sec: 2
nG5sec: 3 nG15sec: 4 nG1min: 5
mistral:: 20200612.201102: nGap: 168 nFrame: 624 nG1sec: 152
nG5sec: 152 nG15sec: 152 nG1min: 152
mistral:: 20200613.221702: nGap: 95 nFrame: 2243 nG1sec: 84
nG5sec: 85 nG15sec: 86 nG1min: 86
mistral:: 20200614.212302: nGap: 34 nFrame: 5457 nG1sec: 18
nG5sec: 19 nG15sec: 20 nG1min: 20
mistral:: 20200615.224502: nGap: 1855 nFrame: 18848 nG1sec: 1536
nG5sec: 1548 nG15sec: 1680 nG1min: 1789
mistral:: 20200616.215321: nGap: 2147 nFrame: 11810 nG1sec: 1887
nG5sec: 1900 nG15sec: 2005 nG1min: 2096
mistral:: 20200617.231002: nGap: 7 nFrame: 92 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
np1.ssec.wisc.edu
np1:: 20200611.235908: nGap: 672 nFrame: 3975 nG1sec: 306
nG5sec: 316 nG15sec: 324 nG1min: 339
np1:: 20200612.235511: nGap: 2017 nFrame: 12247 nG1sec: 1617
nG5sec: 1632 nG15sec: 1638 nG1min: 1659
np1:: 20200613.235543: nGap: 1310 nFrame: 14039 nG1sec: 940
nG5sec: 955 nG15sec: 962 nG1min: 974
np1:: 20200614.235955: nGap: 653 nFrame: 6504 nG1sec: 285
nG5sec: 298 nG15sec: 304 nG1min: 311
np1:: 20200615.235610: nGap: 1347 nFrame: 15873 nG1sec: 977
nG5sec: 988 nG15sec: 993 nG1min: 1008
np1:: 20200616.235608: nGap: 722 nFrame: 3883 nG1sec: 315
nG5sec: 329 nG15sec: 334 nG1min: 351
np1:: 20200617.235830: nGap: 513 nFrame: 3146 nG1sec: 103
nG5sec: 111 nG15sec: 116 nG1min: 124
np2.ssec.wisc.edu
np2:: 20200611.235642: nGap: 648 nFrame: 3924 nG1sec: 294
nG5sec: 302 nG15sec: 305 nG1min: 322
np2:: 20200612.233928: nGap: 1883 nFrame: 11323 nG1sec: 1571
nG5sec: 1586 nG15sec: 1590 nG1min: 1604
np2:: 20200613.235905: nGap: 1273 nFrame: 13938 nG1sec: 926
nG5sec: 943 nG15sec: 954 nG1min: 966
np2:: 20200614.235919: nGap: 660 nFrame: 6712 nG1sec: 293
nG5sec: 303 nG15sec: 308 nG1min: 320
np2:: 20200615.235902: nGap: 1296 nFrame: 15941 nG1sec: 953
nG5sec: 969 nG15sec: 976 nG1min: 992
np2:: 20200616.235447: nGap: 724 nFrame: 3615 nG1sec: 345
nG5sec: 362 nG15sec: 367 nG1min: 384
np2:: 20200617.235921: nGap: 449 nFrame: 2726 nG1sec: 79
nG5sec: 91 nG15sec: 96 nG1min: 114
leno.unidata.ucar.edu
leno:: 20200611.005016: nGap: 73 nFrame: 270 nG1sec: 72
nG5sec: 73 nG15sec: 73 nG1min: 73
leno:: 20200612.145106: nGap: 53 nFrame: 230 nG1sec: 51
nG5sec: 53 nG15sec: 53 nG1min: 53
leno:: 20200613.220859: nGap: 1 nFrame: 2 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
leno:: 20200614.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
leno:: 20200615.205200: nGap: 68 nFrame: 357 nG1sec: 65
nG5sec: 65 nG15sec: 66 nG1min: 66
leno:: 20200616.142553: nGap: 13 nFrame: 13 nG1sec: 6
nG5sec: 10 nG15sec: 11 nG1min: 12
leno:: 20200617.185137: nGap: 6 nFrame: 15697 nG1sec: 4
nG5sec: 5 nG15sec: 6 nG1min: 6
uni14.unidata.ucar.edu
uni14:: 20200611.005016: nGap: 80 nFrame: 301 nG1sec: 79
nG5sec: 80 nG15sec: 80 nG1min: 80
uni14:: 20200612.145106: nGap: 63 nFrame: 235 nG1sec: 61
nG5sec: 63 nG15sec: 63 nG1min: 63
uni14:: 20200613.220859: nGap: 1 nFrame: 2 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
uni14:: 20200614.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
uni14:: 20200615.205200: nGap: 76 nFrame: 383 nG1sec: 74
nG5sec: 74 nG15sec: 75 nG1min: 75
uni14:: 20200616.142553: nGap: 13 nFrame: 13 nG1sec: 6
nG5sec: 10 nG15sec: 11 nG1min: 12
uni14:: 20200617.104202: nGap: 1 nFrame: 1 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
awips-ldmcp1.gsd.experimental.gov
awips-ldmcp1:: 20200611.021603: nGap: 1 nFrame: 15 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
awips-ldmcp1:: 20200612.133435: nGap: 4 nFrame: 7781 nG1sec: 3
nG5sec: 4 nG15sec: 4 nG1min: 4
awips-ldmcp1:: 20200613.020138: nGap: 1 nFrame: 5 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
awips-ldmcp1:: 20200614.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
awips-ldmcp1:: 20200615.194102: nGap: 31 nFrame: 11482 nG1sec: 25
nG5sec: 25 nG15sec: 26 nG1min: 27
awips-ldmcp1:: 20200616.142553: nGap: 14 nFrame: 18 nG1sec: 6
nG5sec: 10 nG15sec: 11 nG1min: 12
awips-ldmcp1:: 20200617.151428: nGap: 2 nFrame: 15 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
awips-ldmcp2.gsd.experimental.gov
awips-ldmcp2:: 20200611.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
awips-ldmcp2:: 20200612.133435: nGap: 6 nFrame: 8717 nG1sec: 3
nG5sec: 4 nG15sec: 5 nG1min: 5
awips-ldmcp2:: 20200613.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
awips-ldmcp2:: 20200614.171627: nGap: 2 nFrame: 13 nG1sec: 1
nG5sec: 1 nG15sec: 1 nG1min: 1
awips-ldmcp2:: 20200615.194102: nGap: 27 nFrame: 59 nG1sec: 23
nG5sec: 23 nG15sec: 24 nG1min: 25
awips-ldmcp2:: 20200616.171713: nGap: 18 nFrame: 10457 nG1sec: 9
nG5sec: 13 nG15sec: 14 nG1min: 15
awips-ldmcp2:: 20200617.000000: nGap: 0 nFrame: 0 nG1sec: 0
nG5sec: 0 nG15sec: 0 nG1min: 0
cpsbn1-a2d7.gsd.esrl.noaa.gov
cpsbn1-a2d7:: 20200611.231800: nGap: 64 nFrame: 64 nG1sec: 32
nG5sec: 33 nG15sec: 33 nG1min: 33
cpsbn1-a2d7:: 20200612.212237: nGap: 62 nFrame: 62 nG1sec: 32
nG5sec: 32 nG15sec: 32 nG1min: 33
cpsbn1-a2d7:: 20200613.233843: nGap: 56 nFrame: 56 nG1sec: 27
nG5sec: 29 nG15sec: 29 nG1min: 29
cpsbn1-a2d7:: 20200614.235914: nGap: 60 nFrame: 60 nG1sec: 31
nG5sec: 33 nG15sec: 33 nG1min: 33
cpsbn1-a2d7:: 20200615.223038: nGap: 77 nFrame: 108 nG1sec: 47
nG5sec: 47 nG15sec: 48 nG1min: 51
cpsbn1-a2d7:: 20200616.234728: nGap: 100 nFrame: 100 nG1sec: 52
nG5sec: 60 nG15sec: 61 nG1min: 62
cpsbn1-a2d7:: 20200617.233702: nGap: 82 nFrame: 82 nG1sec: 42
nG5sec: 43 nG15sec: 43 nG1min: 43
Explanation:
The output values are:
host - the short name of the NOAAPort ingest machine
ccyymmdd.hhmmss - the UTC date and time that the last Gap message for the
day was logged. If there were no Gap messages, the
hhmmss value will be '000000'
nGap - the total number of Gap messages for the UTC day
nFrame - the total number of missed frames for the UTC day
nG1sec - the number of Gap messages that were received one
second or less apart from each other
nG5sec - the number of Gap messages that were received five
seconds or less apart from each other
nG15sec - the number of Gap messages that were received
fifteen seconds or less apart from each other
nG1min - the number of Gap messages that were received 1
minute or less apart from each other
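For listing or plotting, the summary lines above can be read back into records with a regular expression. The sketch below is an assumption-laden illustration, not part of 'gapcount': it relies only on the field layout shown in this message, and it tolerates the line wrapping seen here by treating any whitespace (including newlines) as a separator.

```python
import re

COUNT_FIELDS = ("nGap", "nFrame", "nG1sec", "nG5sec", "nG15sec", "nG1min")

# host:: ccyymmdd.hhmmss: nGap: N nFrame: N nG1sec: N nG5sec: N nG15sec: N nG1min: N
_PATTERN = re.compile(
    r"(?P<host>[\w.-]+)::\s+(?P<stamp>\d{8}\.\d{6}):\s+"
    + r"\s+".join(rf"{name}:\s+(?P<{name}>\d+)" for name in COUNT_FIELDS)
)


def parse_summaries(text):
    """Return one dict per summary record found in `text`.

    Each dict has 'host', 'stamp' (ccyymmdd.hhmmss, UTC) and the six
    integer counters explained above.
    """
    records = []
    for match in _PATTERN.finditer(text):
        fields = match.groupdict()
        record = {"host": fields["host"], "stamp": fields["stamp"]}
        record.update({name: int(fields[name]) for name in COUNT_FIELDS})
        records.append(record)
    return records
```

Feeding the whole block of per-host output to parse_summaries() yields a flat list that can be grouped by 'host' or sorted by 'stamp' as needed.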
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata
inquiry tracking system and then made publicly available through the web. If
you do not want to have your interactions made available in this way, you must
let us know in each email you send to us.