
[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester



Hi Gregg,

I waited to send you a response to the email you CCed to Steve
and me until users running NOAAPort ingest systems had a chance
to share their knowledge/experience on the ldm-users and noaaport
email lists.  Quite frankly, our experience has been that lively
discussions on the lists tend to get quiet after we respond, and
we do not want to inhibit information sharing.

Now, some comments prompted by points raised on the list(s):

1) we use slightly different settings in the /etc/sysctl.conf
   files on our NOAAPort ingest machines

   Here are our settings:

#
# Enable DVBS reception
#
net.ipv4.conf.default.rp_filter = 2

#
# DVBS multicast fragment reassembly 
#
net.ipv4.ipfrag_max_dist = 0

2) we are also setting other parameters in /etc/sysctl.conf
   for FISMA compliance:

#
# FISMA testing
#
kernel.exec-shield = 1
kernel.randomize_va_space = 2
#net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
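
   When comparing settings between machines, it can help to check what a
   sysctl.conf-style file actually sets, since commented-out lines (like the
   'all.rp_filter' entry above) are easy to misread.  The following is only
   a minimal sketch: the function name 'conf_value' is made up for this
   example, and it assumes simple 'key = value' lines.

```shell
#!/bin/bash
# conf_value FILE KEY
# Print the value assigned to KEY in a sysctl.conf-style FILE.
# Commented-out lines are ignored; if KEY appears more than once,
# the last assignment wins.  (Hypothetical helper, not part of any package.)
conf_value() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }                     # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)                # strip blanks around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim blanks around the value
            if (k == key) { val = v; found = 1 }
        }
        END { if (found) print val }
    ' "$1"
}

# Example use (not run here): compare the file with the running kernel.
# for key in net.ipv4.conf.default.rp_filter net.ipv4.ipfrag_max_dist; do
#     echo "$key file=$(conf_value /etc/sysctl.conf "$key") running=$(sysctl -n "$key")"
# done
```

   After editing /etc/sysctl.conf, 'sysctl -p' (run as root) reloads the
   file so that the running values match it.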

3) we do not believe that there is anything in the beta LDM
   that will have any impact on your NOAAPort ingestion quality

   I say this because we are running both LDM-6.13.11 and
   LDM-6.13.12.43 and we are seeing essentially the same
   ingest performance (which, by the way, has been spectacularly
   good during the mandatory work from home/lockdown period) on
   all of our ingest machines.

4) we too split the signal from our single, NWS standard NOAAPort
   3.8m satellite dish using a ChannelMaster 8-port splitter

   We also use a signal/line amplifier since our quad shielded RG-6
   cable from the UCAR dish is over 100' long. The info for the line
   amplifier we are using is:

   Norsat LA 30
   950-2150 MHz
   75 ohms
   Made in Canada

   We are _not_ using a Novra S300N receiver to power the Norsat 3000
   on our dish. Our LNB is powered by a power block that has two
   inputs, so power continues to be supplied if one of the inputs dies
   for any reason.  The info that I have for this unit is:

   Dual Power Tee
   Model: DPT-V1-F-F-B-B
   Made in Canada

5) after using a cron-initiated job that ran the Novra utility 'cmcs'
   to get status information from our two Novra S300Ns for several
   years, I recently switched to using an application written by Stonie
   Cooper of Planetary Data, Inc

   Stonie/PDI makes this routine freely available on the PDI website.
   I modified the source code for the routine he wrote (named novramonitor)
   to add a 'unidata' option that will output pretty much all of the
   status information in a single line.  This makes parsing the file
   for listing and plotting much easier for me, at least.

   The difference in the two approaches (using 'cmcs' and PDI's
   novramonitor) is that Stonie's code cracks the status packets
   that are sent by the Novra S300N, so there is no need to
   have the S300N interrupt its processing to reply to an inquiry.
   Quite some time ago, Stonie and I discussed (by email)
   the problems we had observed when using 'cmcs' to get status
   information - interacting with the Novra can actually cause
   it to miss data, not a LOT, but we had both observed that this
   was happening.  Another very useful feature of Stonie's
   'novramonitor' routine is that it can list out all of the
   status information that is being sent as UDP from the Novra
   S300N.

   Another difference is that Stonie's 'novramonitor' continues to run
   once started (unless told otherwise), so it can run as a daemon.
   I chose to kick off 'novramonitor' using a BASH script that is
   run from cron each minute.  The first time the BASH script is run,
   it will start 'novramonitor' and tell it to write log messages
   every 30 seconds.  High-rate status information is important when
   trying to diagnose ingest problems that are indicated by 'Gap'
   messages!
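
   For anyone wanting to do something similar, here is a minimal sketch of
   a "start it only if it is not already running" guard that a script run
   from cron every minute could use.  This is not our 'novramon.sh'; the
   function name 'ensure_running' and the PID-file approach are assumptions
   made for illustration, and the command to start (along with whatever
   options it needs) is left to the caller.

```shell
#!/bin/bash
# ensure_running PIDFILE CMD [ARGS...]
# Start CMD in the background unless the process recorded in PIDFILE is
# still alive.  (Hypothetical helper; 'kill -0' only tests whether a
# process with that PID exists, so a recycled PID could fool it.)
ensure_running() {
    local pidfile="$1"; shift
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0                    # already running, nothing to do
    fi
    "$@" >/dev/null 2>&1 &          # start the program in the background
    echo "$!" > "$pidfile"          # remember its PID for the next cron run
}

# Example use from a cron-run script (paths are placeholders):
# ensure_running /tmp/novramonitor.pid /path/to/novramonitor
```

   Because the guard returns immediately when the process is alive, running
   it every minute costs almost nothing, and the monitor is back within a
   minute if it ever exits.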

6) recently, we have interacted with folks both at Raytheon and at
   NOAA/GSL on their NOAAPort ingest problems

   I mention both in the same bullet even though the causes of their
   problems were completely different:

   - the problem that Raytheon was having was being caused by running
     NOAAPort ingesters in VMs run on systems being controlled by
     RedHat's hypervisor

     It turns out that the RedHat Hypervisor was changing the ordering
     of bytes in the UDP stream from their Novra S300N, and no tweaking
     of the net.ipv4.ipfrag_max_dist value would result in numbers of
     'Gap' messages that would match those seen on a physical machine getting
     its feed from the same Novra S300N.  The solution to the Raytheon 
     problem was RedHat fixing their hypervisor (!).

   - the NOAA/GSL problem, which was resulting in over 1.1M 'Gap' messages
     a day on two brand new, very powerful machines that were doing nothing
     but NOAAPort ingest, was found to be caused by the data path that the
     UDP stream was taking from the Novra S300N to the ingest machines.  When
     the router through which the UDP stream passed was replaced by a
     single-purpose switch, the 'Gap' messages being logged dropped from
     1.1-1.4M/day down to a handful a day on most days and zero on quite a
     few days.  The lesson learned was that paying attention to the data path
     that the UDP stream from the Novra S300N goes through is critically
     important.

     Quick comment: in troubleshooting the NOAA/GSL data path problem, we
     observed that there was a characteristic that jumped out at us:  there
     was typically 1 missed frame in each 'Gap' message.  I note this because
     the number of missed frames in the 'Gap' messages you included in your
     first email did _NOT_ match this pattern.  The 'Gap' messages that you
     reported showed large numbers of missed frames, and this is a telltale
     sign of a noisy feed or malfunctioning S300N receiver.

7) I agree with the gist of a comment that Gilbert made but want to urge
   caution in interpreting what he said

   It is true (!) that Carrier to Noise (C/N) is _everything_ in NOAAPort
   ingest.  It is not necessarily true that C/N values in the 15s are an
   indication of poor ingest quality.  A number of years ago, I worked with
   folks at Northrop Grumman on their NOAAPort ingest setup, and I observed
   that they got very good data ingest with a C/N of only 11.7.  Not only that,
   their coax cable run from their dish to their receiver was something like
   435'.  Serious investigation revealed that not all Novra S300Ns are
   created equal - some are great, and some are not.

   Signal strength: Gilbert's comment about signal strength is also a bit
   misleading (referencing College of DuPage's -37 dBm).  Since we amplify
   our signal, we can set the signal strength to whatever we want.  We
   found that a value in the mid -30s (dBm) was best in our setup, and we
   also found that driving the LNB too hard by increasing the signal strength
   to something like -21 dBm was very bad.  Whether or not the -51 dBm
   values you are reporting are a problem is TBD; there is not enough
   information to say.

8) we set up our LDM configuration file EXECs of 'noaaportIngester'
   differently than you

   In particular, we set up multicast routing on the Ethernet interface
   which has the connection from the Novra S300N.  This makes the
   'noaaportIngester' EXECs a bit simpler.

   We also moved away from using the system logging daemon for logging
   LDM messages.  Instead, we are using the logging package that is
   included in newer LDM releases like v6.13.11.

   For reference, here are the EXEC lines from the LDM configuration
   files on our two primary NOAAPort ingest machines:

# 20170313 - changed set of noaaportIngester instances to match:
#            http://www.nws.noaa.gov/noaaport/document/Multicast%20Addresses%201.0.pdf
#            CHANNEL PID MULTICAST ADDRESS Port DETAILS
#            NMC     101     224.0.1.1     1201 NCEP / NWSTG
#            GOES    102     224.0.1.2     1202 GOES / NESDIS
#            NMC2    103     224.0.1.3     1203 NCEP / NWSTG2
#            NOPT    104     224.0.1.4     1204 Optional Data - OCONUS Imagery / Model
#            NPP     105     224.0.1.5     1205 National Polar-Orbiting Partnership / POLARSAT
#            EXP     106     224.0.1.8     1208 Experimental
#            GRW     107     224.0.1.9     1209 GOES-R Series West
#            GRE     108     224.0.1.10    1210 GOES-R Series East
#            NWWS    201     224.1.1.1     1201 Weather Wire
#
exec    "keep_running noaaportIngester -n -m 224.0.1.1  -l /data/tmp/nwstg.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.2  -l /data/tmp/goes.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.3  -l /data/tmp/nwstg2.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.4  -l /data/tmp/oconus.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.5  -l /data/tmp/nother.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.8  -l /data/tmp/nother.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.9  -l /data/tmp/nother.log"
exec    "keep_running noaaportIngester -n -m 224.0.1.10 -l /data/tmp/nother.log"

   NB: 'keep_running' is a simple BASH script that runs its first argument from
   within a 'while' loop.  If the program being run exits, it is automatically
   restarted.
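
   To illustrate the idea, a restart wrapper along these lines could look
   like the sketch below.  It is not the actual 'keep_running' script from
   our FTP server; the KEEP_RUNNING_MAX and KEEP_RUNNING_DELAY variables are
   invented here (a restart limit for testing and a pause between restarts),
   and the sketch passes all of its arguments through to the program.

```shell
#!/bin/bash
# keep_running PROGRAM [ARGS...]
# Run PROGRAM in a loop, restarting it whenever it exits.
# KEEP_RUNNING_MAX   (assumed knob): stop after this many runs; 0 = forever
# KEEP_RUNNING_DELAY (assumed knob): seconds to pause between restarts
keep_running() {
    local max="${KEEP_RUNNING_MAX:-0}"
    local count=0
    while true; do
        "$@"                                # run the program in the foreground
        count=$((count + 1))
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            break                           # restart limit reached
        fi
        sleep "${KEEP_RUNNING_DELAY:-1}"    # avoid a tight loop if it keeps dying
    done
}

# When invoked as a script with arguments, wrap the named program.
if [ "$#" -gt 0 ]; then
    keep_running "$@"
fi
```

   The pause between restarts is a design choice: without it, a program
   that fails immediately (e.g., because of a bad option) would be
   restarted in a tight loop.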

If you are interested:

I have made the various routines that I mentioned above available by
FTP from our FTP server:

machine:   ftp.unidata.ucar.edu
<user>:    anonymous
<pass>:    your_email_address
directory: pub/ldm/noaaport
files:     cmcs_1.9.10
           crontab.leno
           gapcount
           gapstat
           keep_running
           novramon.sh
           novramonitor.tar.gz
           route.cmds
           static-routes

Here is a short description of the various files:

cmcs_1.9.10          - the most recent Novra CMCS executable (Linux)

novramon.sh          - BASH script run from cron to run our version of
                       Stonie's 'novramonitor'
novramonitor.tar.gz  - source code for our version of Stonie's 'novramonitor'
crontab.leno         - has the cron entries we are using to run 'novramon.sh'

gapstat              - BASH script that we run every hour and once per day
                       to extract 'Gap' messages out of NOAAPort ingest log
                       files and write the result to a log file in the same
                       directory where the LDM log file is written
gapcount             - reads the file created by 'gapstat' and then lists
                       out summary 'Gap' stats
               
keep_running         - BASH script that is run from an LDM EXEC line and
                       starts/restarts whatever program is named in
                       its first passed parameter (e.g., noaaportIngester)


route.cmds           - this was written to help the NOAA/GSL contact adjust
                       multicast routing on his RedHat 7 ingest machines
static-routes        - this is an example of the file we use on our CentOS
                       6.x NOAAPort ingest machines.  The file is put in
                       the /etc/sysconfig directory, and its entries need
                       to match the local setup (e.g., Ethernet interface,
                       IP address, etc.)

OK, the above was a LOT of information, some of which may or may not be
useful for your situation.  I am happy to work with you on troubleshooting
your setup by doing Google Hangouts (now known as Meets) while you are
logged onto your NOAAPort ingest machine.

I hope that the above was reasonably coherent.  If it was not, please let
me know and I will try to clarify the things that need clarification.

One last thing that I want to throw in.  Here are the most recent 7
days of summary 'Gap' stats that are produced by the 'gapstat' and
'gapcount' scripts that I mentioned above.  The sites that I have
summary 'Gap' information for are:

UCAR/Unidata - the machines 'uni14' and 'leno'
LSU/SRCC     - the machine 'mistral.srcc.lsu.edu'
UW/SSEC      - the machines 'np1.ssec.wisc.edu' and 'np2.ssec.wisc.edu'
NOAA/GSL     - the three machines 'awips-ldmcp1.gsl.noaa.gov',
               'awips-ldmcp2.gsl.noaa.gov' and 'cpsbn1.gsl.noaa.gov'

mistral.srcc.lsu.edu
mistral:: 20200611.232102: nGap:     11 nFrame:         28 nG1sec:     2 nG5sec:     3 nG15sec:     4 nG1min:     5
mistral:: 20200612.201102: nGap:    168 nFrame:        624 nG1sec:   152 nG5sec:   152 nG15sec:   152 nG1min:   152
mistral:: 20200613.221702: nGap:     95 nFrame:       2243 nG1sec:    84 nG5sec:    85 nG15sec:    86 nG1min:    86
mistral:: 20200614.212302: nGap:     34 nFrame:       5457 nG1sec:    18 nG5sec:    19 nG15sec:    20 nG1min:    20
mistral:: 20200615.224502: nGap:   1855 nFrame:      18848 nG1sec:  1536 nG5sec:  1548 nG15sec:  1680 nG1min:  1789
mistral:: 20200616.215321: nGap:   2147 nFrame:      11810 nG1sec:  1887 nG5sec:  1900 nG15sec:  2005 nG1min:  2096
mistral:: 20200617.231002: nGap:      7 nFrame:         92 nG1sec:     1 nG5sec:     1 nG15sec:     1 nG1min:     1
np1.ssec.wisc.edu
    np1:: 20200611.235908: nGap:    672 nFrame:       3975 nG1sec:   306 nG5sec:   316 nG15sec:   324 nG1min:   339
    np1:: 20200612.235511: nGap:   2017 nFrame:      12247 nG1sec:  1617 nG5sec:  1632 nG15sec:  1638 nG1min:  1659
    np1:: 20200613.235543: nGap:   1310 nFrame:      14039 nG1sec:   940 nG5sec:   955 nG15sec:   962 nG1min:   974
    np1:: 20200614.235955: nGap:    653 nFrame:       6504 nG1sec:   285 nG5sec:   298 nG15sec:   304 nG1min:   311
    np1:: 20200615.235610: nGap:   1347 nFrame:      15873 nG1sec:   977 nG5sec:   988 nG15sec:   993 nG1min:  1008
    np1:: 20200616.235608: nGap:    722 nFrame:       3883 nG1sec:   315 nG5sec:   329 nG15sec:   334 nG1min:   351
    np1:: 20200617.235830: nGap:    513 nFrame:       3146 nG1sec:   103 nG5sec:   111 nG15sec:   116 nG1min:   124
np2.ssec.wisc.edu
    np2:: 20200611.235642: nGap:    648 nFrame:       3924 nG1sec:   294 nG5sec:   302 nG15sec:   305 nG1min:   322
    np2:: 20200612.233928: nGap:   1883 nFrame:      11323 nG1sec:  1571 nG5sec:  1586 nG15sec:  1590 nG1min:  1604
    np2:: 20200613.235905: nGap:   1273 nFrame:      13938 nG1sec:   926 nG5sec:   943 nG15sec:   954 nG1min:   966
    np2:: 20200614.235919: nGap:    660 nFrame:       6712 nG1sec:   293 nG5sec:   303 nG15sec:   308 nG1min:   320
    np2:: 20200615.235902: nGap:   1296 nFrame:      15941 nG1sec:   953 nG5sec:   969 nG15sec:   976 nG1min:   992
    np2:: 20200616.235447: nGap:    724 nFrame:       3615 nG1sec:   345 nG5sec:   362 nG15sec:   367 nG1min:   384
    np2:: 20200617.235921: nGap:    449 nFrame:       2726 nG1sec:    79 nG5sec:    91 nG15sec:    96 nG1min:   114
leno.unidata.ucar.edu
   leno:: 20200611.005016: nGap:     73 nFrame:        270 nG1sec:    72 nG5sec:    73 nG15sec:    73 nG1min:    73
   leno:: 20200612.145106: nGap:     53 nFrame:        230 nG1sec:    51 nG5sec:    53 nG15sec:    53 nG1min:    53
   leno:: 20200613.220859: nGap:      1 nFrame:          2 nG1sec:     1 nG5sec:     1 nG15sec:     1 nG1min:     1
   leno:: 20200614.000000: nGap:      0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:     0 nG1min:     0
   leno:: 20200615.205200: nGap:     68 nFrame:        357 nG1sec:    65 nG5sec:    65 nG15sec:    66 nG1min:    66
   leno:: 20200616.142553: nGap:     13 nFrame:         13 nG1sec:     6 nG5sec:    10 nG15sec:    11 nG1min:    12
   leno:: 20200617.185137: nGap:      6 nFrame:      15697 nG1sec:     4 nG5sec:     5 nG15sec:     6 nG1min:     6
uni14.unidata.ucar.edu
  uni14:: 20200611.005016: nGap:     80 nFrame:        301 nG1sec:    79 nG5sec:    80 nG15sec:    80 nG1min:    80
  uni14:: 20200612.145106: nGap:     63 nFrame:        235 nG1sec:    61 nG5sec:    63 nG15sec:    63 nG1min:    63
  uni14:: 20200613.220859: nGap:      1 nFrame:          2 nG1sec:     1 nG5sec:     1 nG15sec:     1 nG1min:     1
  uni14:: 20200614.000000: nGap:      0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:     0 nG1min:     0
  uni14:: 20200615.205200: nGap:     76 nFrame:        383 nG1sec:    74 nG5sec:    74 nG15sec:    75 nG1min:    75
  uni14:: 20200616.142553: nGap:     13 nFrame:         13 nG1sec:     6 nG5sec:    10 nG15sec:    11 nG1min:    12
  uni14:: 20200617.104202: nGap:      1 nFrame:          1 nG1sec:     1 nG5sec:     1 nG15sec:     1 nG1min:     1
awips-ldmcp1.gsd.experimental.gov
awips-ldmcp1:: 20200611.021603: nGap:       1 nFrame:         15 nG1sec:     1 nG5sec:     1 nG15sec:    1 nG1min:    1
awips-ldmcp1:: 20200612.133435: nGap:       4 nFrame:       7781 nG1sec:     3 nG5sec:     4 nG15sec:    4 nG1min:    4
awips-ldmcp1:: 20200613.020138: nGap:       1 nFrame:          5 nG1sec:     1 nG5sec:     1 nG15sec:    1 nG1min:    1
awips-ldmcp1:: 20200614.000000: nGap:       0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:    0 nG1min:    0
awips-ldmcp1:: 20200615.194102: nGap:      31 nFrame:      11482 nG1sec:    25 nG5sec:    25 nG15sec:   26 nG1min:   27
awips-ldmcp1:: 20200616.142553: nGap:      14 nFrame:         18 nG1sec:     6 nG5sec:    10 nG15sec:   11 nG1min:   12
awips-ldmcp1:: 20200617.151428: nGap:       2 nFrame:         15 nG1sec:     1 nG5sec:     1 nG15sec:    1 nG1min:    1
awips-ldmcp2.gsd.experimental.gov
awips-ldmcp2:: 20200611.000000: nGap:       0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:    0 nG1min:    0
awips-ldmcp2:: 20200612.133435: nGap:       6 nFrame:       8717 nG1sec:     3 nG5sec:     4 nG15sec:    5 nG1min:    5
awips-ldmcp2:: 20200613.000000: nGap:       0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:    0 nG1min:    0
awips-ldmcp2:: 20200614.171627: nGap:       2 nFrame:         13 nG1sec:     1 nG5sec:     1 nG15sec:    1 nG1min:    1
awips-ldmcp2:: 20200615.194102: nGap:      27 nFrame:         59 nG1sec:    23 nG5sec:    23 nG15sec:   24 nG1min:   25
awips-ldmcp2:: 20200616.171713: nGap:      18 nFrame:      10457 nG1sec:     9 nG5sec:    13 nG15sec:   14 nG1min:   15
awips-ldmcp2:: 20200617.000000: nGap:       0 nFrame:          0 nG1sec:     0 nG5sec:     0 nG15sec:    0 nG1min:    0
cpsbn1-a2d7.gsd.esrl.noaa.gov
cpsbn1-a2d7:: 20200611.231800: nGap:      64 nFrame:         64 nG1sec:   32 nG5sec:   33 nG15sec:   33 nG1min:   33
cpsbn1-a2d7:: 20200612.212237: nGap:      62 nFrame:         62 nG1sec:   32 nG5sec:   32 nG15sec:   32 nG1min:   33
cpsbn1-a2d7:: 20200613.233843: nGap:      56 nFrame:         56 nG1sec:   27 nG5sec:   29 nG15sec:   29 nG1min:   29
cpsbn1-a2d7:: 20200614.235914: nGap:      60 nFrame:         60 nG1sec:   31 nG5sec:   33 nG15sec:   33 nG1min:   33
cpsbn1-a2d7:: 20200615.223038: nGap:      77 nFrame:        108 nG1sec:   47 nG5sec:   47 nG15sec:   48 nG1min:   51
cpsbn1-a2d7:: 20200616.234728: nGap:     100 nFrame:        100 nG1sec:   52 nG5sec:   60 nG15sec:   61 nG1min:   62
cpsbn1-a2d7:: 20200617.233702: nGap:      82 nFrame:         82 nG1sec:   42 nG5sec:   43 nG15sec:   43 nG1min:   43

Explanation:

The output values are:

  host            - the short name of the NOAAPort ingest machine
  ccyymmdd.hhmmss - the UTC date and time that the last Gap message for the
                    day was logged.  If there were no Gap messages, the
                    hhmmss value will be '000000'
  nGap            - the total number of Gap messages for the UTC day
  nFrame          - the total number of missed frames for the UTC day
  nG1sec          - the number of Gap messages that were received one
                    second or less apart from each other
  nG5sec          - the number of Gap messages that were received five
                    seconds or less apart from each other
  nG15sec         - the number of Gap messages that were received
                    fifteen seconds or less apart from each other
  nG1min          - the number of Gap messages that were received 1
                    minute or less apart from each other
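
Since the number of missed frames per 'Gap' message was the telltale in
the NOAA/GSL case (typically 1 per message for a data path problem, versus
large numbers for a noisy feed or a malfunctioning receiver), it can be
handy to compute nFrame/nGap from the summary lines.  The following is a
rough sketch, not one of the scripts on our FTP server; the function name
'gap_avg' and its output layout are made up for this example.

```shell
#!/bin/bash
# gap_avg: read gapcount-style summary lines on stdin and print
# "host:: date: average-missed-frames-per-Gap" for each one.
# Lines without both nGap: and nFrame: fields (e.g. wrapped continuation
# lines or host name headings) are skipped.  (Hypothetical helper.)
gap_avg() {
    awk '{
        ngap = ""; nframe = ""
        for (i = 1; i < NF; i++) {
            if ($i == "nGap:")   ngap   = $(i + 1)
            if ($i == "nFrame:") nframe = $(i + 1)
        }
        if (ngap == "" || nframe == "") next
        if (ngap + 0 > 0) printf "%s %s %.1f\n", $1, $2, nframe / ngap
        else              printf "%s %s n/a\n", $1, $2
    }'
}

# Example use (file name is a placeholder):
# gap_avg < gap_summary.txt
```

For example, the 20200611 line for cpsbn1-a2d7 (64 Gaps, 64 frames) gives
1.0, while the 20200617 line for leno (6 Gaps, 15697 frames) gives 2616.2.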

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.