
[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester



Hi Gregg,

re:
> Glad you received the attachment.  What you see listed in the log file/s
> are correct.

OK.  The first thing that jumps out at me in the *.log.1 log files is
that the numbers of Gap messages for the non-polarsat feeds are not
terrible, while the numbers in the polarsat file ARE terrible.

What do I mean by terrible?

- the number of missed frames in each Gap message is large

  This is in stark contrast to the number of missed frames in the
  Gap messages from the other log files, which varies from 1 to 7 with
  the majority being 1, 2 or 3.

- ignoring the polarsat log files for the time being, and concentrating
  on the *.log.1 log files (which go from 0 UTC to about 14:22 UTC):

  There were a small number of error "events":

  01:36:12            - from goes.log.1
  01:37:05 - 01:37:06 - from nwstg.log
  01:39:48 - 01:39:53 - from goes.log.1, nwstg2.log.1
  06:23:58            - from goes.log.1, nwstg2.log.1 and nwstg.log.1

  When Gap messages are clustered in time and independent of the
  channel (PID), it indicates that there was a source of noise
  causing the problem.  "Noise" is better referred to as Terrestrial
  Interference (TI).

  The number of Gap messages and associated missed frames that were
  reported in the *.log.1 files for the 14 hour and 22 minute interval
  represented by the files is really not terrible.  It is not perfect,
  but it is not that bad.

- on to the polarsat log files

  The two log files you forwarded show VERY BAD quality in this
  channel for two reasons:

  - there are a LOT of them
  - the number of missed frames in each Gap is high
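
For a quick per-file tally of both measures (how many Gap messages there
are, and how many frames each one skipped), something like the following
could be run against each log file.  This is only a sketch: the function
name 'gap_summary' is made up for illustration, and it assumes that each
Gap entry is a single line containing "[skipped N]", as in the
noaaportIngester entries quoted in this thread.

```shell
# gap_summary: read noaaportIngester log lines on stdin and print
# "<gap messages> <total skipped frames> <largest single gap>".
# (Hypothetical helper; not part of the LDM distribution.)
gap_summary() {
    awk '
        /Gap in packet sequence/ {
            # Locate the "[skipped N]" field and extract N.
            if (match($0, /\[skipped [0-9]+\]/)) {
                n = substr($0, RSTART + 9, RLENGTH - 10) + 0
                count++
                total += n
                if (n > max) max = n
            }
        }
        END { printf "%d %d %d\n", count, total, max }
    '
}

# Usage (assumed file name):
#   gap_summary < polarsat.log.1
```

A file with many Gap messages and a large third number is the pattern
described for the polarsat logs; small counts with a maximum of a few
frames is the pattern described for the other feeds.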

It doesn't make any sense to me (at the current moment, at least) that
the quality in one of the channels should/could be significantly worse
than all other channels.  Because of this, my attention is being turned
to trying to figure out if the 'noaaportIngester' invocation for this
channel is the culprit, or, at least, if a change would result in better
ingest quality overall.

re:
> As a FYI:  here are the Gap errors from the SPC operational AWIPS system
> ingesting SBN data from a different NOVRA box (as you can see hardly any
> Gap errors):

Unsaid in this statement, but something I am assuming, is that the SPC
operational Novra and "your" Novra are being fed from the same dish.
Please let me know whether this is correct.

> -bash-4.2$ ll *log
> 
> -rw-rw----. 1 ldm  fxalpha    281883 Jun 18 17:57 edexBridge.log
> -rw-rw----. 1 root fxalpha  13748712 Jun 18 17:57 goes_add.log
> -rw-rw----. 1 root fxalpha 831137533 Jun 18 17:57 ldmd.log
> -rw-rw----. 1 root fxalpha 221082126 Jun 18 17:57 nwstg2.log
> -rw-rw----. 1 root fxalpha 402928198 Jun 18 17:57 nwstg.log
> -rw-rw----. 1 root fxalpha  73177811 Jun 18 17:57 oconus.log
> -rw-rw----. 1 root fxalpha   2300083 Jun 18 17:57 polarsat.log
> -rw-r--r--. 1 ldm  fxalpha      1065 Jun 18 15:03 scour.log

Hmm... There are a couple of things that are jumping out at me
in this listing:

- you are using the LDM scour utility to scour the ingest log
  files

  We do NOT use the LDM scouring to maintain our ingest log files.
  We use the shell script 'nplog_rotate' that can be found in the
  ~ldm/bin directory to rotate these log files.  Moreover, since
  one's local setup may differ from our setup, we recommend that
  users copy this script to a different directory that is in the
  PATH of the user running the LDM, and execute the (modified if
  necessary) script from that directory.

  All of our LDM systems have two directories that we use to organize
  useful executables and scripts:

  ~ldm/util
  ~ldm/decoders

  We copy things like 'nplog_rotate' to our ~ldm/util directory
  and adjust our cron entry to run this script.  The reason for
  doing the copy is so that we don't have to re-edit the script and
  re-modify the values inside it after each new LDM installation.

  For reference, here is our crontab entry that is used
  to run the script and rotate the ingest log files:

#
# Rotate NOAAPort ingest logs
#
0 0 * * * util/nplog_rotate 30 > /dev/null 2>&1

  NB: our clock runs in UTC, so this entry runs at 00:00 UTC.

- the other thing that jumps out at me is the existence of the
  edexBridge.log log file

  Does this mean that you are doing your NOAAPort ingest on the
  same machine on which you are running AWIPS/EDEX?

  If yes, a red flag just started waving in front of my eyes
  since this is a non-standard AWIPS use.  Standard use is
  a dedicated machine to do the ingest which feeds to one
  or more downstream machines that are running AWIPS/EDEX or
  some other data decoding.

re:
> -bash-4.2$ grep Gap g*log n*log o*log p*log
> 
> nwstg2.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19886] WARN: 
> Gap in packet sequence: 100012352 to 100012355 [skipped 2]
> nwstg2.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19886] WARN: 
> Gap in packet sequence: 100012368 to 100012370 [skipped 1]
> nwstg2.log:Jun 18 13:45:16 cpsbn1-spcn journal: noaaportIngester[19886] WARN: 
> Gap in packet sequence: 119453908 to 119453912 [skipped 3]
> 
> nwstg.log:Jun 18 06:23:28 cpsbn1-spcn journal: noaaportIngester[19884] WARN: 
> Gap in packet sequence: 34463218 to 34463223 [skipped 4]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN: 
> Gap in packet sequence: 34469382 to 34469387 [skipped 4]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN: 
> Gap in packet sequence: 34469417 to 34469421 [skipped 3]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN: 
> Gap in packet sequence: 34469445 to 34469451 [skipped 5]
> 
> polarsat.log:Jun 18 06:23:57 cpsbn1-spcn journal: noaaportIngester[19888] 
> WARN: Gap in packet sequence: 3935518 to 3935525 [skipped 6]
> polarsat.log:Jun 18 06:23:57 cpsbn1-spcn journal: noaaportIngester[19888] 
> WARN: Gap in packet sequence: 3935599 to 3935603 [skipped 3]
> polarsat.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19888] 
> WARN: Gap in packet sequence: 3935762 to 3935768 [skipped 5]
> polarsat.log:Jun 18 13:42:41 cpsbn1-spcn journal: noaaportIngester[19890] 
> WARN: Gap in packet sequence: 14317950 to 14317954 [skipped 3]

These all look normal in the number of Gap messages being reported and the
number of missed frames being reported in each Gap message.
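
One rough way to check whether Gap messages like these cluster in time
across channels (the signature of TI discussed above) is to bucket the
'grep' output by minute and list the minutes in which more than one log
file reported a Gap.  The sketch below makes several assumptions: the
function name 'gap_clusters' is invented for illustration, the input is
'grep Gap' output over several files (so each line starts with "file:"),
the timestamps are syslog-style as in the listing above, and the
one-minute bucket size is an arbitrary choice.

```shell
# gap_clusters: read `grep Gap file1 file2 ...` output on stdin and print
# "<month> <day> <HH:MM> <number of files> <file list>" for every minute
# in which more than one log file reported a Gap message.
# (Hypothetical helper; not part of the LDM distribution.)
gap_clusters() {
    awk '
        /Gap in packet sequence/ {
            sep = index($1, ":")
            if (sep == 0) next            # no "file:" prefix; skip the line
            file = substr($1, 1, sep - 1)
            minute = substr($1, sep + 1) " " $2 " " substr($3, 1, 5)
            if (!((minute, file) in seen)) {
                seen[minute, file] = 1
                nfiles[minute]++
                if (names[minute] == "")
                    names[minute] = file
                else
                    names[minute] = names[minute] "," file
            }
        }
        END {
            for (m in nfiles)
                if (nfiles[m] > 1) print m, nfiles[m], names[m]
        }
    ' | sort
}

# Usage:
#   grep Gap g*log n*log o*log p*log | gap_clusters
```

The final 'sort' is a plain text sort, so the ordering is only
chronological within a single day or month.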

re:
> I'll work with my coworkers on your suggestion and getting you more
> specifics.

OK, thanks.

Like I said above, my attention is now focused on your 'noaaportIngester'
invocation for the polarsat data since it is THE feed that is having
the BIG problems.  Are you willing to change this invocation?

re:
> Here is the info from ipconfig, where em2 is connected to the NOVRA box:
> 
>   [ldmcp@sbn1 ~/logs]$ ifconfig -a
> 
>   em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>           inet 140.90.173.123  netmask 255.255.255.0  broadcast 140.90.173.255
>           ether 84:2b:2b:4e:0d:0f  txqueuelen 1000  (Ethernet)
>           RX packets 69037645  bytes 93187995648 (86.7 GiB)
>           RX errors 0  dropped 132588  overruns 0  frame 0
>           TX packets 17641981  bytes 3778381561 (3.5 GiB)
>           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
>   em2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>           inet 10.0.5.50  netmask 255.255.255.0  broadcast 10.0.5.255
>           ether 84:2b:2b:4e:0d:10  txqueuelen 1000  (Ethernet)
>           RX packets 1196657513  bytes 1640531604592 (1.4 TiB)
>           RX errors 0  dropped 0  overruns 0  frame 0
>           TX packets 9862  bytes 1039439 (1015.0 KiB)
>           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
>   lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>           inet 127.0.0.1  netmask 255.0.0.0
>           loop  txqueuelen 1000  (Local Loopback)
>           RX packets 934  bytes 61650 (60.2 KiB)
>           RX errors 0  dropped 0  overruns 0  frame 0
>           TX packets 934  bytes 61650 (60.2 KiB)
>           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The output for 'em2' looks good.  The lack of RX and TX errors over the
1.4 TiB received on it indicates that there is no problem
with the 'em2' Ethernet interface.
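
If you want to keep an eye on these counters over time, the "RX errors"
line for one interface can be pulled out of the 'ifconfig -a' output with
a few lines of awk.  This is a sketch, not a tested tool: the function
name 'rx_counters' is made up, and it assumes the output layout shown in
the listing above (interface name at the start of a line followed by a
colon, and "RX errors N  dropped N  overruns N  frame N" on one line).

```shell
# rx_counters IFACE: read `ifconfig -a` output on stdin and print
# "<errors> <dropped> <overruns>" from the "RX errors" line of IFACE.
# (Hypothetical helper; assumes the layout quoted in this thread.)
rx_counters() {
    awk -v ifc="$1" '
        # A line that starts in column one begins a new interface block.
        /^[^ \t]/ { cur = $1; sub(/:$/, "", cur) }
        cur == ifc && $1 == "RX" && $2 == "errors" {
            print $3, $5, $7
        }
    '
}

# Usage:
#   ifconfig -a | rx_counters em2
```

Non-zero and growing values for the interface connected to the Novra
would point at the network path rather than at the ingest process.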

The small number of Gap messages for all channels but polarsat indicates
that there is nothing wrong with the Ethernet cable connecting the
Novra S300N to your machine.  It also indicates that your system is
working OK even though your C/N is in the 15s.  This is in alignment
with the comment I made about the Northrop Grumman ingest quality being
good even though their C/N was around 11.7.

Given the above, I think we have narrowed the place to look for
problems down to the 'noaaportIngester' EXEC line in your LDM
configuration file, ~ldm/etc/ldmd.conf.

If I were you, I would try:

- remove the '-c' flag from each 'noaaportIngester' EXEC line

- remove the '-r 1' flag from each 'noaaportIngester' EXEC line

- strongly consider moving away from using the system logging
  daemon for logging and use the new logging available in
  current versions of the LDM

  Use of the new LDM logging is the default.  To keep using the
  system logging daemon, your LDM would have had to be built with
  that choice explicitly specified.  If I am correct in this,
  you will need to rebuild your LDM using defaults:

  <as 'ldm' or the user running your LDM>
  cd ~ldm/ldm-6.13.11/src
  make distclean
  ./configure --with-noaaport > configure.log 2>&1
  ldmadmin stop
  make install > makeinstall.log 2>&1
  ldmadmin start

  The 'configure' and 'make install' lines above assume that you
  have 'root' or 'sudo' capability.  If you do not, then you would
  need to instead run:

  ./configure --with-noaaport --disable-root-actions > configure.log 2>&1
  ldmadmin stop
  make install > makeinstall.log 2>&1

  Then:

  <as 'root'>
  cd ~ldm/ldm-6.13.11/src
  make root-actions

  <as 'ldm'>
  ldmadmin start

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.