
[NOAAPORT #ZLV-851048]: LDM misses data on NOAAPORT via satellite



Hi,

re:
> Resending since your server blocked my email because of the attached
> zip file.

It must have been unusually large, since we routinely receive emails with
attachments that are larger than 10 MB.

re:
> I cannot add the one ldmd.log file because of size.

OK.  No worries, I think that what you have provided below is enough to
zero in on the problem...

re:
> I will paste a general idea of what I am seeing at the end of my previous reply.

re:
> Sorry for the delay. I wanted to give time for the changes I made to see for
> sure if anything changed.

OK, good move!

re:
> I changed the UDP buffer setting up to 25MB for both in /etc/sysctl.conf:
> 
> net.core.rmem_max
> net.core.rmem_default
> 
> In reference to the error connecting to rtstatat.unidata.ucar.edu yes, that was
> a typo and I meant to put rtstat.  I commented that line out of the ldmd.conf
> and the error is now gone.

Very good.
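The UDP buffer change you describe (net.core.rmem_max and net.core.rmem_default both raised to 25MB in /etc/sysctl.conf) is easy to double-check with a small script. This is a minimal sketch, assuming "25MB" means 25 * 1024 * 1024 = 26214400 bytes and that the file uses the usual "key = value" layout; the function names are hypothetical and not part of the LDM.

```python
# Interpreting "25MB" as 25 * 1024 * 1024 bytes is an assumption.
TARGET_BYTES = 25 * 1024 * 1024  # 26214400
KEYS = ("net.core.rmem_max", "net.core.rmem_default")


def parse_sysctl(text):
    """Return {key: value} from 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def buffers_ok(text, target=TARGET_BYTES):
    """Map each UDP receive-buffer key to True if it is set to >= target bytes."""
    settings = parse_sysctl(text)
    return {
        key: settings.get(key, "").isdigit() and int(settings[key]) >= target
        for key in KEYS
    }


if __name__ == "__main__":
    sample = """
    # UDP receive buffers for NOAAPORT ingest
    net.core.rmem_max = 26214400
    net.core.rmem_default = 26214400
    """
    print(buffers_ok(sample))
```

On Linux, the values actually in effect after a reboot can also be read from /proc/sys/net/core/rmem_max and /proc/sys/net/core/rmem_default, which catches the case where a typo in sysctl.conf silently left the defaults in place.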

re:
> I added the cron tasks and even though the one computer gives an error message
> it does produce the metrics.txt file.

It issues an error message when attempting to write the metrics.txt file?  I
have never seen this before.  Can you send the error message that is issued?

re:
> The other day it was giving the same error message as when the crontab would run.
> Now I am not getting any error message so this seems to have fixed itself
> or the rebooting the boxes after the sysctl.conf cleaned something up that
> got rid of the error.

OK.

re:
> But with all of these changes I am still not seeing any difference. I am
> missing a lot of the NBM text data on the Z820 compared to the slower Z400
> machine.

NOAA has been running a test in which the NBM files are being sent on
port 1206, multicast address 224.0.1.6.  A quick review of the LDM
configuration files that you sent for both of your machines showed
that you are not processing data on this port.  For reference, here
are the LDM configuration EXEC lines that we are using:

EXEC    "keep_running noaaportIngester -n -m 224.0.1.1  -l /data/tmp/nwstg.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.2  -l /data/tmp/goes.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.3  -l /data/tmp/nwstg2.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.4  -l /data/tmp/oconus.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.5  -l /data/tmp/polar-orbiter.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.6  -l /data/tmp/nbm.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.8  -l /data/tmp/experimental.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.9  -l /data/tmp/goes-west.log"
EXEC    "keep_running noaaportIngester -n -m 224.0.1.10 -l /data/tmp/goes-east.log"
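If you want to check an ldmd.conf for channels that have no active ingester, a short script can compare its EXEC lines against the multicast groups listed above. This is a hypothetical helper, not an LDM utility; the channel labels are simply taken from the log-file names in our EXEC lines, and it assumes each ingester is started by an uncommented EXEC "..." line containing "noaaportIngester" and a "-m <group>" option.

```python
import re

# Multicast groups from the EXEC lines above, labelled by their log-file names.
EXPECTED_CHANNELS = {
    "224.0.1.1": "nwstg",
    "224.0.1.2": "goes",
    "224.0.1.3": "nwstg2",
    "224.0.1.4": "oconus",
    "224.0.1.5": "polar-orbiter",
    "224.0.1.6": "nbm",
    "224.0.1.8": "experimental",
    "224.0.1.9": "goes-west",
    "224.0.1.10": "goes-east",
}

# An active (uncommented) EXEC entry, and the "-m <group>" option inside it.
EXEC_RE = re.compile(r'^\s*EXEC\s+"(?P<cmd>[^"]*)"')
MCAST_RE = re.compile(r"-m\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def ingested_groups(conf_text):
    """Return the multicast groups read by active noaaportIngester EXEC lines."""
    groups = set()
    for line in conf_text.splitlines():
        match = EXEC_RE.match(line)
        if not match:
            continue  # blank, commented-out ('#EXEC ...') or non-EXEC line
        cmd = match.group("cmd")
        if "noaaportIngester" not in cmd:
            continue
        group = MCAST_RE.search(cmd)
        if group:
            groups.add(group.group(1))
    return groups


def missing_channels(conf_text):
    """Return {group: label} for expected channels with no active ingester."""
    have = ingested_groups(conf_text)
    return {g: name for g, name in EXPECTED_CHANNELS.items() if g not in have}


if __name__ == "__main__":
    sample = (
        'EXEC "keep_running noaaportIngester -n -m 224.0.1.1 -l /data/tmp/nwstg.log"\n'
        '#EXEC "keep_running noaaportIngester -n -m 224.0.1.6 -l /data/tmp/nbm.log"\n'
    )
    print(missing_channels(sample))
```

Run against the configurations you sent, the 224.0.1.6 (NBM) entry is the one to look for in the output.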

Comments on our actions:

- 'keep_running' is a simple shell script that will start the process passed
  as the first argument, and restart that process if it exits

  This script is included in the newest LDM releases like v6.13.13.

- we are logging the receipt of every product into channel specific log
  files

  We mine these log files to keep track of the number of gaps and associated
  missed frames that we are experiencing.
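For anyone running an older LDM that lacks 'keep_running', the restart-on-exit idea behind it can be sketched in Python. This is not the LDM script itself (that is a shell script); the function name, the one-second pause between restarts, and the max_restarts parameter (added only so the sketch can be tested) are assumptions.

```python
import subprocess
import sys
import time


def keep_running(argv, max_restarts=None, delay=1.0):
    """Run argv, and start it again every time it exits.

    With max_restarts=None this loops forever; a finite value stops after
    that many restarts. Returns the exit codes seen, one per run.
    """
    exit_codes = []
    restarts = 0
    while True:
        proc = subprocess.run(argv)  # blocks until the child process exits
        exit_codes.append(proc.returncode)
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        time.sleep(delay)  # pause so a process that dies instantly doesn't spin


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python keep_running.py noaaportIngester -n -m 224.0.1.6 -l /data/tmp/nbm.log
    keep_running(sys.argv[1:])
```

The pause between restarts is a design choice: without it, an ingester that fails immediately (for example, because the multicast group cannot be joined) would be restarted in a tight loop.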

re:
> I am attaching the metrics files from both computers along with the ldmd.log
> from both.  

The metrics.txt files show that both of your machines are basically idling
(i.e., the load averages are very low), so system overload is clearly not
an issue.
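The "basically idling" judgement comes down to comparing the load averages with the number of CPUs. A minimal sketch, assuming that a per-CPU load below 0.25 counts as idle (the threshold is arbitrary and the function names are hypothetical); os.getloadavg() is only available on Unix-like systems.

```python
import os


def load_per_cpu(loads=None, ncpu=None):
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    if loads is None:
        loads = os.getloadavg()  # Unix only; raises OSError if unavailable
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return tuple(round(load / ncpu, 3) for load in loads)


def is_idle(loads=None, ncpu=None, threshold=0.25):
    """True if every per-CPU load average is below `threshold` (assumed cutoff)."""
    return all(value < threshold for value in load_per_cpu(loads, ncpu))


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    print(load_per_cpu(), is_idle())
```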

re:
> I did notice the ldmd.log file from the Z400 is 22 MB in size. It has a
> lot of errors that the other machine does not have. I am not sure if this
> is from the slightly different versions of software, but I also wonder if
> since it is getting errors on those grib files does it spend less time on
> those (even though I do not save them anyway) so it has more processing
> time for the files I do save???  Just throwing out random ideas..
> 
> 
> --- ldmd.log from Z400 example---
> 
> 20201204T173026.868109Z noaaportIngester[24566] grib2name.c:grib2name() ERROR Couldn't decode GRIB2 message. WMO header="YAUL02 KWNR 041726"
> 20201204T173212.366265Z noaaportIngester[24566] gb2param.c:gb2_param() WARN  Couldn't get parameter info: iver=255, disc=209, cat=2, id=5, pdtn=0, center=nssl, lclver=1, file=g2varsnssl1.tbl
> 20201204T173212.366319Z noaaportIngester[24566] gb22gem.c:gb2_2gem() ERROR [GB 1] Couldn't get parameter values
> 20201204T173212.366368Z noaaportIngester[24566] grib2name.c:grib2name() ERROR Couldn't decode GRIB2 message. WMO header="YAUL02 KWNR 041728"
> 20201204T173413.283900Z noaaportIngester[24566] gb2param.c:gb2_param() WARN  Couldn't get parameter info: iver=255, disc=209, cat=2, id=5, pdtn=0, center=nssl, lclver=1, file=g2varsnssl1.tbl
> 20201204T173413.283951Z noaaportIngester[24566] gb22gem.c:gb2_2gem() ERROR [GB 1] Couldn't get parameter values
> 20201204T173413.283982Z noaaportIngester[24566] grib2name.c:grib2name() ERROR Couldn't decode GRIB2 message. WMO header="YAUL02 KWNR 041730"
> 20201204T173613.145506Z noaaportIngester[24566] gb2param.c:gb2_param() WARN  Couldn't get parameter info: iver=255, disc=209, cat=2, id=5, pdtn=0, center=nssl, lclver=1, file=g2varsnssl1.tbl
> 20201204T173613.145558Z noaaportIngester[24566] gb22gem.c:gb2_2gem() ERROR [GB 1] Couldn't get parameter values
> 20201204T173613.145589Z noaaportIngester[24566] grib2name.c:grib2name() ERROR Couldn't decode GRIB2 message. WMO header="YAUL02 KWNR 041732"
> 20201204T173759.517353Z noaaportIngester[24566] gb2param.c:gb2_param() WARN  Couldn't get parameter info: iver=255, disc=209, cat=2, id=5, pdtn=0, center=nssl, lclver=1, file=g2varsnssl1.tbl
> 20201204T173759.517406Z noaaportIngester[24566] gb22gem.c:gb2_2gem() ERROR [GB 1] Couldn't get parameter values
> 20201204T173759.517446Z noaaportIngester[24566] grib2name.c:grib2name() ERROR Couldn't decode GRIB2 message. WMO header="YAUL02 KWNR 041734"
> 20201204T174000.876995Z noaaportIngester[24566] gb2param.c:gb2_param() WARN  Couldn't get parameter info: iver=255, disc=209, cat=2, id=5, pdtn=0, center=nssl, lclver=1, file=g2varsnssl1.tbl
 ...

The 'ERROR Couldn't decode GRIB2 message' messages are caused by GRIB2
definitions not being present in the various GRIB2 table ('*.tbl') files that
get installed in the ~ldm/etc directory.  When the information for a GRIB2
field is not found, a full Product ID cannot be created.  This does not,
however, affect whether the product is inserted into the LDM queue: all
products are inserted into the queue unless they are corrupt.  It will affect
the processing of products IF the pattern-action file action(s) are written
to match Product IDs.
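To make the last point concrete: whether a missing table entry matters depends on what a pattern keys on. In the sketch below, Python's re.search is a rough stand-in for the extended regular expressions used in pattern-action files. Both product IDs and both patterns are hypothetical (the exact ID layout that noaaportIngester produces is not reproduced here); they only assume that the WMO header is present either way, while a field name derived from the GRIB2 tables appears only when decoding succeeds.

```python
import re

# Hypothetical product IDs: same WMO header; only the first carries a
# (made-up) field name that would come from the GRIB2 tables.
PRODUCT_IDS = {
    "decoded": "YAUL02 KWNR 041726 /FIELD_X",
    "undecoded": "YAUL02 KWNR 041726",
}

# Two hypothetical pattern-action patterns.
PATTERNS = {
    "by_wmo_header": r"^YAUL02 KWNR",
    "by_field_name": r"FIELD_X",
}


def matches(pattern, product_id):
    """True if the pattern matches anywhere in the product ID."""
    return re.search(pattern, product_id) is not None


def match_table():
    """Return {(pattern_name, id_kind): matched} for every combination."""
    return {
        (pname, kind): matches(pat, pid)
        for pname, pat in PATTERNS.items()
        for kind, pid in PRODUCT_IDS.items()
    }


if __name__ == "__main__":
    for key, hit in match_table().items():
        print(key, hit)
```

The header-based pattern files the product whether or not its GRIB2 field was decoded; the field-based pattern silently skips the undecoded one, which is the kind of difference that shows up as "missing" data on only one machine.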

FYI: the GRIB2 table files that I am referring to in the previous paragraph are
updated with each new LDM release, and up-to-date versions can be downloaded
from GitHub.

After seeing these messages, I believe that the difference between your
machines is caused by their running different LDM versions: the processing
differs because the machines are using different (and out of date) sets of
GRIB2 tables.
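One way to confirm this is to compare the GRIB2 table files on the two machines directly. A minimal sketch, assuming both ~ldm/etc trees have been copied to (or mounted on) one host; it hashes every *.tbl file, including any in subdirectories, and reports files that exist on only one side or whose contents differ. The function names and example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def table_manifest(etc_dir, pattern="**/*.tbl"):
    """Map each table file (path relative to etc_dir) to its SHA-256 digest."""
    root = Path(etc_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.glob(pattern))
        if path.is_file()
    }


def diff_manifests(first, second):
    """Compare two manifests and list missing or differing table files."""
    common = set(first) & set(second)
    return {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "changed": sorted(name for name in common if first[name] != second[name]),
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # e.g. python compare_tables.py /mnt/z820/ldm/etc /mnt/z400/ldm/etc
        print(diff_manifests(table_manifest(sys.argv[1]), table_manifest(sys.argv[2])))
```

A non-empty "changed" entry for a file such as g2varsnssl1.tbl (the table named in the WARN lines above) would support the explanation.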

I think that the simplest thing for you to do at this point is upgrade
to the latest version of the LDM, v6.13.13, on both of your machines,
as that will have GRIB2 tables that are quite a bit newer and more
complete than the ones you are currently using.  I also suggest keeping
both machines up to date with LDM releases, in lockstep.
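Keeping the machines in lockstep can be checked with a few lines of code. A minimal sketch; the host labels and version strings in the usage example are hypothetical, since the versions actually installed on the Z820 and Z400 are not given here, and the function names are made up for illustration.

```python
def parse_version(text):
    """Turn '6.13.13' or 'v6.13.13' into (6, 13, 13) for numeric comparison."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def lockstep_status(versions, minimum="6.13.13"):
    """Report whether all hosts run one version and which are below `minimum`."""
    parsed = {host: parse_version(v) for host, v in versions.items()}
    floor = parse_version(minimum)
    return {
        "same_version": len(set(parsed.values())) == 1,
        "below_minimum": sorted(host for host, v in parsed.items() if v < floor),
    }


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(lockstep_status({"Z820": "6.13.13", "Z400": "v6.13.11"}))
```

Versions are compared as tuples of integers rather than as strings because a plain string comparison would rank "6.13.9" above "6.13.13".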

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: ZLV-851048
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.