
[IDD #LEH-889399]: 20201202: LDM log warnings



Hi,

re:
> Thank you

No worries.

re:
> Here is the regutil output.
> 
> /delete-info-files : 0
> /fmtp-retx-timeout : 300
> /hostname : ldm.met.tamu.edu
> /insertion-check-interval : 300
> /oess-pathname : /home/ldm/etc/OESS-account.json
> /reconciliation-mode : do nothing
> /check-time/enabled : 1
> /check-time/limit : 10
> /check-time/warn-if-disabled : 1
> /check-time/ntpdate/command : /usr/sbin/ntpdate
> /check-time/ntpdate/servers : time.tamu.edu ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu otc1.psu.edu timeserver.unidata.ucar.edu
> /check-time/ntpdate/timeout : 5
> /metrics/count : 4
> /metrics/file : /home/ldm/var/logs/metrics.txt
> /metrics/files : /home/ldm/var/logs/metrics.txt*
> /metrics/netstat-command : /usr/bin/netstat -A inet -t -n
> /metrics/top-command : /usr/bin/top -b -n 1
> /log/count : 7
> /log/file : /home/ldm/var/logs/ldmd.log
> /log/rotate : 1
> /pqsurf/config-path : /home/ldm/etc/pqsurf.conf
> /pqsurf/datadir-path : /home/ldm/var/data
> /scour/config-path : /home/ldm/etc/scour.conf
> /surf-queue/path : /home/ldm/var/queues/pqsurf.pq
> /surf-queue/size : 2M
> /server/config-path : /home/ldm/etc/ldmd.conf
> /server/enable-anti-DOS : TRUE
> /server/ip-addr : 0.0.0.0
> /server/max-clients : 256
> /server/max-latency : 3600
> /server/port : 388
> /server/time-offset : 3600
> /queue/path : /home/ldm/var/queues/ldm.pq
> /queue/size : 16000M
> /queue/slots : default
> /pqact/config-path : /home/ldm/etc/pqact.conf
> /pqact/datadir-path : /

I must say that I am a bit surprised that the LDM registry on
your machine is set up the way I recommend.  My "musing" that
the SATELLITE ingest problem you experienced might have been
caused by, for instance, the /server/max-latency setting was
totally off base, so you should ignore it!

My comment about the LDM log messages being more chatty in newer
releases stands, however.

A review of the latency plots for various feeds that you are
receiving revealed that the latencies for products coming from
the Penn State relay, ldm.meteo.psu.edu, can be quite large.
This is, quite frankly, a big surprise since the networking
at Penn State is very good.

So, what to make of the log messages you asked about in your
previous email:

20201202T232701.661461Z idd.meteo.psu.edu [23582]    down6.c:vetProduct:226 WARN  Ignoring too-old product:       9518 20201202222310.266706 HDS 105266874  JUSA41 KWBC 022200 /pBUFR

A quick look at the latencies for products in the HDS feed:

https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?HDS+ldm.met.tamu.edu

shows exactly why these log messages were generated: your machine is
receiving products from Penn State that are much older than the copies
received from our primary IDD relay, idd.unidata.ucar.edu.  My reading of
the latency plot above is that you are receiving everything you are
REQUESTing from us in a timely manner (meaning very low latencies), and
then receiving the same products from Penn State "much" later, sometimes
as long as or longer than the 3600 seconds set in the /server/max-latency
entry of your LDM registry.  The LDM is doing what it is intended to do:
discard very old products.
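As a worked check, the latency of the product in the WARN message above can be computed from two of its timestamps: the time the message was logged (20201202T232701.661461Z) and the product's creation time (20201202222310.266706). The Python sketch below assumes that the logging time is a close stand-in for the arrival time and that both timestamps are UTC; the constant and function names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the WARN log line.
# Assumption: the logging time approximates the arrival time; both are UTC.
LOGGED_AT = "20201202T232701.661461Z"
CREATED_AT = "20201202222310.266706"
MAX_LATENCY = 3600  # seconds, the /server/max-latency value from the registry


def latency_seconds(logged_at: str, created_at: str) -> float:
    """Return how old (in seconds) the product was when the message was logged."""
    arrived = datetime.strptime(logged_at, "%Y%m%dT%H%M%S.%fZ")
    created = datetime.strptime(created_at, "%Y%m%d%H%M%S.%f")
    return (arrived - created).total_seconds()


latency = latency_seconds(LOGGED_AT, CREATED_AT)
print(f"latency = {latency:.6f} s; too old: {latency > MAX_LATENCY}")
```

Run as a script, it prints "latency = 3831.394755 s; too old: True": the product was a little over an hour old, just past the 3600-second limit.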

The next question is what the residency time is for products in your
machine's LDM queue.  Assuming that you turned on metrics gathering
(via the crontab entry that runs 'ldmadmin addmetrics'), your
machine should be updating the file 'metrics.txt' in the
/home/ldm/var/logs directory.  If this file is being updated (as it
will be if the crontab entry is active), and if your machine has
'gnuplot' installed, you can produce plots of the various metrics
that we consider important in monitoring an LDM installation.  This
is done as follows:

<as 'ldm' logged in with X-window display capability>
ldmadmin plotmetrics

Again, 'gnuplot' must be installed for this to work, and you
must be logged into the machine through an interface that can
display graphics on the machine you are using to access your LDM
machine (for instance, an SSH session with X forwarding enabled,
and your LDM machine configured to allow X forwarding).

If 'gnuplot' is not installed, and/or if your setup doesn't
support display of X Window graphics, you can still determine
the product residency time, as measured by the age of the oldest
product in the LDM queue, by reviewing your metrics.txt file.

Also, to get a single snapshot of the age of the oldest product
in your queue at the current time, you can run:

<as 'ldm'>
pqmon

The last value in the line that contains numerical values is the
age of the oldest product in the LDM queue in seconds at the time
that 'pqmon' was run.

Why is the age of the oldest product in the queue important?

The LDM will eliminate received products that have an MD5 signature
that matches the MD5 signature of a product already in the queue.
This elimination of duplicate products is done silently (unless
you put logging into debug mode, which is _not_ recommended for
any length of time as the output is quite voluminous!).

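If the MD5 signature for a received product does not match an
MD5 signature for a product already in the queue, the product
will be inserted into the queue IF the latency for the product
is less than the /server/max-latency value set in the LDM
registry.  The log messages you sent previously indicate that
a) there was not a product in the queue that had the same MD5
signature and b) the product's latency exceeded the
/server/max-latency value in your LDM's registry.

This is why the age of the oldest product matters: a duplicate
can only be recognized while the original copy is still in the
queue.  If the age of the oldest product in the queue is less
than the /server/max-latency value, a late-arriving copy of a
product that has already been deleted from the queue can pass
both checks and be inserted a second time.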
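The two checks described above, and why queue residency time matters, can be illustrated with a toy model. This is a hypothetical, much-simplified Python sketch, not the LDM's actual implementation: the class ToyQueue and its methods are invented for illustration, times are plain seconds, and the real LDM may apply the two checks in a different order (which would only change the reason reported for a product that is both a duplicate and too old).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical, simplified model of an LDM product queue (not real LDM code).

    Maps a product's MD5 digest to the time (seconds) it was inserted.
    """
    max_latency: float  # plays the role of /server/max-latency
    entries: dict = field(default_factory=dict)

    def offer(self, data: bytes, created: float, now: float) -> str:
        """Decide what happens to a received product and return the outcome."""
        digest = hashlib.md5(data).hexdigest()
        if digest in self.entries:
            return "duplicate"  # the real LDM discards these silently
        if now - created > self.max_latency:
            return "too-old"    # the case behind the "Ignoring too-old product" WARN
        self.entries[digest] = now
        return "inserted"

    def evict_inserted_before(self, cutoff: float) -> None:
        """Drop products inserted before `cutoff`, as a full queue would."""
        self.entries = {d: t for d, t in self.entries.items() if t >= cutoff}


if __name__ == "__main__":
    q = ToyQueue(max_latency=3600)
    product = b"JUSA41 KWBC 022200"  # stand-in payload
    print(q.offer(product, created=0, now=5))     # first copy via a fast relay
    print(q.offer(product, created=0, now=1800))  # late copy via a slow relay
    q.evict_inserted_before(600)                  # residency shorter than 3600 s
    print(q.offer(product, created=0, now=3000))  # late copy slips back in
    print(q.offer(b"another product", created=0, now=3831))
```

Run as a script, it prints inserted, duplicate, inserted, too-old. The third result shows a late copy being accepted a second time because its original had already been evicted from a queue whose residency time is shorter than the 3600-second /server/max-latency.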

The question I cannot answer is why the latencies in feeds from
Penn State are so large.  Figuring this out would require either
help from the TAMU networking folks, or a login with 'root' privilege
on your machine so that a bandwidth tester like 'perfSONAR' could
be run to see where the bottleneck is.

In the interim, you may want to switch your REQUEST(s) from Penn
State to either our backup IDD relay cluster, iddb.unidata.ucar.edu,
or, better yet, the IDD relay at the University of Wisconsin/AOS,
idd.aos.wisc.edu.  Before you decide to switch your REQUEST(s), you
should verify that you are ALLOWed to REQUEST from the upstream
site that you want to switch to.  For instance:

<as 'ldm'>
notifyme -vl- -f ANY -h idd.aos.wisc.edu

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: LEH-889399
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.