
[IDD #FCW-166050]: intermittent ldm feed



Carter,

> Sorry to take this long to get back to you.

No worries.

> meteor.mmm.ucar.edu (128.117.88.5) is not running any ldm software,
> however I do see it in the ldmadmin config output.  Is that a build issue
> and/or a run-time issue?

I think the host name used by the rtstats(1) utility on rain.mmm is 
misconfigured. See below.

Having the wrong host name will affect statistics reporting. It won't affect 
data distribution, however.

> rain.mmm.ucar.edu (mmm-rain, 128.117.88.124) is
> running the LDM software, and it already has the RT stats exec line in its
> ldmd.conf file.
> 
> Here are the outputs of the files you requested:
> 
> rain.mmm.ucar.edu:/users/ldm>ldmadmin config
> 
> hostname:              meteor.mmm.ucar.edu

The host name should be "rain.mmm.ucar.edu". It is wrong because the LDM 
registry has the wrong value for the "hostname" element. See below.

> os:                    Linux
> release:               3.10.0-862.2.3.el7.x86_64
> ldmhome:               /users/ldm
> LDM version:           6.13.6
> PATH:
> /users/ldm/ldm-6.13.6/bin:/users/ldm/decoders:/users/ldm/util:/users/ldm/bin:/usr/local/netcdf-3.6.3-gfortran/bin:/usr/local/cmpwrappers:/usr/local/complibs-gcc/bin:/usr/local/gcc-5.2.0/bin:.:/bin:/usr/bin:/sbin:/usr/sbin:/usr/bin/X11:/usr/kerberos/bin:/usr/local/bin:/usr/local/hpss/bin
> LDM conf file:         /users/ldm/etc/ldmd.conf
> pqact(1) conf file:    /users/ldm/etc/pqact.conf
> scour(1) conf file:    /users/ldm/etc/scour.conf
> product queue:         /users/ldm/var/queues/ldm.pq
> queue size:            1G bytes
> queue slots:           default
> reconciliation mode:   do nothing
> pqsurf(1) path:        /users/ldm/var/queues/pqsurf.pq
> pqsurf(1) size:        2M
> IP address:            0.0.0.0
> port:                  388
> PID file:              /users/ldm/ldmd.pid
> Lock file:             /users/ldm/.ldmadmin.lck
> maximum clients:       256
> maximum latency:       3600
> time offset:           3600
> log file:              /users/ldm/var/logs/ldmd.log
> numlogs:               7
> log_rotate:            1
> netstat:               /bin/netstat -A inet -t -n
> top:                   /usr/bin/top -b -n 1
> metrics file:          /users/ldm/var/logs/metrics.txt
> metrics files:         /users/ldm/var/logs/metrics.txt*
> num_metrics:           4
> check time:            1
> delete info files:     0
> ntpdate(1):            /usr/sbin/ntpdate
> ntpdate(1) timeout:    5
> time servers:          ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu
> otc1.psu.edu timeserver.unidata.ucar.edu
> time-offset limit:     10

Other than the host name, I didn't notice anything in the above that was wrong.

> rain.mmm.ucar.edu:/users/ldm>cat ~ldm/etc/ldmd.conf
> #####
> # $Id: ldmd.conf,v 1.7 1995/11/08 15:39:54 mitch Exp $
> # Sample ldmd.conf for ldm
> ####
> 
> #
> # Programs that share a queue with rpc.ldmd
> # are started by it and are in the same process group.
> #
> 
> exec "pqact -v -l /users/ldm/logs/pqact.log -d /users/ldm 
> /users/ldm/etc/pqact.conf"
> exec "pqexpire -l /users/ldm/logs/pqexpire.log"

You should disable the "pqexpire" line above. That utility is obsolete and can 
interfere with the proper execution of the LDM system.

> exec "pqbinstats -l /users/ldm/logs/pqstat.log -d /users/ldm/logs"

Likewise for the "pqbinstats" entry.
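
One way to disable both entries is to comment them out. The following is only a 
sketch: the function name "disable_obsolete_execs" is made up for illustration, 
and it assumes each entry starts on its own line with a lowercase "exec", as in 
your file. It leaves a ".bak" copy of the original next to the edited file.

```shell
#!/bin/sh
# Comment out the obsolete pqexpire and pqbinstats EXEC entries in an LDM
# configuration file. The original is preserved as <file>.bak.
# (Hypothetical helper; only lowercase "exec" at the start of a line is handled.)
disable_obsolete_execs() {
    conf=$1
    cp -p "$conf" "$conf.bak" || return 1
    # Prefix "#" to active lines that begin with: exec "pqexpire or exec "pqbinstats
    sed -E 's/^([[:space:]]*exec[[:space:]]+"(pqexpire|pqbinstats)([[:space:]"]|$))/#\1/' \
        "$conf.bak" > "$conf"
}
```

For example, "disable_obsolete_execs /users/ldm/etc/ldmd.conf". Entries for other 
programs, such as pqact and rtstats, are left untouched.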

Your LDM system must have first been installed quite a while ago.

> exec "rtstats -h rtstats.unidata.ucar.edu"
> 
> #exec "pqsurf"
> 
> #
> # LDM5 servers we ask for data
> #
> # request <feedset> <pattern> <hostname pattern>
> #    Per Unidata note: PRIMARY/ALTERNATE designations deprecated 12-08-2014
> #
> 
> request ANY ".*" idd.unidata.ucar.edu
> 
> # request EXP ".*" ice.ssec.wisc.edu
> 
> #request EXP
> # ".*"
> # aws.ssec.wisc.edu
> 
> request EXP ".*" amrc.ssec.wisc.edu
> 
> #request EXP
> # ".*"
> # polarmet12.mps.ohio-state.edu
> 
> 
> ###############################################################################
> # Begin Access control
> ###############################################################################
> 
> ###############################################################################
> # ALLOW: Who we are willing to feed
> #
> # allow <feedset> <hostname pattern>
> #allow ANY ^motherlode\.ucar\.edu$
> #allow ANY ^atm\.ucar\.edu$
> allow ANY ice.ssec.wisc.edu
> #allow ANY polarmet12.mps.ohio-state.edu
> allow ANY ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> ###############################################################################
> 
> # send anything to your own machine
> #allow ANY
> #
> ^((localhost|loopback)|(127\.0\.0\.1\.?$)|([a-z].*\.unidata\.ucar\.edu\.?$))
> 
> ###############################################################################
> # ACCEPT: Who can feed us
> #
> # accept <feedset> <pattern> <hostname pattern>
> ###############################################################################
> 
> # accept anything from yourself
> #accept ANY
> #    .*
> #    ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> 
> accept ANY .* ^((localhost|loopback)|(127\.117\.88\.145\.?$))
> 
> accept  ANY .* ^idd\.unidata\.ucar\.edu$
> 
> ###############################################################################
> # End Access control
> ###############################################################################
> rain.mmm.ucar.edu:/users/ldm>cat ~ldm/etc/registry.xml
> <?xml version="1.0"?>
> <registry>
> <delete-info-files>0</delete-info-files>
> <hostname>meteor.mmm.ucar.edu</hostname>

The value of the above element, "hostname", should be changed to 
"rain.mmm.ucar.edu".
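
As a rough sketch of that change, the helper below rewrites the element in 
place and keeps a ".bak" copy. The function name "set_registry_hostname" is made 
up for illustration, and it assumes the element sits on a single line, as it 
does in your file. I believe the LDM's regutil(1) utility can make the same 
change ("regutil -s rain.mmm.ucar.edu /hostname"), but check that against the 
documentation for your version. Either way, make the change while the LDM is 
stopped.

```shell
#!/bin/sh
# Replace the value of the <hostname> element in an LDM registry file.
# The original is preserved as <file>.bak.
# (Hypothetical helper; assumes <hostname>...</hostname> is on one line.)
set_registry_hostname() {
    registry=$1
    newhost=$2
    cp -p "$registry" "$registry.bak" || return 1
    sed "s|<hostname>[^<]*</hostname>|<hostname>$newhost</hostname>|" \
        "$registry.bak" > "$registry"
}
```

For example, "set_registry_hostname /users/ldm/etc/registry.xml rain.mmm.ucar.edu".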

> <insertion-check-interval>300</insertion-check-interval>
> <reconciliation-mode>do nothing</reconciliation-mode>
> <check-time>
> <enabled>1</enabled>
> <limit>10</limit>
> <warn-if-disabled>1</warn-if-disabled>
> <ntpdate>
> <command>/usr/sbin/ntpdate</command>
> <servers>ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu otc1.psu.edu
> timeserver.unidata.ucar.edu</servers>
> <timeout>5</timeout>
> </ntpdate>
> </check-time>
> <log>
> <count>7</count>
> <file>/users/ldm/var/logs/ldmd.log</file>
> <rotate>1</rotate>
> </log>
> <metrics>
> <count>4</count>
> <file>/users/ldm/var/logs/metrics.txt</file>
> <files>/users/ldm/var/logs/metrics.txt*</files>
> <netstat-command>/bin/netstat -A inet -t -n</netstat-command>
> <top-command>/usr/bin/top -b -n 1</top-command>
> </metrics>
> <pqact>
> <config-path>/users/ldm/etc/pqact.conf</config-path>
> <datadir-path>/users/ldm/var/data</datadir-path>
> </pqact>
> <pqsurf>
> <config-path>/users/ldm/etc/pqsurf.conf</config-path>
> <datadir-path>/users/ldm/var/data</datadir-path>
> </pqsurf>
> <queue>
> <path>/users/ldm/var/queues/ldm.pq</path>
> <size>1G</size>
> <slots>default</slots>
> </queue>
> <scour>
> <config-path>/users/ldm/etc/scour.conf</config-path>
> </scour>
> <server>
> <config-path>/users/ldm/etc/ldmd.conf</config-path>
> <ip-addr>0.0.0.0</ip-addr>
> <max-clients>256</max-clients>
> <max-latency>3600</max-latency>
> <port>388</port>
> <time-offset>3600</time-offset>
> <enable-anti-DOS>TRUE</enable-anti-DOS>
> </server>
> <surf-queue>
> <path>/users/ldm/var/queues/pqsurf.pq</path>
> <size>2M</size>
> </surf-queue>
> </registry>

Disable the "pqexpire" and "pqbinstats" entries in the LDM configuration file, 
change the host name in the registry to "rain.mmm.ucar.edu", and restart the 
LDM.
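
Once the edits are made, the restart and a quick check might look like the 
sketch below. The "run" and "restart_and_check" helpers and the DRY_RUN variable 
are made up for illustration; only "ldmadmin restart" and "ldmadmin config" are 
actual LDM commands. Setting DRY_RUN=1 prints the commands instead of running 
them.

```shell
#!/bin/sh
# Execute a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restart the LDM so the edited configuration file and registry take effect,
# then show the configuration for inspection.
restart_and_check() {
    run ldmadmin restart || return 1
    # The "hostname:" line of the output should now read rain.mmm.ucar.edu
    run ldmadmin config
}
```

Run it as the LDM user, for example "restart_and_check", and look at the 
"hostname:" line of the output.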

Are you collecting metrics on the LDM system via a crontab(1) entry? If so, can 
you send us a plot of the age of the oldest product in the queue?
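
If you aren't, the sketch below prints crontab(1) entries that would collect 
the metrics. The function name "metrics_cron_entries" and the schedule are 
assumptions for illustration; "addmetrics" and "newmetrics" are ldmadmin(1) 
commands, and the relative path "bin/ldmadmin" assumes cron starts jobs in the 
LDM user's home directory.

```shell
#!/bin/sh
# Print crontab(1) entries for LDM metrics collection (hypothetical helper).
# The schedule is an assumption; adjust it to suit your site.
metrics_cron_entries() {
    cat <<'EOF'
# Append a metrics sample every minute
* * * * * bin/ldmadmin addmetrics > /dev/null 2>&1
# Start a new metrics file at the beginning of each month
0 0 1 * * bin/ldmadmin newmetrics > /dev/null 2>&1
EOF
}
```

The output can be added to the LDM user's crontab, for example with 
"(crontab -l; metrics_cron_entries) | crontab -". After some data has 
accumulated, "ldmadmin plotmetrics" should produce the plots, provided gnuplot 
is installed.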

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: FCW-166050
Department: Support IDD
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.