Subject: [IDD #MIY-175774]: "WARN" Log Message in Concatenate NEXRAD2 Script
Date: Fri, 15 Jan 2016 15:15:29 -0700
Hi Ziv,

Apologies for the very slow reply... I returned from the AMS Annual
Meeting in New Orleans this afternoon and am now working through the
support email that has accumulated while I was gone.

re:
> I am getting this message in my ldmd.log files:
> 
> "Jan 12 17:20:18 ldm-downstream pqact[21707] WARN: Processed oldest product
> in queue: 725.461 s"
> 
> I was instructed earlier to send some outputs to help diagnose the problem.
> In addition to these (below), I attached
> 
> - pqact.radars (a script I obtained from Unidata, modified slightly)
> - hhmmssRadarII.pl (another one of your scripts, modified slightly)
> - registry.xml
> - ldmd.conf

The "Processed oldest product in queue" messages indicate that the
pattern-action file action(s) are not keeping up with the new data flowing
into the LDM queue.  The remedies for this are one or more of the following:

- significantly increase the LDM queue size

  Change the <size></size> value in the LDM registry file to something
  substantially larger than its current value.  How much you can increase
  the queue size depends on how much RAM your machine/VM has.  Currently,
  your queue size is 500 MB, and I would recommend increasing it to 2G
  ** IF ** you have at least 3GB of RAM allocated for your VM.  If you do
  have enough RAM, try:

  in ~ldm/etc/registry.xml change:

  <queue>
    <path>/home/ldm/var/queues/ldm.pq</path>
    <size>500M</size>
    <slots>default</slots>
  </queue>

  to:

  <queue>
    <path>/home/ldm/var/queues/ldm.pq</path>
    <size>2G</size>
    <slots>default</slots>
  </queue>
   
  NB: after changing values in the LDM registry (~ldm/etc/registry.xml)
  or the LDM configuration (~ldm/etc/ldmd.conf) files, you need to stop
  and restart the LDM.  BUT, after changing the LDM queue size, you
  also need to delete the existing queue and create a new one:

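  <as 'ldm'>
  ldmadmin stop       # stop the LDM
  ldmadmin delqueue   # delete the existing (500M) product queue
  ldmadmin mkqueue    # create a new queue using the size in registry.xml
  ldmadmin start      # start the LDM again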

- the other crucial factor is your ability to write fast enough to the
  file system where your "decoding" takes place (the combination of the
  FILE actions and the running of the hhmmssRadarII.pl script)

  I do not know if the mounted file system you are using (you mentioned
  writing to /mnt) is fast or slow, but it seems like it might be slow even
  though it appears to be virtual and possibly memory based:

  /dev/vdb on /mnt type ext4 (rw)

  If it turns out that writes to /mnt are slow, you might be forced to do your
  "decoding" to a fast file system and then copy the resultant reconstituted
  volume scans to your desired location.

  Just so you know, we do not experience problems with the NEXRAD2 volume
  scan reconstitution procedure I sent you on any of our non-VM machines,
  nor do we encounter problems doing the reconstitution in an AWS (Amazon
  Web Services) VM (where the eventual storage location for the
  reconstituted volume scans is an S3 bucket).
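
One way to check whether writes to /mnt are the bottleneck is to time a
large write that is forced to disk there and compare it with the same test
on the root file system.  The sketch below is a hypothetical helper
(measure_write_throughput is not part of the LDM); it assumes Python 3 is
available on the VM.  It only measures sequential throughput, while the
FILE actions append many small pieces, so treat the numbers as a rough
comparison rather than a definitive benchmark.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_mb=64, block_kb=1024):
    """Hypothetical helper: write total_mb of data to a scratch file in
    `directory`, fsync it, and return the observed rate in MB/s
    (1 MB = 2**20 bytes). The scratch file is removed afterwards."""
    block = os.urandom(block_kb * 1024)          # random data, not zeros
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(prefix="ldm-write-test-", dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:           # f now owns (and closes) fd
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())                 # force data to the device so
                                                 # the page cache can't hide it
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)                          # always clean up
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed if elapsed > 0 else float("inf")
```

For example, compare measure_write_throughput("/mnt") with
measure_write_throughput("/home/ldm"); a markedly lower rate for /mnt would
support moving the "decoding" to the faster file system.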

re:
> *output from 'mount':*
> 
> /dev/vda1 on / type ext4 (rw)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> none on /sys/fs/cgroup type tmpfs (rw)
> none on /sys/fs/fuse/connections type fusectl (rw)
> none on /sys/kernel/debug type debugfs (rw)
> none on /sys/kernel/security type securityfs (rw)
> udev on /dev type devtmpfs (rw,mode=0755)
> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
> none on /run/shm type tmpfs (rw,nosuid,nodev)
> none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
> none on /sys/fs/pstore type pstore (rw)
> /dev/vdb on /mnt type ext4 (rw)
> systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)

As I alluded to above, I would think that /dev/vdb represents a fast
file system.  The fact that your processing is not keeping up suggests
otherwise, however.  We will know more after you increase your LDM queue
size, as this will increase the queue residency time for received products
and consequently provide more time in which processing can be done before
"old" products in the queue are overwritten by newly received products.
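
A rough way to see what the larger queue buys you: if the 725.461 s age in
the pqact warning is taken as an approximation of how long a product
currently survives in the 500M queue, and the ingest rate stays constant,
residency should scale roughly linearly with queue size.  The sketch below
is hypothetical (parse_size and estimate_residency are not LDM functions),
and it assumes binary multipliers for the registry size suffixes.

```python
# Hypothetical sizing helper; not part of the LDM distribution.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}  # assumption: binary multipliers


def parse_size(text):
    """Convert a registry-style size such as '500M' or '2G' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def estimate_residency(current_size, oldest_age_s, new_size):
    """Scale the observed residency time linearly with queue size.

    oldest_age_s is the age from the pqact warning
    "Processed oldest product in queue: <age> s", used as an approximation
    of current queue residency. Assumes a constant ingest rate.
    Returns (estimated residency in seconds, ingest rate in bytes/s).
    """
    cur = parse_size(current_size)
    new = parse_size(new_size)
    ingest_rate = cur / oldest_age_s          # bytes per second
    return new / ingest_rate, ingest_rate


new_age, rate = estimate_residency("500M", 725.461, "2G")
print(f"ingest rate ~ {rate / 2**20:.2f} MiB/s")
print(f"residency with 2G queue ~ {new_age:.0f} s ({new_age / 60:.0f} min)")
```

With these inputs it prints an ingest rate of about 0.69 MiB/s and a
residency of about 2971 s (roughly 50 minutes).  Note that a bigger queue
only buys time: if the pattern-action processing is sustainably slower than
the ingest rate, pqact will eventually fall behind again.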

re:
> *output from 'ldmadmin config':*
> 
> hostname:              nexradVM.chicago.edu
> os:                    Linux
> release:               3.13.0-46-generic
> ldmhome:               /home/ldm
> LDM version:           6.12.14
> PATH:                  /home/ldm/ldm-6.12.14/bin:/home/ldm/decoders:/home/ldm/util:/home/ldm/bin:/home/ldm/ldm-6.12.14/bin:/home/ldm/decoders:/home/ldm/util:/home/ldm/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
> LDM conf file:         /home/ldm/etc/ldmd.conf
> pqact(1) conf file:    /home/ldm/etc/pqact.conf
> scour(1) conf file:    /home/ldm/etc/scour.conf
> product queue:         /home/ldm/var/queues/ldm.pq
> queue size:            500M bytes
> queue slots:           default
> reconciliation mode:   do nothing
> pqsurf(1) path:        /home/ldm/var/queues/pqsurf.pq
> pqsurf(1) size:        2M
> IP address:            0.0.0.0
> port:                  388
> PID file:              /home/ldm/ldmd.pid
> Lock file:             /home/ldm/.ldmadmin.lck
> maximum clients:       256
> maximum latency:       3600
> time offset:           3600
> log file:              /home/ldm/var/logs/ldmd.log
> numlogs:               7
> log_rotate:            0
> netstat:               /bin/netstat -A inet -t -n
> top:                   /usr/bin/top -b -n 1
> metrics file:          /home/ldm/var/logs/metrics.txt
> metrics files:         /home/ldm/var/logs/metrics.txt*
> num_metrics:           4
> check time:            1
> delete info files:     0
> ntpdate(1):            /usr/sbin/ntpdate
> ntpdate(1) timeout:    5
> time servers:          ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu otc1.psu.edu timeserver.unidata.ucar.edu
> time-offset limit:     10
> 
> 
> *output from 'df -h':*
> 
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vda1       9.9G  2.7G  6.8G  28% /
> none            4.0K     0  4.0K   0% /sys/fs/cgroup
> udev            5.9G   12K  5.9G   1% /dev
> tmpfs           1.2G  348K  1.2G   1% /run
> none            5.0M     0  5.0M   0% /run/lock
> none            5.9G     0  5.9G   0% /run/shm
> none            100M     0  100M   0% /run/user
> /dev/vdb        882G  6.8G  831G   1% /mnt

It would _really_ help the troubleshooting procedure if we could get a login
to your VM!

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: MIY-175774
Department: Support IDD
Priority: Normal
Status: Closed