
[LDM #WOM-274153]: Reproducible bug in LDM 6.13.10



Hi,

Sorry you're having problems.

> We had an issue at AllisonHouse early this Saturday morning. At 1:40 AM CT,
> our pqacts stopped writing to disk on NFS01. OK, I know what that is: But,
> with bad weather going on, I decide to run LDM 6.13.11.54
> on NFS01 to see if that would get me through the night. I start it up, and
> data starts flowing. But, even though I am not getting any error messages
> that the feeds are down, nothing
> is writing to disk, even though ldmadmin watch clearly shows all the feeds
> are streaming in. So, doing a bit of detective work while my phone is
> buzzing off the hook that our feeds are down,
> I go and run a gdb on both pqact processes to see if pqact is acting
> strangely. I'll let you figure out if it is.
> 
> [root@nfs-central1-b ~]# gdb -p 1419
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
> >
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 1419
> Reading symbols from /home/ldm/ldm-6.13.11.54/bin/pqact...done.
> Reading symbols from /home/ldm/ldm-6.13.11.54/lib/libldm.so.0...done.
> Loaded symbols for /home/ldm/ldm-6.13.11.54/lib/libldm.so.0
> Reading symbols from /lib64/libgdbm.so.4...Reading symbols from
> /lib64/libgdbm.so.4...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libgdbm.so.4
> Reading symbols from /lib64/libxml2.so.2...Reading symbols from
> /lib64/libxml2.so.2...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libxml2.so.2
> Reading symbols from /lib64/libz.so.1...Reading symbols from
> /lib64/libz.so.1...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libz.so.1
> Reading symbols from /lib64/librt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/liblzma.so.5...Reading symbols from
> /lib64/liblzma.so.5...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/liblzma.so.5
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x00007f3879907e80 in __open_nocancel () from /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install gdbm-1.10-8.el7.x86_64
> glibc-2.17-260.el7_6.3.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
> xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
> (gdb) where
> #0  0x00007f3879907e80 in __open_nocancel () from /lib64/libpthread.so.0
> #1  0x00007f387a7c2c0f in mkdirs_open (path=path@entry=0x614967
> <bufs.6109+7> "ah/gempak/nexrad/craft/KSRX/KSRX_20190427_0729",
> flags=flags@entry=65, mode=mode@entry=438) at mkdirs_open.c:99

This pqact(1) process was creating the directory 
"ah/gempak/nexrad/craft/KSRX" (and any necessary parent directories) in 
order to FILE the data-product to the file "KSRX_20190427_0729".

This is perfectly normal behavior.
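
For readers who want to decode the arguments in frame #1, here is a minimal
Python sketch. It assumes Linux flag values (the trace is from RHEL 7 on
x86_64). The names decode_open_flags and mkdirs_open_sketch are hypothetical
and are not LDM functions; the second one only illustrates the general
"create missing parent directories, then open the file" idea and is not a
copy of what mkdirs_open.c does.

```python
import os

# Values copied from frame #1 of the backtrace (mkdirs_open).
FLAGS = 65
MODE = 438


def decode_open_flags(flags):
    """Return the names of the open(2) flags set in `flags` (Linux values)."""
    access = {os.O_RDONLY: "O_RDONLY", os.O_WRONLY: "O_WRONLY", os.O_RDWR: "O_RDWR"}
    names = [access[flags & 0o3]]  # the low two bits hold the access mode
    for name in ("O_CREAT", "O_EXCL", "O_TRUNC", "O_APPEND"):
        if flags & getattr(os, name):
            names.append(name)
    return names


def mkdirs_open_sketch(path, flags=FLAGS, mode=MODE):
    """Create missing parent directories, then open `path` (illustration only)."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return os.open(path, flags, mode)


if __name__ == "__main__":
    print(decode_open_flags(FLAGS), oct(MODE))
```

On Linux, flags=65 decodes to write-only plus create, and mode=438 is octal
0666, which is what one would expect for a FILE action writing a new product.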

Is the disk partition that contains the directory printed by 
`regutil /pqact/datadir-path` full?
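
One way to check is sketched below in Python, using only the standard
library. The function name partition_status is hypothetical, and the path
you pass it is whatever the regutil command above prints on your system. It
also looks at free inodes, because a filesystem can refuse to create new
files when it has run out of inodes even though it still has free blocks.

```python
import os
import shutil


def partition_status(path):
    """Report space and inode usage for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # total, used, and free space in bytes
    vfs = os.statvfs(path)           # f_files / f_favail are inode counts
    out_of_inodes = vfs.f_files > 0 and vfs.f_favail == 0
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_inodes": vfs.f_favail,
        "is_full": usage.free == 0 or out_of_inodes,
    }


if __name__ == "__main__":
    # Replace "." with the directory printed by `regutil /pqact/datadir-path`.
    print(partition_status("."))
```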

> #2  0x0000000000406531 in unio_open (entry=0x20b6bc0, ac=1, av=<optimized
> out>) at filel.c:994
> #3  0x0000000000405d65 in entry_new (argv=0x610940 <argv.6110>, argc=2,
> type=UNIXIO) at filel.c:2855
> #4  fl_getEntry (type=UNIXIO, argc=2, argv=0x610940 <argv.6110>, isNew=0x0)
> at filel.c:530
> #5  0x0000000000406d6b in unio_prodput (prodp=0x7ffce0816740, argc=2,
> argv=0x610940 <argv.6110>, ignored=<optimized out>, also_ignored=<optimized
> out>) at filel.c:1282
> #6  0x000000000040a9d1 in prodAction (xlen=81336, xprod=0x7f32df6f82b0,
> pal=0x1ef4630, prod=0x7ffce0816740) at palt.c:1232
> #7  processProduct (prod_par=<optimized out>, queue_par=0x7ffce0816970,
> opt_arg=<optimized out>) at palt.c:1359
> #8  0x00007f387a7d3455 in pq_next (pq=0x1f797e0, reverse=reverse@entry=false,
> clss=clss@entry=0x7ffce0816c50, func=0x409dc0 <processProduct>,
> keep_locked=keep_locked@entry=false, app_par=app_par@entry=0x0) at pq.c:8197
> #9  0x00000000004038da in main (ac=<optimized out>, av=<optimized out>) at
> pqact.c:635
> (gdb) quit
> A debugging session is active.
> 
> Inferior 1 [process 1419] will be detached.
> 
> Quit anyway? (y or n) q
> Please answer y or n.
> A debugging session is active.
> 
> Inferior 1 [process 1419] will be detached.
> 
> Quit anyway? (y or n) y
> Detaching from program: /home/ldm/ldm-6.13.11.54/bin/pqact, process 1419
> [root@nfs-central1-b ~]# gdb -p 1420
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
> >
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 1420
> Reading symbols from /home/ldm/ldm-6.13.11.54/bin/pqact...done.
> Reading symbols from /home/ldm/ldm-6.13.11.54/lib/libldm.so.0...done.
> Loaded symbols for /home/ldm/ldm-6.13.11.54/lib/libldm.so.0
> Reading symbols from /lib64/libgdbm.so.4...Reading symbols from
> /lib64/libgdbm.so.4...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libgdbm.so.4
> Reading symbols from /lib64/libxml2.so.2...Reading symbols from
> /lib64/libxml2.so.2...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libxml2.so.2
> Reading symbols from /lib64/libz.so.1...Reading symbols from
> /lib64/libz.so.1...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libz.so.1
> Reading symbols from /lib64/librt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/liblzma.so.5...Reading symbols from
> /lib64/liblzma.so.5...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/liblzma.so.5
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x00007f237af74546 in sigsuspend () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install gdbm-1.10-8.el7.x86_64
> glibc-2.17-260.el7_6.3.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
> xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
> (gdb) where
> #0  0x00007f237af74546 in sigsuspend () from /lib64/libc.so.6

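This pqact(1) process was waiting for a SIGCONT signal or for a timeout to 
expire (whichever came first) before checking the product-queue for a new 
product to process. The backtrace below shows maxsleep=15, so the timeout in 
this case was 15 seconds rather than 30.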

This is perfectly normal behavior.
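
The "wake on SIGCONT or after a timeout" pattern can be sketched in Python
as follows. This is an illustration only: the backtrace shows LDM reaching
sigsuspend() through pq_suspend(), whereas this sketch swaps in
signal.sigtimedwait(), which is available on Linux. The function name
wait_for_wakeup is hypothetical.

```python
import signal


def wait_for_wakeup(max_sleep):
    """Block until SIGCONT arrives or `max_sleep` seconds pass.

    Returns True if woken by SIGCONT and False if the timeout expired.
    """
    # SIGCONT must be blocked so that it stays pending for sigtimedwait()
    # instead of being handled (and lost) before the wait begins.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCONT})
    try:
        info = signal.sigtimedwait({signal.SIGCONT}, max_sleep)
        return info is not None
    finally:
        # Restore whatever signal mask the caller had.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    # Short timeout for demonstration; the trace shows maxsleep=15.
    print("woken by SIGCONT" if wait_for_wakeup(1) else "timed out")
```

A process sitting in a wait like this is idle, not hung: it will look at
the queue again as soon as it is signalled or the timeout expires.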

> #1  0x00007f237c1e64f0 in pq_suspendAndUnblock (maxsleep=maxsleep@entry=15,
> unblockSigs=unblockSigs@entry=0x0, numSigs=numSigs@entry=0) at pq.c:8720
> #2  0x00007f237c1e66f9 in pq_suspend (maxsleep=maxsleep@entry=15) at
> pq.c:8751
> #3  0x0000000000403913 in main (ac=<optimized out>, av=<optimized out>) at
> pqact.c:679
> (gdb) quit
> A debugging session is active.
> 
> Inferior 1 [process 1420] will be detached.
> 
> Quit anyway? (y or n) q
> Please answer y or n.
> A debugging session is active.
> 
> Inferior 1 [process 1420] will be detached.
> 
> Quit anyway? (y or n) y
> Detaching from program: /home/ldm/ldm-6.13.11.54/bin/pqact, process 1420

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WOM-274153
Department: Support LDM
Priority: High
Status: Closed
===================