[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[LDM #TIB-337303]: New IDD relay - ldm is hanging after some time



Pete,

> Finally got a backtrace from it. Process 8472 was stuck at 100% CPU when
> I started gdb.
> 
> address@hidden ldm-6.9.7]# gdb ldmd 8472
> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/local/ldm/bin/ldmd...done.
> Attaching to program: /usr/local/ldm/bin/ldmd, process 8472
> Reading symbols from /usr/local/ldm/lib/libldm.so.0...done.
> Loaded symbols for /usr/local/ldm/lib/libldm.so.0
> Reading symbols from /usr/local/ldm/lib/libxml2.so.2...done.
> Loaded symbols for /usr/local/ldm/lib/libxml2.so.2
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_dns.so.2
> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libresolv.so.2
> 0x000000337088431b in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.7.el6_0.5.x86_64
> (gdb) bt
> #0  0x000000337088431b in _wordcopy_fwd_dest_aligned () from
> /lib64/libc.so.6
> #1  0x0000003370883856 in bcopy () from /lib64/libc.so.6

How could a process be stuck using 100% of the CPU in bcopy(3)? That's a 
rhetorical question. The answer is "It can't." I suspect, therefore, that 
something's wrong with either your runtime library or with the operating-system.

> #2  0x00007f3aa8030f20 in xdrmem_putbytes (xdrs=0x7fff1bd97df0,
> addr=<value optimized out>, len=404167) at xdr_mem.c:150
> #3  0x00007f3aa803078a in my_xdr_opaque (xdrs=0x7fff1bd97df0, cp=<value
> optimized out>, cnt=<value optimized out>) at xdr.c:463
> #4  0x00007f3aa800f73c in xproduct (pq=0x1be6160, prod=0x7fff1bd97e90)
> at pq.c:4541
> #5  pq_insertNoSig (pq=0x1be6160, prod=0x7fff1bd97e90) at pq.c:5538
> #6  0x00007f3aa800fb26 in pq_insert (pq=<value optimized out>,
> prod=<value optimized out>) at pq.c:5578

The process is trying to insert something into the product-queue.

> #7  0x00007f3aa8017543 in dh_saveDataProduct (pq=<value optimized out>,
> info=0x1be8910, data=<value optimized out>, wasHereis=1, notifyAutoShift=1)
> at DownHelp.c:159
> #8  0x00007f3aa801c65d in hereis_6_svc (prod=<value optimized out>,
> rqstp=0x7fff1bd984d0) at ldm6_server.c:661
> #9  0x00007f3aa801acf1 in ldmprog_6 (rqstp=0x7fff1bd984d0,
> transp=0x1cb12d0) at ldm6_svc.c:100
> #10 0x00007f3aa802e629 in svc_getreqsock (sock=<value optimized out>) at
> svc.c:543
> #11 0x00007f3aa801ffa2 in one_svc_run (xp_sock=4, inactive_timeo=<value
> optimized out>) at one_svc_run.c:91
> #12 0x00007f3aa8022aca in run_service (upName=0x1be6140
> "idd.unidata.ucar.edu", port=30085840, request=<value optimized out>,
> inactiveTimeout=60,
> pqPathname=0x7fff1bd987a4 "\377\177", pq=<value optimized out>,
> isPrimary=1) at requester6.c:229
> #13 req6_new (upName=0x1be6140 "idd.unidata.ucar.edu", port=30085840,
> request=<value optimized out>, inactiveTimeout=60,
> pqPathname=0x7fff1bd987a4 "\377\177", pq=<value optimized out>,
> isPrimary=1) at requester6.c:672
> #14 0x00007f3aa8014a62 in prog_requester (ldmPort=<value optimized out>)
> at acl.c:1631
> #15 run_requester (ldmPort=<value optimized out>) at acl.c:1809
> #16 new_requester (ldmPort=<value optimized out>) at acl.c:1865
> #17 requester_add (ldmPort=<value optimized out>) at acl.c:1908
> #18 invert_request_acl (ldmPort=<value optimized out>) at acl.c:1974
> #19 0x0000000000405980 in read_conf (conf_path=0x7f3aa82517a0
> "/usr/local/ldm/etc/ldmd.conf", doSomething=1, defaultPort=388) at
> parser.y:593
> #20 0x00000000004065b7 in main (ac=<value optimized out>, av=<value
> optimized out>) at ldmd.c:1049
> (gdb)
> 
> 
> ps -eflu ldm shows this while connected to gdb:
> 
> address@hidden ~]$ ps -flu ldm
> F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY
> TIME CMD
> 4 S ldm       2697  2696  0  80   0 - 28344 n_tty_ May24 pts/2
> 00:00:03 -tcsh
> 5 S ldm       3098  3093  0  80   0 - 24362 poll_s May24 ?
> 00:02:07 sshd: address@hidden/3
> 0 S ldm       3099  3098  0  80   0 - 28270 rt_sig May24 pts/3
> 00:00:00 -tcsh
> 5 S ldm       8462     1  0  80   0 -  5277 poll_s 16:14 ?
> 00:00:00 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 0 S ldm       8464  8462  0  80   0 - 6024619 fcntl_ 16:14 ?
> 00:00:26 rtstats -h rtstats.unidata.ucar.edu
> 1 S ldm       8465  8462  0  80   0 - 6025590 fcntl_ 16:14 ?
> 00:00:19 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8466  8462  0  80   0 - 6025585 fcntl_ 16:14 ?
> 00:00:17 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8467  8462  0  80   0 - 6025672 fcntl_ 16:14 ?
> 00:00:20 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8468  8462  0  80   0 - 6025653 fcntl_ 16:14 ?
> 00:00:16 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8469  8462  0  80   0 - 6029770 fcntl_ 16:14 ?
> 00:00:30 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8470  8462  0  80   0 - 6029677 fcntl_ 16:14 ?
> 00:00:01 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8471  8462  0  80   0 - 6026399 fcntl_ 16:14 ?
> 00:00:26 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 T ldm       8472  8462 10  80   0 - 6025021 utrace 16:14 ?
> 00:12:43 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8473  8462  3  80   0 - 6025008 fcntl_ 16:14 ?
> 00:04:12 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8474  8462  8  80   0 - 6025041 fcntl_ 16:14 ?
> 00:09:54 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8475  8462  2  80   0 - 6025007 fcntl_ 16:14 ?
> 00:03:20 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8476  8462  0  80   0 - 6025039 fcntl_ 16:14 ?
> 00:00:09 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8477  8462  8  80   0 - 6025050 fcntl_ 16:14 ?
> 00:10:01 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm       8478  8462  0  80   0 - 6025061 poll_s 16:14 ?
> 00:00:02 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 5 S ldm       8524  8520  0  80   0 - 24362 poll_s 16:15 ?
> 00:00:01 sshd: address@hidden/5
> 0 S ldm       8525  8524  0  80   0 - 28354 rt_sig 16:15 pts/5
> 00:00:00 -tcsh
> 0 R ldm       8880  8525 95  80   0 - 26993 -      18:16 pts/5
> 00:00:13 ps -flu ldm
> 5 S ldm      12678 12673  0  80   0 - 24362 poll_s May26 ?
> 00:00:00 sshd: address@hidden/4
> 0 S ldm      12679 12678  0  80   0 - 28251 n_tty_ May26 pts/4
> 00:00:00 -tcsh
> 4 S ldm      38352 38350  0  80   0 - 28261 rt_sig 16:04 pts/0
> 00:00:00 -tcsh
> 
> As soon as I detached gdb from the process it went back to 100% CPU.

I don't know what to tell you, Pete. I've never seen this happen and, as far as 
I can tell, it shouldn't happen. 100% of CPU in bcopy(3)!?

> --
> Pete Pokrandt - Systems Programmer
> UW-Madison Dept of Atmospheric and Oceanic Sciences
> 608-262-3086  - address@hidden
Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: TIB-337303
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.