[LDM #AMR-414518]: LDM 6.13.12 killed off its pqact with signal 14, but why! :)

Hi Daryl,

> I'm rolling with LDM 6.13.12 on RHEL 8.2 64bit and just discovered to my 
> horror that LDM killed off its pqact child a few weeks back, so I missed all 
> kinds of processing :(
> 
> The ldmd.log simply has:
> 
> 20201019T090611.164977Z ldmd[11302]                 ldmd.c:reap:131           NOTE  child 11305 terminated by signal 14: pqact etc/pqact_cod.conf
> 
> 
> $ grep 11305 var/logs/ldmd.log.1
> 20201005T151354.111797Z pqact[11305]                pqact.c:main:416          NOTE  Starting Up {cmd: "pqact etc/pqact_cod.conf"}
> 20201005T151354.112168Z pqact[11305]                pqact.c:main:586          NOTE  Starting from insertion-time 2020-10-05 15:13:44.656157 UTC
> 20201019T090611.164977Z ldmd[11302]                 ldmd.c:reap:131           NOTE  child 11305 terminated by signal 14: pqact etc/pqact_cod.conf
> 
> The only other logging around this time is:
> 
> 20201012T131216.455128Z rtstats[11304]              error.c:err_log:236       WARN  Couldn't connect to LDM on rtstats.unidata.ucar.edu using either port 388 or portmapper; : RPC: Remote system error - Connection refused
> 20201012T131217.492879Z rtstats[11304]              error.c:err_log:236       WARN  Couldn't connect to LDM on rtstats.unidata.ucar.edu using either port 388 or portmapper; : RPC: Remote system error - Connection refused
> 20201019T090611.164977Z ldmd[11302]                 ldmd.c:reap:131           NOTE  child 11305 terminated by signal 14: pqact etc/pqact_cod.conf
> 20201021T090514.845984Z idd.cod.edu[11307]          error.c:err_log:236       NOTE  No heartbeat from upstream LDM for 300 seconds. Disconnecting.
> 20201021T090514.851470Z idd.cod.edu[11307]          requester6.c:req6_new:496 NOTE  LDM-6 desired product-class: 20201021080514.848966 TS_ENDT {{EXP, "^cod "},{NONE, "SIG=153042095c0c0bd7d673c66ee1b63b87"}}
> 20201021T090514.917249Z idd.cod.edu[11307]          requester6.c:make_request:222       NOTE  Upstream LDM-6 on idd.cod.edu is willing to be a primary feeder
> 20201021T091015.021508Z idd.cod.edu[11307]          error.c:err_log:236       NOTE  No heartbeat from upstream LDM for 300 seconds. Disconnecting.
> 20201021T091015.021740Z idd.cod.edu[11307]          requester6.c:req6_new:496 NOTE  LDM-6 desired product-class: 20201021081015.021609 TS_ENDT {{EXP, "^cod "},{NONE, "SIG=153042095c0c0bd7d673c66ee1b63b87"}}
> 20201021T091015.063496Z idd.cod.edu[11307]          requester6.c:make_request:222       NOTE  Upstream LDM-6 on idd.cod.edu is willing to be a primary feeder
> 20201021T091925.810605Z idd.cod.edu[11307]          error.c:err_log:236       NOTE  No heartbeat from upstream LDM for 300 seconds. Disconnecting.
> 20201021T091925.810817Z idd.cod.edu[11307]          requester6.c:req6_new:496 NOTE  LDM-6 desired product-class: 20201021081925.810754 TS_ENDT {{EXP, "^cod "},{NONE, "SIG=ac0db31f508d41ba71763c661b535ef7"}}
> 20201021T091925.860399Z idd.cod.edu[11307]          requester6.c:make_request:222       NOTE  Upstream LDM-6 on idd.cod.edu is willing to be a primary feeder
> 
> Why would LDM send a signal 14 to its pqact!?!?

The LDM didn't. The pqact(1) process sent the SIGALRM (signal 14) to itself 
because of a latent bug that has been in pqact(1) since its creation. The bug is 
very hard to trigger, which is why no one had seen it until recently.
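The specific pqact(1) bug isn't described here, so the following is only a minimal sketch, assuming a Linux system like the RHEL 8.2 host above, of the general mechanism: the default action for SIGALRM is to terminate the process, so a process that arms a timer with alarm() while no SIGALRM handler is installed kills itself. Its parent then sees the child as "terminated by signal 14", which is what ldmd's reap() logged. The function name run_child_that_alarms() is hypothetical, and the parent here merely stands in for ldmd; it is not how ldmd is actually implemented.

```python
import os
import signal


def run_child_that_alarms(seconds: int = 1) -> int:
    """Hypothetical demo: fork a child that arms alarm() with no SIGALRM handler.

    Returns the number of the signal that terminated the child, as seen by
    the parent when it reaps the child with waitpid(), or 0 if the child
    exited normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: give SIGALRM its default disposition (terminate the process),
        # i.e. simulate the situation where no handler is installed.
        signal.signal(signal.SIGALRM, signal.SIG_DFL)
        signal.alarm(seconds)  # arm the timer; SIGALRM is delivered after `seconds`
        signal.pause()         # wait; the default action kills the process here
        os._exit(0)            # not reached
    # Parent: reap the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return 0


if __name__ == "__main__":
    sig = run_child_that_alarms()
    print(f"child terminated by signal {sig} ({signal.Signals(sig).name})")
```

On Linux this reports signal 14 (SIGALRM), matching the ldmd.log entry. Installing a SIGALRM handler with signal.signal() before calling alarm() would keep the child alive instead.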

The latest version of the LDM shouldn't have this problem (I say "shouldn't" 
because the bug is *very* hard to trigger).

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: AMR-414518
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.