Gabe,

>Date: Mon, 14 Feb 2005 15:21:52 -0500 (EST)
>From: Gabe Langbauer <address@hidden>
>Organization: Ohio State University
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20050214: LDM product queue corruption

The above message contained the following:

> The original log is attached, note there is no ldmping issue on this log,
> it seems to die with a rpc.ldmd error...and there is a mention of rtstats.
> I don't know if those are the stats from "do stats" Everytime subsequent
> time I issued the start command I got this log (although times were
> different):
>
> Feb 12 23:24:21 twister ldmping: SVC_UNAVAIL 0.000601 0
> localhost RPC: Program not registered
> Feb 12 23:24:21 twister pqcheck: Starting Up (10472)
> Feb 12 23:24:21 twister pqcheck: The writer-counter of the
> product-queue is 0
> Feb 12 23:24:21 twister pqcheck: Exiting

The above are OK. The "ldmping" entry is from the ldmadmin(1) script
testing to see if an LDM is already running. The pqcheck(1) entries are
from the same script checking to see that the product-queue is OK.

> I agree, mighty suspicious indeed. Logs above

The end of the logfile contained this:

    Feb 12 22:58:54 twister rpc.ldmd: child 793 terminated by signal 25

Process 793 was a pqact(1) process:

    $ fgrep '' ldmd.log.4
    Feb 12 07:02:16 twister pqact: child 569 exited with status 1
    Feb 12 07:58:21 twister pqact: child 16497 exited with status 1
    Feb 12 21:12:23 twister pqact: child 11341 exited with status 1
    Feb 12 22:30:00 twister pqact: pbuf_flush (3) write: Broken pipe

and was, undoubtedly, started via an EXEC entry in the LDM
configuration-file, etc/ldmd.conf.

The LDM server exits when an EXEC-ed child process terminates abnormally
due to a seriously bad signal (e.g., SIGSEGV). Oddly, on my system,
signal 25 is SIGCONT, which should not cause the pqact(1) process to
terminate. What is it on your system?

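One way to look up which signal a number maps to on a given host is the shell's built-in "kill -l". The following is a minimal sketch, assuming a POSIX-style shell (sh, bash, ksh); "signame" is a hypothetical helper name used only for illustration:

```shell
# signame: print the name of the signal with the given number,
# as reported by the shell on this host (names differ between platforms).
signame() {
    kill -l "$1"
}

signame 25
```

The name printed for 25 depends on the operating system, so it is worth running this on the machine where the LDM runs.
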
One can work around this behavior by wrapping EXEC-ed programs in a
shell-script that ensures that their abnormal termination is never seen
by the LDM, e.g.,

    $ cat util/execWrapper
    #!/bin/sh
    # Run the given command forever, restarting it whenever it exits,
    # so that the LDM never sees the child terminate.
    while true
    do
        "$@"
        # Note each restart in the system log.
        logger -p local0.notice "Restarting: $*"
    done

(The above is off-the-top-of-my-head and might need modification.)

The relevant EXEC entry is then replaced with

    EXEC "execWrapper prog a1 a2"

(assuming the script is in the "util/" subdirectory and is executable).

Regards,
Steve Emmerson

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.