
20050107: Query about ldm process



James,

The error-message you encountered indicated that the product-queue was
corrupted.  This could happen if, for example, a process that was writing
to the product-queue was terminated by a SIGKILL.
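
As a rough illustration of how an interrupted write can leave the queue's
bookkeeping inconsistent, here is a minimal Python sketch.  It is not the
LDM's actual pq.c code: the RegionList class and claim_slot() function are
hypothetical, the field names are borrowed from the assertion message, and
their meanings are guessed from those names.

```python
from dataclasses import dataclass


@dataclass
class RegionList:
    """Hypothetical stand-in for the queue's bookkeeping counters."""
    nalloc: int      # total slots allocated (assumed meaning)
    nelems: int = 0  # slots holding data (assumed meaning)
    nfree: int = 0   # slots available for reuse (assumed meaning)
    nempty: int = 0  # slots not yet used (assumed meaning)

    def is_consistent(self) -> bool:
        # Same condition as the assertion that failed in the log.
        return self.nelems + self.nfree + self.nempty == self.nalloc


def claim_slot(rl: RegionList, interrupted: bool = False) -> None:
    """Move one slot from 'free' to 'in use' in two separate steps."""
    rl.nfree -= 1
    if interrupted:
        # Simulates the writer being killed between the two updates.
        return
    rl.nelems += 1


rl = RegionList(nalloc=8, nelems=3, nfree=2, nempty=3)
claim_slot(rl)
print(rl.is_consistent())   # True: both steps completed

claim_slot(rl, interrupted=True)
print(rl.is_consistent())   # False: the counters no longer add up
```

A process killed by SIGKILL gets no chance to finish or undo such an
update, so every later reader of the queue sees the inconsistent counters.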

The fact that you were unable to restart the LDM, even after recreating
the product-queue, indicates a severe problem.  We have never seen the
LDM system fail when starting with a new product-queue.

The fact that you needed to reboot the computer in order to successfully
restart the LDM indicates that either the operating-system was at fault
(not unknown in the Linux universe) or that something akin to a rogue
process was messing with the product-queue.

What version of Linux are you running and is it up-to-date?

You should consider upgrading to version 6.1.0 of the LDM.  It might
solve your problem.

I recommend not building the LDM package with assertions enabled.  I
believe that building without assertions is the default in version 6.1.0.

Regards,
Steve Emmerson

NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

------- Original Message

>To: address@hidden
>From: James Murakami <address@hidden>
>Subject: Query about ldm process
>Organization: UCAR/Unidata
>Keywords: 200501071817.j07IHuv2021340

--Bed_of_Clams_871_000
Content-Type: TEXT/plain; charset=us-ascii
Content-MD5: y1pbE+6i9eQZ+6uWXe8aMA==

Hi.

Overnight the ldm stopped processing data. When I tried re-starting
the ldm, I repeatedly saw the following:

sundog aeolus[29835]: assertion "rl->nelems + rl->nfree + rl->nempty == 
rl->nalloc" failed: file "pq.c", line 206

I attached a ldmd log file after one of the restarts.

I went as far as creating a new queue, but that didn't help. Finally,
rebooting the linux computer worked.

Is there anything I can do to prevent this from happening in the future?

James

----------------------------------------------
James Murakami
Staff Meteorologist/Student Affairs
Department of Atmospheric and Oceanic Sciences
University of California, Los Angeles
405 Hilgard Ave.
Los Angeles, CA  90095-1565


   e-mail:  address@hidden
telephone:  310-825-2418
      Fax:  310-206-5219
-----------------------------------------------

--Bed_of_Clams_871_000
Content-Type: TEXT/plain; name="ldmd.log.1"; charset=us-ascii; x-unix-mode=0664
Content-Description: ldmd.log.1
Content-MD5: DlTQXXHsKHBtrSy85LbX3Q==

Jan 07 15:29:02 sundog rpc.ldmd[29825]: Starting Up (version: 6.0.14; built: Jul 23 2003 14:28:16)
Jan 07 15:29:02 sundog pqact[29828]: Starting Up
Jan 07 15:29:02 sundog rtstats[29829]: Starting Up (29825)
Jan 07 15:29:02 sundog thelma[29830]: Starting Up(6.0.14): thelma.ucar.edu: TS_ZERO TS_ENDT {{CONDUIT,  "MT.(eta|nam)"}}
Jan 07 15:29:02 sundog thelma[29830]: Desired product class: 20050107142902.861 TS_ENDT {{CONDUIT,  "MT.(eta|nam)"}}
Jan 07 15:29:02 sundog thelma[29831]: Starting Up(6.0.14): thelma.ucar.edu: TS_ZERO TS_ENDT {{CONDUIT,  "MT.(avn|gfs).*DF.gr1"}}
Jan 07 15:29:02 sundog thelma[29831]: Desired product class: 20050107142902.863 TS_ENDT {{CONDUIT,  "MT.(avn|gfs).*DF.gr1"}}
Jan 07 15:29:02 sundog peridot[29833]: Starting Up(6.0.14): peridot.atmos.ucla.edu: TS_ZERO TS_ENDT {{WSI,  ".*"}}
Jan 07 15:29:02 sundog peridot[29833]: Desired product class: 20050107142902.864 TS_ENDT {{WSI,  ".*"}}
Jan 07 15:29:02 sundog thelma[29832]: Starting Up(6.0.14): thelma.ucar.edu: TS_ZERO TS_ENDT {{NIMAGE,  ".*"}}
Jan 07 15:29:02 sundog thelma[29832]: Desired product class: 20050107142902.874 TS_ENDT {{NIMAGE,  ".*"}}
Jan 07 15:29:02 sundog aeolus[29835]: Starting Up(6.0.14): aeolus.ucsd.edu: TS_ZERO TS_ENDT {{DIFAX|FSL|UNIDATA,  ".*"}}
Jan 07 15:29:02 sundog aeolus[29835]: Desired product class: 20050107142902.882 TS_ENDT {{DIFAX|FSL|UNIDATA,  ".*"}}
Jan 07 15:29:02 sundog striker2[29836]: Starting Up(6.0.14): striker2.atmos.albany.edu: TS_ZERO TS_ENDT {{NLDN,  ".*"}}
Jan 07 15:29:02 sundog aeolus[29837]: Starting Up(6.0.14): aeolus.ucsd.edu: TS_ZERO TS_ENDT {{NNEXRAD,  "/p(N0R|N1R|N0S|N0V|N1V|N0Z|NET|NTP|NVW)"}}
Jan 07 15:29:02 sundog aeolus[29838]: Starting Up(6.0.14): aeolus.ucsd.edu: TS_ZERO TS_ENDT {{FNEXRAD,  ".*"}}
Jan 07 15:29:02 sundog freshair[29839]: Starting Up(6.0.14): freshair.atmos.washington.edu: TS_ZERO TS_ENDT {{CRAFT,  ".*"}}
Jan 07 15:29:02 sundog striker2[29836]: Desired product class: 20050107142902.884 TS_ENDT {{NLDN,  ".*"}}
Jan 07 15:29:02 sundog pqbinstats[29827]: Starting Up (29825)
Jan 07 15:29:02 sundog aeolus[29837]: Desired product class: 20050107142902.890 TS_ENDT {{NNEXRAD,  "/p(N0R|N1R|N0S|N0V|N1V|N0Z|NET|NTP|NVW)"}}
Jan 07 15:29:02 sundog aeolus[29838]: Desired product class: 20050107142902.892 TS_ENDT {{FNEXRAD,  ".*"}}
Jan 07 15:29:02 sundog freshair[29839]: Desired product class: 20050107142902.894 TS_ENDT {{CRAFT,  ".*"}}
Jan 07 15:29:02 sundog peridot[29833]: Connected to upstream LDM-6 
Jan 07 15:29:02 sundog thelma[29830]: Connected to upstream LDM-6 
Jan 07 15:29:02 sundog thelma[29831]: Connected to upstream LDM-6 
Jan 07 15:29:02 sundog thelma[29832]: Connected to upstream LDM-6 
Jan 07 15:29:02 sundog aeolus[29835]: Connected to upstream LDM-6 
Jan 07 15:29:03 sundog aeolus[29837]: Connected to upstream LDM-6 
Jan 07 15:29:03 sundog aeolus[29838]: Connected to upstream LDM-6 
Jan 07 15:29:03 sundog peridot[29833]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog thelma[29830]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog thelma[29831]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog aeolus[29838]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog aeolus[29835]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog aeolus[29837]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog striker2[29836]: Connected to upstream LDM-6 
Jan 07 15:29:03 sundog thelma[29832]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog striker2[29836]: Upstream LDM is willing to feed 
Jan 07 15:29:03 sundog striker2[29836]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2008
Jan 07 15:29:03 sundog thelma[29830]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2067
Jan 07 15:29:03 sundog freshair[29839]: Connected to upstream LDM-6
Jan 07 15:29:03 sundog freshair[29839]: Upstream LDM is willing to feed
Jan 07 15:29:03 sundog peridot[29833]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2008
Jan 07 15:29:04 sundog freshair[29839]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2067
Jan 07 15:29:09 sundog aeolus[29835]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2008
Jan 07 15:29:10 sundog aeolus[29837]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2067
Jan 07 15:29:10 sundog aeolus[29838]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2008
Jan 07 15:29:11 sundog thelma[29832]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2067
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29839 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: SIGINT 
Jan 07 15:29:17 sundog pqbinstats[29827]: Interrupt 
Jan 07 15:29:17 sundog pqbinstats[29827]: Exiting 
Jan 07 15:29:17 sundog pqact[29828]: Interrupt 
Jan 07 15:29:17 sundog pqact[29828]: Exiting 
Jan 07 15:29:17 sundog rtstats[29829]: Interrupt 
Jan 07 15:29:17 sundog rtstats[29829]: Exiting 
Jan 07 15:29:17 sundog thelma[29831]: SIGINT 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Terminating process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29838 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29837 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29836 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29835 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29833 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29832 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29830 terminated by signal 6 
Jan 07 15:29:17 sundog rpc.ldmd[29825]: Killing (SIGINT) process group 

--Bed_of_Clams_871_000--

------- End of Original Message
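
For a log like the one attached above, a short script can tally which
assertions failed and which child processes were aborted.  The following
is a minimal Python sketch, not an LDM utility: the summarize() function
and both regular expressions are hypothetical and assume the message
layout seen in this particular log.  The sample lines are taken from the
attachment with wrapped lines rejoined.  On Linux, signal 6 is SIGABRT,
which is what abort() raises after a failed assert().

```python
import re
from collections import Counter

# Excerpt from the attached ldmd.log.1, with wrapped lines rejoined.
LOG = '''\
Jan 07 15:29:03 sundog striker2[29836]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2008
Jan 07 15:29:03 sundog thelma[29830]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 2067
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29839 terminated by signal 6
Jan 07 15:29:17 sundog rpc.ldmd[29825]: child 29836 terminated by signal 6
'''

# Patterns assumed from the log above; other LDM versions may differ.
ASSERT_RE = re.compile(r'assertion ".*" failed: file "([^"]+)", line (\d+)')
SIGNAL_RE = re.compile(r'child (\d+) terminated by signal (\d+)')


def summarize(text):
    """Return (assertion failures per source location, signal per child PID)."""
    failures = Counter()
    killed = {}
    for line in text.splitlines():
        m = ASSERT_RE.search(line)
        if m:
            failures[(m.group(1), int(m.group(2)))] += 1
            continue
        m = SIGNAL_RE.search(line)
        if m:
            killed[int(m.group(1))] = int(m.group(2))
    return failures, killed


failures, killed = summarize(LOG)
for (source, lineno), count in sorted(failures.items()):
    print(f"{source}:{lineno} failed {count} time(s)")
for pid, signum in sorted(killed.items()):
    print(f"child {pid} terminated by signal {signum}")
```

Run against the full log, the same function would show every downstream
process failing the same assertion, which is consistent with a shared
product-queue being the source of the problem.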