
20021120: LDM startup error message



>From: Ken Scheeringa <address@hidden>
>Organization: Purdue
>Keywords: 200211201420.gAKEKV425218 LDM ldmadmin ldmd.pq

Ken,

>Yesterday I rebooted my RS6000 box running AIX 4.3.2
>for the first time in a while, but LDM wouldn't restart.
>
>I haven't made any changes to the LDM configs in a very
>long time so I am wondering why this new error.

It could be a corrupt queue.

>I have attached a bit of ldmd.log that may help figure
>this out.  In the first part of the log, I am attempting
>to connect to my failover "sunset" at the U of Wisconsin,
>and in the second part I am attempting to connect to
>my usual upstream "anvil" here at Purdue.
>
>Both fail with errors in pq.c, but not exactly the same error.
>I don't know much about the internals of LDM, I just know
>that "it works" for my purposes of capturing a small portion
>of the DDPLUS feed.
>
>I appreciate any assistance you can provide.  Thanks!

-- ldmd.log --

Nov 20 14:00:55 shadow localhost[19500]: Connection from localhost
Nov 20 14:00:55 shadow localhost[19500]: Connection reset by peer
Nov 20 14:00:55 shadow localhost[19500]: Exiting
Nov 20 14:00:58 shadow pqexpire[23156]: assertion "found != 0" failed: file "pq.c", line 627
Nov 20 14:01:04 shadow rpc.ldmd[21944]: child 23156 terminated by signal 6
Nov 20 14:01:04 shadow rpc.ldmd[21944]: Killing (SIGINT) process group
Nov 20 14:01:04 shadow rpc.ldmd[21944]: Interrupt
Nov 20 14:01:04 shadow rpc.ldmd[21944]: Exiting
Nov 20 14:01:04 shadow udp.ldmd[20012]: Interrupt
Nov 20 14:01:04 shadow udp.ldmd[20012]: Exiting
Nov 20 14:01:04 shadow pqact[20596]: Interrupt
Nov 20 14:01:04 shadow pqact[20596]: Exiting
Nov 20 14:01:04 shadow pqbinstats[20904]: Interrupt
Nov 20 14:01:04 shadow pqbinstats[20904]: Exiting
Nov 20 14:01:05 shadow rpc.ldmd[21944]: Terminating process group
Nov 20 14:01:05 shadow sunset[20342]: Interrupt
Nov 20 14:01:05 shadow sunset[20342]: Exiting
Nov 20 14:03:58 shadow rpc.ldmd[23502]: Starting Up (built: Jan 30 1996 10:54:55)
Nov 20 14:03:58 shadow anvil[20598]: run_requester: Starting Up: anvil.eas.purdue.edu
Nov 20 14:03:58 shadow pqbinstats[20346]: Starting Up (23502)
Nov 20 14:03:58 shadow pqact[20906]: Starting Up
Nov 20 14:03:58 shadow pqexpire[21180]: Starting Up
Nov 20 14:03:59 shadow udp.ldmd[20014]: Starting Up
Nov 20 14:03:59 shadow localhost[19504]: Connection from localhost
Nov 20 14:03:59 shadow localhost[19504]: Connection reset by peer
Nov 20 14:03:59 shadow localhost[19504]: Exiting
Nov 20 14:04:00 shadow pqexpire[21180]: assertion "rp->prev != OFF_NONE" failed: file "pq.c", line 696

The assertion failure in pqexpire points to a corrupt queue as being the
most likely problem.  To remake the queue, do the following:

<login as 'ldm'>

cd ~ldm
ldmadmin stop          <- to make sure nothing is running
ldmadmin delqueue
ldmadmin mkqueue
ldmadmin start
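
If you want to confirm the diagnosis before deleting the queue, one option
is to scan the log for failed assertions. This is only a sketch: the
function name pq_assert_failures is made up for illustration, and it assumes
the syslog-style layout shown in the log excerpt above, with the program
name and PID in the fifth field.

```shell
# Hypothetical helper (not part of the LDM distribution).
# Prints the name of each program that logged a failed assertion.
# $1 = path to the LDM log file
pq_assert_failures() {
    grep 'assertion ".*" failed' "$1" |
        awk '{ sub(/\[[0-9]+\]:$/, "", $5); print $5 }' |
        sort -u
}
```

Run it as 'pq_assert_failures /path/to/ldmd.log', substituting wherever your
site keeps the log. If pqexpire shows up in the output, remaking the queue
as described above is a reasonable next step.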

In taking a closer look at the log file you sent in, I see that
you are using a very old version of the LDM:

rpc.ldmd[23502]: Starting Up (built: Jan 30 1996 10:54:55)

I strongly suggest that you upgrade to a current version of the LDM:

<login as 'ldm'>
cd ~ldm
ftp ftp.unidata.ucar.edu
  <user> anonymous
  <pass> your_full_email_address
  cd pub/ldm5
  binary
  get ldm-5.2.2.tar.Z
  quit

zcat ldm-5.2.2.tar.Z | tar xvf -
cd ldm-5.2.2/src
./configure
make
make install
sudo make install_setuids         <- assumes your system has sudo installed.
                                     If it doesn't, 'root' will need to
                                     run 'make install_setuids'

cd ../bin
<edit ldmadmin>

Make sure that the line:

$hostname = "@HOSTNAME@";

is modified so that '@HOSTNAME@' is replaced by the fully qualified hostname
of your machine running the LDM.
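
If you would rather not make that edit by hand, a sed substitution can do
it. The sketch below is an illustration only: fill_hostname is a made-up
name, and host.example.edu stands in for your real fully qualified hostname.

```shell
# Hypothetical helper (not part of the LDM distribution).
# Prints a copy of the script with @HOSTNAME@ replaced.
# $1 = path to ldmadmin, $2 = fully qualified hostname
fill_hostname() {
    sed "s/@HOSTNAME@/$2/" "$1"
}
```

For example, 'fill_hostname ldmadmin host.example.edu > ldmadmin.new' writes
the result to a new file, which you can inspect before moving it over the
original.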

Also check to make sure that the path to Perl is correct and that
you have enough room for the default 400 MB queue.
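
Both checks can be scripted. The sketch below makes some assumptions: the
helper names are made up, df is assumed to accept the POSIX -P and -k flags
on your system, and ldmadmin is assumed to name its Perl interpreter on a
leading '#!' line.

```shell
# Hypothetical helpers (not part of the LDM distribution).

# Succeeds if the filesystem holding directory $1 has at least $2 KB free.
has_room() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Prints the interpreter path named on the first (#!) line of script $1.
script_interpreter() {
    sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}
```

A 400 MB queue is 409600 KB, so 'has_room DIR 409600 || echo "not enough
room"' does the space check, where DIR is whichever directory will hold the
queue. 'ls -l "$(script_interpreter ldmadmin)"' shows whether the Perl path
in the script actually exists.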

<continuing>

cd ~
rm runtime
ln -s ldm-5.2.2 runtime

cd etc
<edit ldmd.conf>

Comment out the line that starts pqexpire (it is not needed in newer LDMs).

Add the line:

exec    "rtstats -h rtstats.unidata.ucar.edu"

cd ~
ldmadmin delqueue
ldmadmin mkqueue            <- will take some time
ldmadmin start

You can probably get away with not upgrading your LDM, but since you are
going to have to do this at some point anyway, why not do it now?

Tom Yoksas

>From address@hidden Wed Nov 20 11:17:04 2002
>Subject: Re: 20021120: LDM startup error message 

On Wed, 20 Nov 2002, Unidata Support wrote:

> Ken,

> The assertion failure in pqexpire points to a corrupt queue as being the
> most likely problem.  To remake the queue, do the following:

Yes, that was it: a corrupt queue!

I never had that happen to me before so
I hadn't thought of that.

Thanks much for your help!!

*******************************************************************************
Ken Scheeringa                       Indiana Climate Page
State Climatologist                  http://shadow.agry.purdue.edu
Agronomy Dept                          
Purdue University                      featuring climate data archives:
e-mail: address@hidden                  daily coop stations :     1994+
fax: 765.496.2926                        hourly airport data : Jul 1996+
phone: 765.494.8105                      30-min autostation  :     1999+
                                       updated daily
                                         Also monthly/daily normals 
*******************************************************************************