
20011204: pqact dumping core on Solarisx86 at NSBF



>From: Robert Mullenax <address@hidden>
>Organization: NMSU/NTSB
>Keywords: 200112050520.fB55KLN14664 LDM pqact core

Robert,

>We are now having a problem with our ldm dying on our
>Solaris Intel box (Solaris 8).

This is just _not_ your week! ;-)

>This time there are no entries
>in ldmd.log at all.  It just stops.

If pqact fails abnormally, then the lead rpc.ldmd should shut down the
LDM.  In this case, you should see some messages at the end of
ldmd.log, but they will not tell you why the LDM is being shut down.

>There is a core file from pqact.

Good.  If you built the LDM from source, you can use dbx to find out
what happened to pqact to make it die.  There are a few situations that
could cause this:

o a corrupt queue

o some classes of bad lines in pqact.conf (Chiz told me that he had run
  into a situation where an entry for GEMPAK had gotten so long that
  it was causing pqact to dump core each time it tried to execute the
  action)

o a bad pqact executable
 
>I made sure there are no users that have limits
>on CPU time.  I know there is not much to work on, but can you
>give me some ideas?  I have never had this happen before.

The first thing I would do is verify the entries in pqact.conf:

ldmadmin pqactcheck

Next, make sure that you do not have any exceedingly long lines in
pqact.conf.
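
One rough way to spot overly long lines is a small shell helper like
the sketch below.  The function name long_lines and the 512-character
default are my own illustrative choices, not part of the LDM, and I do
not know the exact length at which pqact gets into trouble.  It only
measures physical lines, so an entry spread over several continuation
lines is not added up.

```shell
# Hypothetical helper (not part of the LDM): list lines longer than a limit.
#   $1 = file to scan
#   $2 = maximum acceptable line length (512 is an arbitrary assumed default)
long_lines() {
    # Print "line number: length" for every physical line over the limit.
    awk -v max="${2:-512}" \
        'length($0) > max { printf "%d: %d characters\n", NR, length($0) }' "$1"
}
```

For example, "long_lines etc/pqact.conf 512" run from the LDM home
directory (assuming pqact.conf lives in etc/ there) prints nothing if
no line exceeds the limit.  On Solaris you may need to substitute nawk
or /usr/xpg4/bin/awk, since the old /usr/bin/awk may not accept -v.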

If both of the above are OK, I would try deleting and remaking your
queue.  If that doesn't work, I would be highly suspicious of your
pqact executable.

Please let us know if your situation matches one of the above (and
which one), or if you continue to have problems.

Tom