
20000324: ingebin.k: Cannot make positive UC from shared memory



>From: Mark Tucker <address@hidden>
>Organization: Lyndon State College
>Keywords: 200003241538.IAA20061 McIDAS-X shared memory ipcs ipcrm zombie

Mark,

>I've had a problem with ingebin.k running as a zombie process on our ldm
>server.  The XCD_START.LOG file is filled with the following messages:
>
>Starting HRS at 00083.180403
>ingebin.k: Cannot make positive UC: could not create 384300-byte shared
>memory segment
>Starting HRS at 00083.180405

This is telling us that when 'xcd_run' tries to start ingebin.k (from the
entry in pqact.conf for HRS data), there is not enough shared memory
in the system for the process to start.

>Restarting the ldm does not seem to clean this up.  This started about a
>week ago and we are currently not processing any model data for McIdas.

The fact that it just started happening is telling us that the problem is
not in the amount of shared memory available on your system.  Most likely,
you have had a number of processes that use shared memory die without
properly cleaning up after themselves.  The best way to check for this
is to do the following as 'root':

ipcs

This will give you a listing of all shared memory segments in use on
your machine.  I am willing to bet that this will come back as a very
long list.

The job you will have when you do find a long list of shared memory
segments in use is figuring out which ones are active and which ones
can be removed.  Since McIDAS processes are big users of shared
memory, it is sensible to first check users of McIDAS.  This will
include all accounts that run McIDAS and ones that run McIDAS background
processing (like the 'ldm').

I would do the following (again, as 'root'):

ipcs

This listing should tell you the users that have shared memory segments.
I would take each one of these users and then find out what they are
running.  This will tell you if they should have shared memory segments
in use.  When you find all of the shared memory segments that can be
released, you would run 'ipcrm' invocations.
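
If the listing is long, a per-user summary can make it easier to see who
holds the segments.  The following is only a sketch: the function name
'shm_count_by_owner' is made up for illustration, and it assumes that
'ipcs' prints its rows as T ID KEY MODE OWNER GROUP, with "m" marking
shared memory.  Other systems lay the columns out differently, so check
your own 'ipcs' output first.

```shell
# Count shared memory segments per owner.
# ASSUMPTION: 'ipcs' rows look like: T ID KEY MODE OWNER GROUP,
# with T == "m" for shared memory segments.
shm_count_by_owner() {
    awk '$1 == "m" { count[$5]++ }
         END { for (owner in count) print owner, count[owner] }' | sort
}

# Usage (as 'root'):
#   ipcs | shm_count_by_owner
```

Each output line is an owner followed by the number of shared memory
segments that owner currently holds.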

Here is an example.  Suppose that I got the following output from an
invocation of 'ipcs':

ipcs
IPC status from <running system> as of Fri Mar 24 13:43:47 MST 2000
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
Shared Memory:
m      44206   0x280267   --rw-r--r--     root     root
m    1652807   0          --rw-------   mcidas   ustaff
m    1627008   0          --rw-------   mcidas   ustaff
Semaphores:
s      65536   0x280269   --ra-ra-ra-     root     root

Further suppose that I check to see if the user 'mcidas' is, in fact,
running anything.  If it isn't running an instance of McIDAS-X, then
I know that the shared memory segments whose ids are 1652807 and 
1627008 can be released back to the system.  I would run:

ipcrm -m 1652807
ipcrm -m 1627008

or

ipcrm -m 1652807 -m 1627008

After you release all of the "zombie" shared memory segments, processes
needing shared memory will be able to run correctly.

>I will be rebooting the server shortly but I'd like to know what may be
>causing this so that I can prevent it in the future.

Rebooting will clear out all of the shared memory segments, and in your
case may not be a bad idea.  In general, however, one can use 'ipcs'
to find "zombie" shared memory segments and 'ipcrm' to release them
back to the system.

As far as what may be causing the overuse of shared memory, I can tell
you that GEMPAK uses shared Message Queues, and sometimes GEMPAK
users need to run 'gpend' to release those resources.

I would be interested to find out which processes have chewed up all
of the shared memory on your machine: GEMPAK or McIDAS-X.

>Thanks.

You are welcome.

Tom