>From: Mark Tucker <address@hidden>
>Organization: Lyndon State College
>Keywords: 200003241538.IAA20061 McIDAS-X shared memory ipcs ipcrm zombie

Mark,

>I've had a problem with ingebin.k running as a zombie process on our ldm
>server. The XCD_START.LOG file is filled with the following messages:
>
>Starting HRS at 00083.180403
>ingebin.k: Cannot make positive UC: could not create 384300-byte shared
>memory segment
>Starting HRS at 00083.180405

This is telling us that when 'xcd_run' tries to start ingebin.k (from
the entry in pqact.conf for HRS data), there is not enough shared memory
in the system for the process to start.

>Restarting the ldm does not seem to clean this up. The started about a
>week ago and we are currently not processing any model data for McIdas.

The fact that it just started happening is telling us that the problem
is not in the amount of shared memory configured on your system. Most
likely, you have had a number of processes that use shared memory die
without properly cleaning up after themselves. The best way to check for
this is to do the following as 'root':

  ipcs

This will give you a listing of all shared memory segments in use on
your machine. I am willing to bet that this will come back as a very
long list.

The job you will have when you do find a long list of shared memory
segments in use is figuring out which ones are active and which ones can
be removed. Since McIDAS processes are big users of shared memory, it is
sensible to first check users of McIDAS. This will include all accounts
that run McIDAS and ones that run McIDAS background processing (like
the 'ldm'). I would do the following (again, as 'root'):

  ipcs

This listing should tell you the users that have shared memory segments.
I would take each one of these users and then find out what they are
running. This will tell you if they should have shared memory segments
in use. When you find all of the shared memory segments that can be
released, you would run 'ipcrm' invocations to release them.
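As a rough sketch of the check just described, a small shell function
can reduce the 'ipcs' listing to one "owner id" pair per shared memory
segment. The function name 'shm_by_owner' is made up for illustration,
and the sketch assumes the Solaris-style column layout (T ID KEY MODE
OWNER GROUP); other systems print 'ipcs' output differently and would
need different field numbers.

```shell
# Hypothetical helper (not part of McIDAS or the LDM): read `ipcs`
# output on stdin and print "<owner> <id>" for each shared memory segment.
# ASSUMPTION: Solaris-style columns, where shared memory lines start
# with "m" and the fields are: T ID KEY MODE OWNER GROUP.
shm_by_owner() {
    # Keep only lines whose first field is "m" (shared memory) and that
    # have at least the six expected columns; print owner, then id.
    awk '$1 == "m" && NF >= 6 { print $5, $2 }'
}

# Typical use, as 'root':
#   ipcs | shm_by_owner | sort
# Then, for each owner listed, see what that user is running, e.g.:
#   ps -u mcidas
```

If an owner shows up in the list but has no processes running, its
segments are the candidates for removal. It is still worth looking at
each one by hand before releasing anything.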
Here is an example. Suppose that I got the following output from an
invocation of 'ipcs':

  ipcs
  IPC status from <running system> as of Fri Mar 24 13:43:47 MST 2000
  T         ID      KEY        MODE        OWNER    GROUP
  Message Queues:
  Shared Memory:
  m      44206   0x280267   --rw-r--r--     root     root
  m    1652807   0          --rw-------   mcidas   ustaff
  m    1627008   0          --rw-------   mcidas   ustaff
  Semaphores:
  s      65536   0x280269   --ra-ra-ra-     root     root

Further suppose that I check to see if the user 'mcidas' is, in fact,
running anything. If it isn't running an instance of McIDAS-X, then I
know that the shared memory segments whose ids are 1652807 and 1627008
can be released back to the system. I would run:

  ipcrm -m 1652807
  ipcrm -m 1627008

or

  ipcrm -m 1652807 -m 1627008

After you release all of the "zombie" shared memory segments, processes
needing shared memory will be able to run correctly.

>I
>will be rebooting the server shortly but I'd like to know what may be
>causing this so that I can prevent it in the future.

Rebooting will clear out all of the shared memory segments, and in your
case may not be a bad idea. In general, however, one can use 'ipcs' to
find "zombie" shared memory segments and 'ipcrm' to release them back to
the system.

As far as what may be causing the overuse of shared memory, I can tell
you that GEMPAK uses shared Message Queues, and sometimes GEMPAK users
need to run 'gpend' to release those resources. I would be interested to
find out which processes have chewed up all of the shared memory on your
machine: GEMPAK or McIDAS-X.

>Thanks.

You are welcome.

Tom
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.