
[LDM #USJ-914724]: LDM on Snow Leopard



Dave,

> Yeah, I might have been the one who originally encountered this
> problem and asked you'all for help with it.

You were (now that you've reminded me -- it's truly been a while).

> Apple's lack of response
> has been disappointing.

To say the least.

> Maybe they'll be more responsive during this
> period immediately following Snow Leopard's release, while they try to
> get the bugs out of it. (Either that, or they'll be more overwhelmed
> than usual with bug reports.)

From your fingers to their eyes.  :-)

I suspect that there just aren't that many programs running on Mac OS X 10.4 
(or higher) systems that use fcntl() file-locking and mmap() memory-mapping as 
much as the LDM.
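
For anyone unfamiliar with the combination, here is a minimal Python sketch of taking an fcntl() byte-range lock and then updating the same region through an mmap() view. It is not LDM code: the function name write_locked and the idea of locking exactly the bytes being written are assumptions made purely for illustration.

```python
import fcntl
import mmap


def write_locked(path, offset, data):
    """Lock a byte range with fcntl(), then update it through an mmap() view.

    Hypothetical helper for illustration only; the file must already be
    at least offset + len(data) bytes long.
    """
    with open(path, "r+b") as f:
        # Exclusive advisory lock on [offset, offset + len(data));
        # this call blocks until the lock is granted.
        fcntl.lockf(f, fcntl.LOCK_EX, len(data), offset)
        try:
            # Map the whole file (length 0 means "entire file").
            with mmap.mmap(f.fileno(), 0) as view:
                view[offset:offset + len(data)] = data
                view.flush()
        finally:
            # Release the same byte range.
            fcntl.lockf(f, fcntl.LOCK_UN, len(data), offset)
```

A process that blocks forever in the locking call is the kind of hang being discussed here.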

> Unidata's "known bug" entry for this problem notes that there is no
> workaround. That's true in a sense, but if it were strictly true then
> I'd never be able to stop the LDM at all, even when processes hang
> (which eventually some of them do on a semi-regular basis). 

I think a hung downstream LDM will, nevertheless, terminate upon reception of a 
SIGTERM, which is what the top-level LDM server sends all child processes when 
it's told to terminate.  I could be wrong, however.  One thing I have noticed 
is that attaching to the hung process with gdb(1) and then exiting gdb(1) will 
free the process from its hung state.  I'm at a loss to understand how that 
happens without intervention by the operating system.
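
To make the SIGTERM behavior concrete, here is a small Python sketch of a parent sending SIGTERM to its children and collecting their exit statuses. It only illustrates the general mechanism and is not how the LDM server is implemented; spawn_sleeper and terminate_children are hypothetical names, and the children are stand-in sleeping processes.

```python
import signal
import subprocess
import sys


def spawn_sleeper():
    """Start a stand-in child process that would otherwise sleep for a minute."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


def terminate_children(children, timeout=5.0):
    """Send SIGTERM to every child, then wait for each one to exit.

    Returns the list of return codes. On POSIX systems a child killed
    by a signal reports the negated signal number. A child that ignores
    SIGTERM makes wait() raise subprocess.TimeoutExpired.
    """
    for child in children:
        child.send_signal(signal.SIGTERM)
    return [child.wait(timeout=timeout) for child in children]
```

A child that is truly stuck in the kernel would show up here as a timeout rather than a return code.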

> I've
> written scripts that try to deal with the inability to run "ldmadmin
> stop" to stop the LDM; maybe you could comment on whether or not I've
> got the bases covered acceptably:
> 
> (1) Run "ldmadmin stop", redirecting the output to a file.
> 
> (2) Check that file for the word "isn't" (as in "the LDM isn't
> running", or something like that).
> 
> (3a) If the LDM isn't running, check to see if there's a ldmd.pid
> file.
> -- If there's no ldmd.pid file, run pqcat & pqcheck, then run
> "ldmadmin clean".
> 
> (3b) If the LDM is running, wait 30 seconds to give it a chance to
> shut down. (This is typically doomed, at least for some rpc processes.)

Maybe you should give it a minute.

> (4) Get a list of pids for rpc processes owned by the ldm account.
> 
> (5) If this list isn't empty:
> --  Run "kill -9" on them all.
> --  Run "ldmadmin clean".
> --  Run pqcat and pqcheck.  If this doesn't produce a "tallied
> consistent" message, run "ldmadmin delqueue" and "ldmadmin mkqueue".
> 
> (6) Run "ldmadmin start".

This procedure should result in a restarted LDM.  I just wish it weren't 
necessary.
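
For reference, steps (1) through (6) can be sketched in Python roughly as follows. This is not the attached script, only an outline under several assumptions: the pid file path "ldmd.pid" is a placeholder, "rpc processes" is taken to mean any process of the ldm user whose command name contains "rpc", the process list comes from ps(1), the wait is the one minute suggested above, and pqcat and pqcheck are invoked without options (a real script would likely need some). All function names are hypothetical.

```python
import os
import subprocess
import time


def run(cmd):
    """Run a command and return its combined stdout/stderr as text."""
    done = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          text=True, errors="replace")
    return done.stdout


def rpc_pids(run, user):
    """Step 4: PIDs of the user's processes whose command contains 'rpc'."""
    out = run(["ps", "-u", user, "-o", "pid=", "-o", "comm="])
    pids = []
    for line in out.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and "rpc" in fields[1]:
            pids.append(fields[0])
    return pids


def queue_is_consistent(run):
    """Run pqcat and pqcheck; report whether 'tallied consistent' appears."""
    out = run(["pqcat"]) + run(["pqcheck"])
    return "tallied consistent" in out


def restart_ldm(run=run, sleep=time.sleep, pidfile="ldmd.pid",
                user="ldm", wait=60):
    out = run(["ldmadmin", "stop"])            # (1) capture the output
    if "isn't" in out:                         # (2) LDM reported not running
        if not os.path.exists(pidfile):        # (3a) no pid file left behind
            queue_is_consistent(run)
            run(["ldmadmin", "clean"])
    else:
        sleep(wait)                            # (3b) give it time to shut down
    pids = rpc_pids(run, user)                 # (4) leftover rpc processes
    if pids:                                   # (5) force them out and tidy up
        run(["kill", "-9"] + pids)
        run(["ldmadmin", "clean"])
        if not queue_is_consistent(run):
            run(["ldmadmin", "delqueue"])
            run(["ldmadmin", "mkqueue"])
    run(["ldmadmin", "start"])                 # (6)
```

The run and sleep parameters are injectable so the control flow can be exercised with stand-ins before it is pointed at a live LDM.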

> (Script attached.)
> 
> -- Dave

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: USJ-914724
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.