[LDM #HCO-891524]: Unidata Performance Question



Mike,

Yes, if you have to kill LDM processes like that then you should execute the 
command "ldmadmin clean" *and* re-create the product-queue because it's likely 
been corrupted.
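
For what it's worth, here is a minimal Python sketch of that recovery sequence. It assumes the LDM is already stopped and that your installation's ldmadmin provides the "clean", "delqueue", "mkqueue", and "start" subcommands; please check the names against your LDM version. The function name recover_queue is only illustrative.

```python
import subprocess

# Assumed order: remove stale lock/PID files, then re-create the queue.
# Verify these subcommand names against your LDM version before use.
RECOVERY_STEPS = [
    ["ldmadmin", "clean"],
    ["ldmadmin", "delqueue"],
    ["ldmadmin", "mkqueue"],
    ["ldmadmin", "start"],
]


def recover_queue(run=subprocess.run):
    """Run each recovery step in order, stopping at the first failure."""
    for cmd in RECOVERY_STEPS:
        result = run(cmd)
        if result.returncode != 0:
            raise RuntimeError("step failed: " + " ".join(cmd))


if __name__ == "__main__":
    recover_queue()
```

Stopping at the first failure is deliberate: starting the LDM on a queue that could not be re-created would just reproduce the problem.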

> Steve,  Thanks for looking at this head scratcher.
> 
> I suspect I corrupted the product queue when I killed the pqinsert that had
> been running for over 24 hours.  When I ran top, I saw a single process
> that had accumulated over 24 hours of CPU time.  I tried sending a TERM
> signal to the process.  I waited a couple of minutes and, when that didn't
> take, I issued a KILL to shut it down.  I hoped that after that the other
> inserts would start running, but it never happened.  I then tried to shut down
> the ldmd through ldmadmin, but it never terminated.  I then started to
> systematically kill all the pqinserts still waiting.  Once I got them all
> killed, ldmd shut down successfully.  After that I tried to run pqcheck,
> and that's when I had to wait over 40 minutes for a check that never
> finished.  In retrospect, reading the instructions again, I think I should have
> run the clean option through ldmadmin.
> 
> The ldm system is running via a start command.  Whether that can happen while
> the perl scripts execute is a good question.  There is nothing to prevent the
> ldmadmin start from occurring while the scripts are running.  The perl scripts
> run from cron to pull data from external GPS receivers.  I guess that is
> something to consider.
> 
> During this instance the ldm was up and running before the cron jobs were 
> started and this product queue had been populating for over a week before we 
> hit this snag.
> 
> There shouldn't be any problems with power on the servers.  They're all
> UPS protected, with a generator as a secondary electrical source behind the
> regular utilities.
> 
> It seemed really strange that the pqinsert got stuck trying to insert a
> single file.  As I say, I don't have any good theories on what may
> have occurred, other than to say I hope it's a one-time cosmic pixie dust
> anomaly that never happens again.
> 
> I still suspect it might be a file access error: that the pqinsert was called
> before the file was fully written out.  I'm looking at building a more
> robust way to confirm that the system is done with the file before it tries to
> call pqinsert.  I'm looking at deeper system-level calls to see that the OS
> is done writing out the file, rather than simply monitoring the mod time of
> the file.
> 
> Again, thanks for lending your expertise.  At least I know I'm not missing
> something completely obvious.
> 
> -Mike
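
On the concern above about pqinsert being called before a file is fully written: one common alternative to watching the modification time is to have the writer create the file under a temporary name and rename it into place only when it is complete. A rename within one filesystem is atomic on POSIX systems, so anything watching for the final name never sees a partial file. The sketch below assumes you control the script that writes the file; the function name and the ".incoming-" prefix are hypothetical.

```python
import os
import tempfile


def publish_atomically(data: bytes, final_path: str) -> None:
    """Write data to a temporary file, then rename it to final_path."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem. mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk first
        os.replace(tmp_path, final_path)  # atomic rename into place
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this arrangement, pqinsert would be invoked on the final path only after publish_atomically returns, and any watcher should ignore names starting with the temporary prefix.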

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: HCO-891524
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.