
RE: computer freezes



I concur; NFS is notoriously atrocious on Linux.  It could very well be the
source of your problems.

-----Original Message-----
From: address@hidden
To: Marcus Christie
Cc: address@hidden
Sent: 5/6/2005 10:31 AM
Subject: Re: computer freezes

Gabe,

On Thu, 5 May 2005, Marcus Christie wrote:

> Gabe Langbauer wrote:
>> Hello All,
>>
>> I'm running a Dell Xeon processor with Red Hat Enterprise Linux v. 3
>> installed. This system is used only for retrieving data from ldm (6.1.0),
>> creating images with gempak (5.7.3) and displaying these images over the
>> web via apache. (also 1 java script).  Recently, the computer has been
>> locking up.  The only symptom is that the computer becomes unreachable.
>> Even locally, if I go to the computer, it does not respond.  I am then
>> forced to cycle the power in order to get the system to respond again.
>> Checking the system log files hasn't shown anything (at least to me)
>> except that when this lockup occurs, cron does not run (usually). If
>> anyone has any ideas as to what may be occurring it would be GREATLY
>> appreciated.
>>
>> --Gabe Langbauer

In case you haven't fixed the above problem yet, I'll start down another
possible path for you...

We have a Dell server, now our main decoder/file serving system, that
used to do what you've described "every-so-often" (maybe once a month?).
It doesn't do it now, and the bad news is that I don't know for sure what
fixed it.  I do know that the LDM was not the problem, since I profiled it
extensively and could find no cause for the freeze-up (neither system
resource problems nor LDM problems).  The one thing I did notice was that
when I rebooted another, older machine running RH9, with which it shared
mounted file systems (both ways), the reboot of the older system seemed
to trigger an imminent (say within 24 hours) freeze-up of the new system.
I can't explain this; I only know that there was a pretty strong
correlation.  After de-tangling these two systems (i.e. getting rid of all
NFS mounts of the old system onto the new), I don't believe I've seen
this problem again.
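
For anyone wanting to see which NFS mounts a machine currently has, here
is a minimal sketch. It assumes Python 3 and a Linux system that exposes
/proc/mounts; the function name nfs_mounts is made up for illustration.

```python
def nfs_mounts(mounts_text):
    """Return (source, mount_point, fstype) for each NFS entry.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, filesystem type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The third field is the filesystem type.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1], fields[2]))
    return found
```

To use it, read /proc/mounts and pass the text in, for example
nfs_mounts(open("/proc/mounts").read()).  This only covers what the host
mounts from others; what it exports to others is normally listed in
/etc/exports, so a "both ways" setup needs both checked.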

So, if you still haven't figured this out, you might try creating some
"ps -eaf" and "free" (or other) logs from a script and checking them after
your next freeze-up.  If you can't find any obvious abnormalities, check
your system for tangled NFS mounts, try untangling them, and see if that
helps.
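
One way such a logging script might look, as a rough sketch only.  It
assumes Python 3.7 or later is available on the machine; the function
names and the idea of forcing each snapshot to disk with fsync (so the
last entry is more likely to survive a hard freeze) are illustrative
choices, not something from the original setup.

```python
import datetime
import os
import subprocess

# The two commands suggested above; add others as needed.
COMMANDS = [["ps", "-eaf"], ["free"]]


def snapshot(commands=COMMANDS):
    """Return one timestamped text block with the output of each command."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    parts = ["===== " + stamp + " ====="]
    for cmd in commands:
        parts.append("$ " + " ".join(cmd))
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=30
            )
            parts.append(result.stdout.rstrip())
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung command should not stop the logger.
            parts.append("(failed: %s)" % exc)
    return "\n".join(parts) + "\n"


def append_snapshot(path, commands=COMMANDS):
    """Append a snapshot to path and push it to disk before returning."""
    with open(path, "a") as log:
        log.write(snapshot(commands))
        log.flush()
        os.fsync(log.fileno())
```

Calling append_snapshot() with a log path of your choosing every few
minutes (from cron or a simple loop) leaves a trail to read after the
next freeze; the last complete entry shows what was running and how much
memory was free shortly before the lockup.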

Of course, there could always be a hardware problem too (which is what I
thought our problems were initially)...  Have you checked your system
"/var/log/messages" file for errors?  You might also try running vendor
hardware diagnostics on the system, although I've rarely found these to
be very useful.
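
A quick way to skim a large messages file is to filter it for likely
trouble words.  The sketch below assumes Python 3; the keyword list is
only a guess at what might be worth looking for and should be adjusted,
and the function name is hypothetical.

```python
import re

# Assumed keywords; extend or trim for your own system.
PATTERN = re.compile(r"error|fail|panic|oops|not responding", re.IGNORECASE)


def suspicious_lines(lines, pattern=PATTERN):
    """Return the lines that match pattern, with trailing newlines removed."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

For example, passing open("/var/log/messages") to suspicious_lines()
returns the matching lines in file order, so the entries just before a
freeze are at the end of the list.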

                                    Art.


Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email:  address@hidden, phone:  814-863-1563