Some things are coming back to me now.

Back when Sun was being idiotic about Solaris x86 in 2002, I switched
one LDM server over to RH 8.  Immediately, on the same box that had run
Solaris with an uptime of 300 days, I started having random lockups like
this, where you could not ssh or telnet into the machine.  In that case
all I could do was power-cycle it.  This happened about once a
I never did figure out what was causing it.  There was nothing in the
logs that showed anything.  The only thing I could connect it to was
that it always happened when the box was under its heaviest load.
I switched the box back to Solaris a year later and the problem went
away, so it definitely was not hardware.
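Since the hangs seemed to line up with peak load, one way to get evidence that survives a hard power cycle is to append a timestamped load sample to a log file and force it to disk each time.  This is a minimal sketch, assuming a Linux box with /proc/loadavg; the log path is a hypothetical choice, not anything LDM or Red Hat provides:

```python
import os
import time

LOADAVG_PATH = "/proc/loadavg"          # standard on Linux
LOG_PATH = "/var/tmp/load-samples.log"  # hypothetical log location


def format_sample(loadavg_text, when):
    """Turn the contents of /proc/loadavg into one timestamped log line.

    The first three fields are the 1-, 5- and 15-minute load averages.
    """
    one, five, fifteen = loadavg_text.split()[:3]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    return f"{stamp} load {one} {five} {fifteen}\n"


def record_sample():
    """Append one sample and fsync it so it is on disk before any hang."""
    with open(LOADAVG_PATH) as f:
        line = format_sample(f.read(), time.time())
    with open(LOG_PATH, "a") as log:  # append, so old samples are kept
        log.write(line)
        log.flush()
        os.fsync(log.fileno())
```

Calling record_sample() once a minute (for example from a cron entry that runs a small wrapper script) should leave the last readings before a lockup on disk, where they can be lined up against the time of the hang.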

If the box is running an X server, note that I also had problems with the
Nvidia binary drivers and Red Hat that caused the same symptom, except
this was on workstations.  X would lock up and the box would become
unresponsive to telnet or ssh.  That problem was solved by switching to
the XFree86 nv driver.

You might also check that you have the latest patches for your NIC
drivers.  A buggy NIC driver could do this.  I have a Solaris box that
uses a Broadcom driver (written by Broadcom).  Under heavy load that
driver causes a hang or sometimes a kernel panic.  That problem was
resolved by using an Intel card instead, with the built-in Solaris 10
driver.

I would also double-check NFS.


During these periods ssh is completely unable to connect to the
machine...I do log ps -eaf and free, although I think those have been
overwritten since this most recent crash.  The free command shows that
most of the memory is used; however, according to some Google searches
this is because at boot the kernel "takes" the memory and allocates it as
[...] didn't show me anything...I've checked all /var/log
files and nothing jumps out there either.
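On Linux, memory the kernel is using for buffers and the page cache is counted as "used" by free, even though most of it can be reclaimed when programs need it.  This is a minimal sketch of the arithmetic behind the old "-/+ buffers/cache" line of free, assuming the standard /proc/meminfo field names (MemTotal, MemFree, Buffers, Cached, all in kB); the function names are hypothetical:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field -> integer value.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_including_cache_kb(info):
    """'Used' as the top line of free reports it."""
    return info["MemTotal"] - info["MemFree"]


def used_excluding_cache_kb(info):
    """Memory held by programs, not counting buffers and page cache."""
    return used_including_cache_kb(info) - info["Buffers"] - info["Cached"]
```

Feeding it the text of /proc/meminfo on the affected box, and logging both numbers along with ps -eaf, would show whether memory actually held by programs was climbing before a hang or whether the "used" figure was just cache.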

