On Wed, 11 May 2005, Arthur A. Person wrote:
> What recent changes have been made to the system? Am I correct in
> assuming that the system worked fine recently until you made some change?
> Was it just starting up the gempak scripts that caused this? Have there
> been any system upgrades or packages or hardware added? Is it just the
> one gempak script that causes a problem, and if so, if you run only
> portions of that script, does the problem go away? I.e., what portion of
> the script is causing the problem?

Okay, the whole story goes like this: we had an old Sun system that ran
these scripts. When I started taking care of it in September, it was
obvious that this system was too old and decrepit to continue. Luckily, I
was able to convince the powers that be that a new system was needed. The
new system came in December, and we transferred all scripts and updated
GEMPAK and LDM to the new versions. The system worked fine from about
Christmas until maybe a month ago. Around this time I added a couple of
scripts (model differences). When the crashes started occurring, my
immediate thought was an error with one of the new scripts. Therefore, I
disabled all of them, to no avail. I then looked and found a Java script
that had stopped working and thought that was the problem. I disabled it
and found no change in performance.
It appears to me that the crashes occur randomly, at different times of
the day and after very different uptimes. Sometimes we're up for a week,
sometimes (like yesterday) we get 3 crashes in a single day. Therefore, I
conclude that if it is a single script, it's one that runs at least
hourly. I've made a list of all of these scripts and am now disabling
them one by one to see if I get any results.

> It may be unrelated, but I had a system (actually, I still have it) that
> has a memory leak in some I/O driver that jams the system over time. In
> such a case, you will see the "-/+ buffers/cache used" column of "free"
> increase steadily with time until the system hangs, at least that's what

I'll look into this.
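
For watching that number over time, something like the following might do as a rough sketch. It assumes a Linux box where /proc/meminfo is readable, and it assumes the "-/+ buffers/cache used" figure is MemTotal minus MemFree, Buffers and Cached, which is how older versions of "free" derived it. The function names, the five-minute interval and the sample count are made up for illustration and are not from anything on the system in question.

```python
import time


def parse_meminfo(text):
    """Turn the text of /proc/meminfo into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value" shaped
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_minus_buffers_cache(fields):
    """Approximate the '-/+ buffers/cache used' figure (assumed formula)."""
    return (fields["MemTotal"] - fields["MemFree"]
            - fields.get("Buffers", 0) - fields.get("Cached", 0))


def log_memory(path="/proc/meminfo", interval_s=300, samples=3):
    """Print one timestamped reading per interval (hypothetical defaults)."""
    for _ in range(samples):
        with open(path) as f:
            used = used_minus_buffers_cache(parse_meminfo(f.read()))
        print(time.strftime("%Y-%m-%d %H:%M:%S"), used, "kB", flush=True)
        time.sleep(interval_s)
```

Calling log_memory() with its output redirected to a file on disk would leave a record that survives a hang; a steady climb in the logged value before each crash would point toward a leak, while a flat line would suggest looking elsewhere.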