20040311: LDM I/O Slowness



>To: Unidata Support <address@hidden>
>From: "Patrick O'Reilly" <address@hidden>
>Subject: LDM I/O Slowness
>Organization: UCAR/Unidata
>Keywords: 200403111734.i2BHYSrV007612

Hi there,

I'm running the latest and greatest version of LDM (6.0.14) on a Dell with
dual 3 GHz Xeons and 1 GB of memory.  It had been running great until I
upgraded from RH 9 to Fedora Core 1.  I did an upgrade rather than a fresh
install because I didn't want to build the machine from scratch again.
The problems didn't begin until about a day and a half after the upgrade,
so I am not convinced they are related, but I am leaning towards that
conclusion.

My latencies are very low, as usual, but products aren't getting
decoded/filed for a long time, up to and over an hour.  There's nothing
unusual in the log file.  I have always gotten some pbuf_flush errors, but
they never degraded performance.  I rebuilt the queue, thinking a corrupt
queue could be the problem, but no dice.  My top output shows zeros on
iowait, and the machine's processors aren't working very hard.  The
log files are at:

http://thunder.storm.uni.edu

I searched the support archives for similar problems, but couldn't find
anything.  Any pointers?  If someone would like to get in and poke
around the machine, I can give a login.  I am at my wits' end.  Many
real-time images are created from this data stream and they aren't
getting made because of this problem, so I need a fix soon.  I'm toying
with a fresh LDM install, but I'm not sure that would help.  I dread a fresh
OS install, as it would keep me busy for a week; this machine performs many
functions.

Thanks for your wisdom.

Patrick
_________________________________________
Patrick O'Reilly
Meteorological Decision Support Scientist
The STORM Project at UNI
address@hidden    319-273-3789
http://www.uni.edu/storm

"No trees were killed in the making of this e-mail...however,
 a large number of electrons were horribly inconvenienced."

- --
**************************************************************************** <
Unidata User Support                                    UCAR Unidata Program <
(303)497-8643                                                  P.O. Box 3000 <
address@hidden                                   Boulder, CO 80307 <
- ---------------------------------------------------------------------------- <
Unidata WWW Service              http://my.unidata.ucar.edu/content/support  <
- ---------------------------------------------------------------------------- <
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

From address@hidden  Fri Mar 12 11:57:28 2004
Return-Path: <address@hidden>
Received: from gilda.unidata.ucar.edu (address@hidden [128.117.140.30])
        by unidata.ucar.edu (UCAR/Unidata) with ESMTP id i2CIvSrV018288
        for <support-ldm>; Fri, 12 Mar 2004 11:57:28 -0700 (MST)
Message-Id: <address@hidden>
Organization: UCAR/Unidata
Keywords: 200403121857.i2CIvSrV018288
To: address@hidden
Subject: 20040311: UPDATE: LDM I/O Slowness
Date: Fri, 12 Mar 2004 11:57:27 -0700
From: Steve Emmerson <address@hidden>

>To: Unidata Support <address@hidden>
>From: "Patrick O'Reilly" <address@hidden>
>Subject: UPDATE: LDM I/O Slowness
>Organization: UCAR/Unidata
>Keywords: 200403112031.i2BKVArV029450

Well.....after delving deep into the machine's logs, I saw that the new
kernel wasn't treating the dual Xeons correctly.  The earlier kernel
treated them as 2 physical CPUs and 4 virtual CPUs, which I would see in
"top" output.  With the new kernel I was getting error messages saying "no
sibling found for CPU 0" and "no sibling found for CPU 1", which I think
meant that hyperthreading wasn't working.  Anyway, I booted into the old
RH 9 kernel and after a few minutes things were catching up.  At this point
things look good again, as products are being filed quickly.  For anyone's
info, the "good" kernel is RH9 2.4.20-smp and the "bad" kernel is the one
that came with the Fedora Core 1 I just installed, 2.4.22-1.2174.nptlsmp.  I
hope this is the fix I was looking for.
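
For anyone wanting to check the same thing without reading "top" output,
here is a minimal Python sketch that compares the number of logical
processors with the number of physical packages listed in /proc/cpuinfo.
It is an illustration only, not part of LDM.  It assumes the running kernel
reports a "physical id" field for each processor (not every kernel does),
and that each package is a single-core CPU like the Xeons described above,
so more logical processors than packages suggests hyperthreading is
visible.  The function names are made up for this example.

```python
def summarize_cpuinfo(text):
    """Return (logical_processors, physical_packages) from /proc/cpuinfo text.

    physical_packages is 0 when the kernel does not report "physical id".
    """
    logical = 0
    packages = set()
    # /proc/cpuinfo lists one block per logical processor, separated by blank lines.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a processor block
        logical += 1
        if "physical id" in fields:
            packages.add(fields["physical id"])
    return logical, len(packages)


def hyperthreading_visible(text):
    """Assumption: single-core packages, so extra logical CPUs imply siblings."""
    logical, packages = summarize_cpuinfo(text)
    return packages > 0 and logical > packages


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        info = handle.read()
    logical, packages = summarize_cpuinfo(info)
    print("logical processors:", logical)
    print("physical packages reported:", packages)
    print("hyperthreading visible:", hyperthreading_visible(info))
```

On a dual-Xeon box like this one, 4 logical processors on 2 packages would
match the "good" kernel's behaviour, and 2 on 2 would match the "bad" one.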

Patrick
_________________________________________
Patrick O'Reilly
Meteorological Decision Support Scientist
The STORM Project at UNI
address@hidden    319-273-3789
http://www.uni.edu/storm

"No trees were killed in the making of this e-mail...however,
 a large number of electrons were horribly inconvenienced."
