
[LDM #JGZ-326819]: LDM - LDM is killing the system



Hi Angel,

Long time no hear!

> Institution: University of Miami
> Package Version: ldm 6.4.1
> Operating System: SuSE Linux 9.3 (x86-64)
> Hardware Information: 4 processor dual-core
> Inquiry: Problem #1: scour never completes
> 
> Problem #2: when the LDM runs the machine is very unresponsive. I know this
> is kinda vague but that's as much as I know. They are saving a pretty large
> subset of available data and writing to a RAIDed disk on a 3ware card. Any
> hints where to look first?

Our experience with "home built" RAID systems (meaning a RAID built by adding
a RAID card and attaching hard disks) on Linux is NOT positive!  We have tried
virtually every file system available on the RAID (except GFS), and have been
disappointed with all of them.  We have been told that RAID performance is
better with 3Ware cards, but my experience working with Gerry Creager
(address@hidden) on his 3Ware-based RAID setup has not been stellar.  Sources
at NCAR claim that they get very good RAID performance with external boxes
that appear as SCSI devices to the system.

The biggest performance problem occurs when one puts the LDM queue on the RAID
AND then writes LOTS of files to it.  In a test on a Fedora Core 1 machine
with a Promise TX2000 RAID card, I found that putting a 2 GB LDM queue on the
RAID resulted in receipt time latencies that rapidly ramped up to 1 hour.
When the queue was moved to a "local" ext3 filesystem, the latencies dropped
to fractions of a second.  Gerry and I also noticed that scouring on his RAID
was very sluggish, so much so that I investigated writing new scour routines
in other scripting languages to see if I could minimize the problems.  I was
marginally successful in implementing scouring in Tcl, but not so much so that
I can positively say that this is a "solution".  By the way, at the time of
our collaborative testing Gerry's machine was running Fedora Core 2; it is now
running CentOS Linux.  It is a dual, hyperthreaded Xeon (32-bit) machine with
4 GB of RAM.  The 2 TB RAID is built from multiple 300 GB Maxtor IDE drives.
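
To make the scouring discussion above concrete, here is a minimal sketch in
Python of what a scour pass does in general: walk a directory tree and remove
regular files older than a given age.  This is NOT the LDM scour script and
not the Tcl version mentioned above; the function name scour_tree and its
parameters are hypothetical and chosen only for illustration.

```python
import os
import stat
import time


def scour_tree(root, max_age_days, now=None, dry_run=False):
    """Remove regular files under `root` last modified more than
    `max_age_days` days ago.

    Hypothetical helper for illustration only.  Returns the list of paths
    removed (or that would be removed when dry_run is True).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file disappeared while we were walking
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, and other non-files
            if info.st_mtime < cutoff:
                if not dry_run:
                    try:
                        os.remove(path)
                    except OSError:
                        continue  # could not remove; leave it for next pass
                removed.append(path)
    return removed
```

Running it first with dry_run=True shows how many files a pass would touch,
which may help in judging whether slowness comes from the sheer file count or
from the disk subsystem itself.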

As a starting point, I recommend immediately moving your LDM queue off of the
RAID _if_ it is currently on it, and seeing if there is a noticeable
improvement.
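
One quick way to check whether the queue and the data directory share a
filesystem is to compare their device numbers.  The sketch below is only an
illustration: the function name same_filesystem is hypothetical, and you would
substitute your own queue file and data directory paths.  Note the limitation:
two partitions carved from the same RAID volume report different device
numbers, so a False result does not prove the queue is off the RAID.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same mounted filesystem.

    Hypothetical helper: compares the st_dev field reported by os.stat(),
    which identifies the device containing each path.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, calling same_filesystem() with the path of the queue file and the
path of the directory receiving the decoded data returns True when both sit on
the same mounted filesystem.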

By the way, Steve says hi and asks how things are going in Miami!

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: JGZ-326819
Department: Support LDM
Priority: Normal
Status: Closed