[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

20040614: Quick update on weather3.admin.niu.edu... (cont.)



>From:  Gilbert Sebenste <address@hidden>
>Organization:  NIU
>Keywords:  200406150354.i5F3sOtK011267 LDM Fedora Core Linux

Hi Gilbert,

>Weather3 is back up and running, and feeding weather2 at this time.
>
>At this point, I think I have a good idea of what is happening, as well as 
>NOT knowing what is happening.

I just logged onto weather3 and noticed that you are running the
Fedora Core 1 2.4.22-1.2190.nptlsmp kernel:

uname -a
Linux weather3.admin.niu.edu 2.4.22-1.2190.nptlsmp #1 SMP Wed May 26 13:46:20 
EDT 2004 i686 i686 i386 GNU/Linux

and that the *.2190 kernel entries in /boot are dated June 10, which
suggests that kernel was installed fairly recently (the May 26 date in
the uname output is the kernel build date):

ls -alt /boot/vmlinux*.2190*
lrwxrwxrwx  1 root root 44 Jun 10 16:20 /boot/vmlinux-2.4.22-1.2190.nptlsmp -> 
../lib/modules/2.4.22-1.2190.nptlsmp/vmlinux*
lrwxrwxrwx  1 root root 41 Jun 10 16:19 /boot/vmlinux-2.4.22-1.2190.nptl -> 
../lib/modules/2.4.22-1.2190.nptl/vmlinux*

Since we are intimately involved with multiple machines (at the UPC, in
Costa Rica, and at Texas A&M) running LDM under the Fedora Core 1
2.4.22-1.2188.nptlsmp kernel, and since none of these machines are
experiencing any problems, I have to wonder if your problem is somehow
related to the *.2190.nptlsmp kernel.

For reference, I personally have set up 4 dual processor machines (three
Athlon MP based, one Xeon based) with FC1 *.2188.nptlsmp and LDM queues
of 1 GB or larger (one has a 1 GB queue, one has a 2 GB queue, and two
have 4 GB queues) and have experienced no problems.  Two of these
machines are ingesting and processing everything available in the IDD,
including _all_ NEXRAD Level II data and _all_ CONDUIT data.  If load
stress could cause bus errors, I should have seen them on these
machines, but I haven't.  If large LDM queues could cause bus errors,
I should have seen problems on all of these systems.
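
If it would help to experiment with different queue sizes on weather3,
something along these lines should rebuild the queue (stop the LDM
first; the queue path below is just the usual default, so adjust it to
your layout):

  ldmadmin stop
  ldmadmin delqueue
  # 1 GB given in bytes; newer pqcreate versions may also accept a "1G" suffix
  pqcreate -c -s 1073741824 -q ~ldm/data/ldm.pq
  ldmadmin start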

When did you upgrade to the *.2190.nptlsmp kernel?
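
If you are not sure of the exact date, the RPM database should show when
each kernel package was installed (assuming the kernels came from the
stock FC1 RPMs):

  rpm -q --last kernel-smp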

Tom

>Whenever I set the LDM queue to 400 MB (the default), it doesn't like it. 
>Set it under 300 MB...and it is happy.
>
>This is happening on weather2 and weather3, even though they are 
>identical but separate machines. With 1.5 GB of RAM and 250 GB disk 
>space...hmmm. Weird. Yet, this is not happening on weather, with 80 GB 
>disk space and 2 GB RAM. Weather2 and Weather3 have IDE drives; Weather 
>has SCSI with a RAID.
>
>You tell me what's wrong. I dunno. In any case, with the lower queue,
>weather3 seems to be stable. Let me give it one more day to make sure.
>Otherwise, weather2 is humming along fine. Keep feeding from that.
>
>*******************************************************************************
>Gilbert Sebenste                                                     ********
>(My opinions only!)                                                  ******
>Staff Meteorologist, Northern Illinois University                      ****
>E-mail: address@hidden                                               ***
>web: http://weather.admin.niu.edu                                      **
>Work phone: 815-753-5492                                                *
>*******************************************************************************
>
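
One thing that might be worth trying the next time the 400 MB queue acts
up: run pqcheck against the queue to see whether it is being reported as
corrupt (the path below is just the usual default):

  pqcheck -q ~ldm/data/ldm.pq
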
>From: "David B. Bukowski" <address@hidden>
>Date: Mon, 14 Jun 2004 23:45:01 -0500 (CDT)
>To: Gilbert Sebenste <address@hidden>
>cc: address@hidden
>Subject: Re: Quick update on weather3.admin.niu.edu...
>
>Well, first off, treat your RAID as a single drive, since I think that's what
>you told me last time we talked.  So in other words it's just another SCSI
>drive.  The IDE drives could be where the bottleneck is, since they are
>more than likely slower than your SCSI.  Also, your IDE is probably running
>from the mainboard instead of a separate IDE controller card.  Since I'm
>not an expert on LDM, I'm just making a wild guess that you're getting data
>to your drives faster than they can handle, the pipe to them can't
>keep up anymore, and things start timing out.  Just a wild random guess.
>Back to doing slideshow production now before bed :)
>-dave
>
>-------------------------------------------------------------------------------
>David B. Bukowski      |email (work):          address@hidden
>Network Analyst III    |email (personal):      address@hidden
>College of Dupage      |webpage:       http://www.cshschess.org/davebb/        
>Glen Ellyn, Illinois   |pager:                 (708) 241-7655 
>http://www.cod.edu/    |work phone:            (630) 942-2591
>-------------------------------------------------------------------------------
>
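
If raw disk throughput is the suspect, a quick read test on the IDE
drives versus the SCSI RAID would put some numbers behind David's guess
(device names below are examples; run as root on an otherwise idle
machine):

  hdparm -tT /dev/hda    # IDE drive on weather2/weather3
  hdparm -tT /dev/sda    # SCSI/RAID device on weather
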
>From: Gerry Creager N5JXS <address@hidden>
>Date: Tue, 15 Jun 2004 05:38:49 -0500
>Organization: Texas A&M University -- AATLT
>
>First thought... and before coffee, too... is that you're writing the 
>queue to the same disk as your data.  I've config'd all my machines to 
>have a system partition (60 GB on up, depending on prices) and a data 
>partition for LDM and gempak data.  I write the queue to system space 
>and the data and products to the data partition.
>
>Gerry
>
>-- 
>Gerry Creager -- address@hidden
>Texas Mesonet -- AATLT, Texas A&M University   
>Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
>Page: 979.228.0173
>Office: 903A Eller Bldg, TAMU, College Station, TX 77843
>
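
Gerry's suggestion above is also worth checking: a quick df against the
queue file and the decoded-data directory will show whether they share
the same filesystem on weather2/weather3 (paths below are just examples,
so adjust them to where your queue and products actually live):

  df -h ~ldm/data/ldm.pq          # filesystem holding the queue
  df -h /your/decoded/data/dir    # filesystem where decoded products land
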
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+