
20030920: mesodata load average (cont.)



>From:  Gerry Creager N5JXS <address@hidden>
>Organization:  Texas A&M University -- AATLT
>Keywords:  200309201339.h8KDdPk1012811 LDM scour load average

Hi Gerry,

>Thanks for the concise notes.  That helps me a lot to get a better 
>understanding.
>
>One thing I noted this morning was 8 instances of the pruning script!

OK, this tells me a lot.  Apparently, mesodata is not able to complete
a prune of the tree before another invocation starts.  I had set this
up to run every half hour.  I see that you changed this to run every
8 hours, but left the number of files to keep at 288.
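
One way to keep invocations from piling up, whatever the cron interval,
is to wrap the prune in a lock.  The following is only a sketch: the
lock path, the run_exclusive name, and the script path in the comment
are placeholders, not what is actually installed on mesodata.

```shell
#!/bin/sh
# Hypothetical lock location; override LOCKDIR to suit the system.
LOCKDIR="${LOCKDIR:-/tmp/prune.lock}"

# run_exclusive CMD [ARGS...]
# Runs CMD only if no other run_exclusive instance holds the lock.
run_exclusive() {
    # mkdir is atomic: only one caller at a time can create the directory.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCKDIR"
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 75
}

# Example use from cron (the script path is a placeholder):
#   run_exclusive /path/to/prune_script
```

With this in place a slow prune is skipped rather than duplicated.  If
a run is killed outright, the lock directory is left behind and has to
be removed by hand.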

I think we are zeroing in on the area of the problem.  The 'du -sk *'
invocation I ran yesterday (and included partial output) never
did complete.  It was like it got stuck somewhere.  This sounds like
the pruning script (a find invocation that runs rm) is hitting
the same kind of wall that 'du' did.  This would seem to indicate
that there is some sort of a problem with your hard disk.  I took
a quick look at /var/log/messages.1, and see that mesodata is losing
ntpd time synchronization regularly.  Ordinarily, I wouldn't think
much of this, but some of the time swings approach 10 seconds.
Is it possible that this can throw 'find ... -mtime' invocations
into a tailspin?  I don't see any glaring messages about disk faults,
so the problem is something more subtle.

>I'll clean out the ddplus side and see if that helps.  I need to move 
>the BINEX stuff off-site, so I'll try to get that going real soon... say, 
>Monday.  I could see that causing a scour problem.

Even though scour is not set up to work on ~ldm/data/ddplus, it can't
hurt to clean up the disk.  By the way, I see that ~ldm/data/ddplus still
seems to be growing:

$ cd data
$ du -sk *
19660   AR
2141676 ARCHIVES
80408   binex
408     combhourly_pwv
0       cronlog
34408916        ddplus
4       decoded
124176  difax
4       fcst
4       forecasts

Yesterday, the size from the 'du -sk *' was:

34389440        ddplus
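
The difference between the two readings can be worked out in the
shell.  This is just arithmetic on the two figures above; the variable
names are mine, and the interval between the runs is roughly a day:

```shell
today_kb=34408916      # ddplus size from today's 'du -sk *'
yesterday_kb=34389440  # ddplus size from yesterday's 'du -sk *'

# du -k reports sizes in 1 KB units, so the difference is in KB.
growth_kb=$((today_kb - yesterday_kb))
echo "ddplus grew by ${growth_kb} KB"
```

That works out to 19476 KB, or about 19 MB, between the two runs.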

I also note that the 'du' invocation takes a _long_ time to get past
the forecasts directory.  Yesterday, I let this run for over 20 minutes
before finally killing it.   Today, it ran a little faster, but was
still very slow.  I see that the reason it took so long is the
~ldm/data/gempak directory has 5 GB of stuff in it:

5551652 gempak

Perhaps the entire problem you are seeing is related to GEMPAK scouring
or lack thereof?  This might be rectified by tidying up your GEMPAK
setup a bit, but I can't say for sure.
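
To see where unscoured files are accumulating, something like the
following could be run against the gempak directory.  It is a sketch
only: count_old_files is a made-up helper, and the directory and age
threshold in the example are assumptions to adjust as needed.

```shell
# count_old_files DIR DAYS
# For each immediate subdirectory of DIR, print the number of regular
# files matched by 'find -mtime +DAYS', largest count first.
count_old_files() {
    dir=$1
    days=$2
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        # find counts age in whole 24-hour periods, so +DAYS matches
        # files that are at least DAYS+1 days old.
        n=$(find "$sub" -type f -mtime +"$days" | wc -l | tr -d ' ')
        echo "$n $sub"
    done | sort -rn
}

# Example (path and threshold are assumptions):
#   count_old_files ~ldm/data/gempak 3
```

Subdirectories that show large counts are the ones to check against
scour.conf.  Be aware that on a tree this size the find itself may
take a long time to finish.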

>Hope you have a good weekend.  I'm off to more Mr. Mom duties while my 
>wife catches babies at Ft Hood this weekend!

Talk to you later...

Tom

>From: Gerry Creager N5JXS <address@hidden>
>Date: Sun, 21 Sep 2003 12:50:15 -0500

>Howdy!
>
>I went in and got scour out of the nexrad business this morning, too. 
>That after I'd e-mailed you last night.  I think this was the problem. 
>I believe that when scour hit the nexrad area it choked and slowed down 
>the whole process.  If you look at scour.conf now, it's longer but 
>seems, so far, to work better.
>
>I'll move the prune script back to half-hour runs and we'll see what 
>happens.
>
>I've seen a lot of NFS losses, but I'd missed the ntp losses.  I also 
>resync time using 'rdate' periodically to our Stratum 1 server.  I'm 
>thinking I'm gonna set up a GPS-referenced time server on campus (as to 
>why we don't have one, it's a long and strange story...).  I can achieve 
>60 ns accuracies with a $100 GPS receiver and a PC...
>
>So I'll clean up ddplus and keep scour running on it afterward.  That 
>should drop things to a manageable level.  And move the binex off. 
>That's from my suominet site and I really don't want to blow that data 
>away.
>
>Gosh, I really didn't expect you to hit it this weekend!  Thanks!
>
>Gerry
>-- 
>Gerry Creager -- address@hidden
>Network Engineering -- AATLT, Texas A&M University     
>Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
>Page: 979.228.0173
>Office: 903A Eller Bldg, TAMU, College Station, TX 77843