
[McIDAS #KDN-271049]: /home/data/mcidas/images isn't scouring, filling disk



Hi Gilbert,

OK, I am just about done tweaking the setup on weather3...

re: FILEing of NEXRAD Level III products on weather

> Well the CPU load is incorporated into the overall load average, and
> that's what's critical.

I suspect that the negative effects you were seeing (e.g., high load
averages) may have been caused by having too many processing actions for
too many feeds in your ~ldm/etc/pqact.gempak file.  The reason I say this is
the following set of observations on weather3:

1) the list of feeds for which there are actions in pqact.gempak is:

CMC|CONDUIT|FNEXRAD|FSL2|GPS|NNEXRAD|NGRID|NIMAGE|NLDN|NOGAPS|PCWS|UNIDATA|WSI

2) the number of actions in pqact.gempak is:

/home/ldm/etc% grep -v ^# pqact.gempak | grep -v ^" " | grep -v ^$ | wc -l
487

3) the Data Volume Summary page for weather3.admin.niu.edu is as follows:

http://www.unidata.ucar.edu/cgi-bin/rtstats/rtstats_summary_volume?weather3.admin.niu.edu

 Data Volume Summary for weather3.admin.niu.edu

Maximum hourly volume    623.716 M bytes/hour
Average hourly volume    411.020 M bytes/hour

Average products per hour      49993 prods/hour

Feed            Average         % of total    Maximum         Products
                (M byte/hour)                 (M byte/hour)   (number/hour)
HDS                 191.371        46.560        376.626        18196.435
NEXRAD2              79.296        19.293        129.694         4481.261
FNEXRAD              69.901        17.007         88.611           70.674
NNEXRAD              24.180         5.883         29.013         2054.261
UNIWISC              20.784         5.057         32.314           23.826
IDS|DDPLUS           17.690         4.304         21.914        25127.891
DIFAX                 5.543         1.349         22.425            6.957
FSL2                  2.026         0.493          2.156           21.848
NLDN                  0.229         0.056          0.577            9.696

This listing shows that there are about 49500 products each hour that are
checked for processing by the pqact that is handling the pqact.gempak actions.
Before I changed the list of feeds that would be processed by the pqact that
is responsible for pqact.gempak actions, it would have to scan ALL products
in ALL feeds each hour, or almost 50000 products on average.

This means that this one pqact has to do 49500 * 487 = 24106500 comparisons
each hour, and it acts on some fraction of these.  NOTE that pqact does not
stop working its way through a pattern/action file when a match is found; it
continues looking for additional matches.
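
Just to make concrete what each of those comparisons is: it is a regular
expression match of a product identifier against one pattern/action entry.
As a sketch of a single entry of the kind found in pqact.gempak (the pattern,
the captured fields, and the output directory are illustrative only, not
lines from your file; the fields must be separated by tabs):

# Illustrative only: FILE NEXRAD Level III products by site and product type.
# Format: feed type, extended regular expression, then a tab-indented action.
NNEXRAD	^SDUS[2357]. .... ([0-3][0-9])([0-2][0-9])([0-5][0-9]).*/p(...)(...)
	FILE	-close	data/gempak/nexrad/NIDS/\5/\4/\4_\1\2\3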

The amount of processing that this single pqact would have to do leads me to
believe the following:

- it would likely fall behind in its processing if it were tasked with FILEing
  all NEXRAD Level III products

- it would consume a lot of CPU

Now, splitting the actions into more pqact.conf files will help keep any one
pqact from falling behind in the processing it is attempting to do.  It should
not, however, decrease the overall CPU use; in fact, it should increase it
over a shorter time interval.

So what's my point?

- Chiz added the ability to generate multiple pqact.conf files for GEMPAK
  processing based on his observation that if one leaves all processing
  in one pqact.conf file, then one might see the processing fall behind enough
  so that products will not get processed out of the LDM queue before they are
  overwritten by newly received ones.
  
- it may be the case that moving the NIMAGE processing to a pqact that is not
  already overloaded would result in weather3's being able to process the
  data without the very high load averages you experienced
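
One way this plays out in ~ldm/etc/ldmd.conf is to run several pqact
invocations, each restricted with -f to the feed types its pattern/action
file actually handles, so that no single pqact scans every product.  The
file names below are illustrative, not necessarily what is on weather3:

# Illustrative ldmd.conf EXEC lines: one pqact per pattern/action file,
# each limited to the feed types it needs to scan.
EXEC	"pqact -f NNEXRAD          etc/pqact.gempak_nexrad"
EXEC	"pqact -f NIMAGE           etc/pqact.gempak_images"
EXEC	"pqact -f HDS|IDS|DDPLUS   etc/pqact.gempak_decoders"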

re: I just logged into weather as 'mcidas' and:
 - pointed at weather2 for RTNEXRAD data
 - removed the ADDE definitions for the RTNEXRAD dataset from the server
   mapping table, $MCDATA/RESOLV.SRV
 
> OK, great. Thanks!

No worries.

re: weather3 is either a dual 3 GHz machine or a single with hyperthreading
> It's the latter, so is weather2. They're identical.

OK.

re: weather, on the other hand, has a single 3 GHz processor.

> Yep.

OK.

re: Given the hardware I see, I would think that weather would
struggle more than the other two machines

> Yes, and...

re: One of the biggest loads on any machine is X Windows -- it is a HUGE memory
user.

> Unfortunately, for WXP I have to use it.

Hmm... Can't you use a virtual frame buffer for generation of WXP products
for your web site?
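
For example, a virtual framebuffer can stand in for a physical display when
all you need is image generation.  A minimal sketch, assuming Xvfb is
installed (the display number and geometry are arbitrary, and the WXP
invocation itself is not shown):

# start a virtual framebuffer on display :1 (no monitor or video card needed)
Xvfb :1 -screen 0 1024x768x24 &

# point X clients at it; WXP product generation then renders off-screen
setenv DISPLAY :1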

re: processing of NEXRAD Level II data

> Correct, but the limited amount keeps the load from getting too high. I
> used to have all LEVEL2 data on weather3, and when I get the new machines,
> I will do so again.

OK.

re: I finished adjusting processing being done by McIDAS pqact.conf actions
to remove duplication of those being done for GEMPAK

> Good!

Yes, this will save disk AND CPU.

re: I propose that we investigate the high load averages seen when processing
NIMAGE data.

> I did a "yum -y install *iostat*" but didn't find any packages. Any clues?

Yup.  I installed the package containing iostat as follows:

yum install sysstat-7.0.4-3.fc7

This installed /usr/bin/iostat and /usr/bin/sar.  I then copied over the script
we use for system monitoring, ~ldm/util/uptime.tcl and adjusted some entries
to work on your system (like the PATH defined in uptime.tcl).  I then added
running of the script once-per-minute from cron:

#
# Monitor system performance
#
* * * * * util/uptime.tcl logs/weather3.uptime
0 0 1 * * bin/newlog logs/weather3.uptime 12

The items listed are:

20070827.2121   0.51  1.00  1.27   10  18  28   7481   39M    6M  38.00 18.50  0.50 43.00

  field  1: date [ccyymmdd]
  field  2: time [UTC]
  field  3: 1 minute load average
  field  4: 5 minute load average
  field  5: 15 minute load average
  field  6: # downstream connections
  field  7: # upstream connections
  field  8: total # connections
  field  9: age of oldest product in LDM queue [s]
  field 10: free memory
  field 11: swap in use
  field 12: %user
  field 13: %system
  field 14: I/O wait
  field 15: %idle
The output from this file will give us a time history of the performance on
weather3.
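
In the meantime, iostat and sar can also be run by hand to see whether high
load averages correspond to CPU saturation or to time spent waiting on I/O.
For example (standard sysstat invocations; the intervals are arbitrary):

# CPU utilization every 60 seconds, 5 samples: compare %iowait to %user/%system
sar -u 60 5

# extended per-device I/O statistics every 60 seconds: watch %util and await
iostat -x 60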

I have adjusted things on the McIDAS ADDE side to use GEMPAK-processed images
where needed.  I believe that the redundancy in processing/disk use between
GEMPAK and McIDAS is now gone.

> Great.

re: I think that weather3 should easily be able to handle the processing
load you have on it AND file the NIMAGE products.  The fact that it can't
leads me to suspect that something is wrong somewhere.  The thing to do is
find out where the problems are and fix them.

> OK.

I will turn on NIMAGE processing in the combined McIDAS pqact.conf file,
~ldm/etc/pqact.conf_mcidas to see what happens on weather3.  I will write
the NIMAGE data into the directory structure needed for GEMPAK, but the
action now in pqact.gempak will be commented out.
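
For illustration only, the shape of that change is something like the
following; the NIMAGE pattern and the directory are placeholders, not the
actual entries in pqact.conf_mcidas or pqact.gempak:

# In ~ldm/etc/pqact.conf_mcidas: FILE NIMAGE products into the GEMPAK
# directory tree (pattern and path below are placeholders)
NIMAGE	^satz/.*
	FILE	-close	data/gempak/images/sat/nimage_placeholder
#
# In ~ldm/etc/pqact.gempak: the equivalent action is left commented out
#NIMAGE	^satz/.*
#	FILE	-close	data/gempak/images/sat/nimage_placeholder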

re: I can't see how what you have right now in weather3 is not able to keep up
with what you are trying to do.

> Hmm. OK.

re: take care with overclocking

> I do it now, no problems so far, but I only go 5% over.

Yes, but you ran into a heat problem on weather...

> Gotta run...

More as the NIMAGE testing proceeds.

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: KDN-271049
Department: Support McIDAS
Priority: Normal
Status: Closed