
20030328: pqact causes unusually high CPU usage on LDM 6/Solaris Intel (cont.)



>From: Robert Mullenax <address@hidden>
>Organization: NMSU/NSBF
>Keywords: 200303250413.h2P4D9B2010509 LDM-6 pqact

>From address@hidden Tue Mar 25 15:29:55 2003

Thanks Tom,

I did have some initial strangeness with the queue..so I will check the
queue first.  I had to delete and remake the queue twice before I got
the LDM 6 to start getting data.  I forgot that you can check for queue
corruption.

It is also possible I have something funky in pqact..but I am pretty
sure it is essentially the ones y'all send out as templates.

I will look at this tonight..

Thanks,
Robert

Robert,

re: us poking around on your system

>Yes Tom, I was hoping you guys could just look at it..as I am at the end
>of my rope.  

No problem from this end.

>You will need to ssh into the machine as telnet is disabled.

We always use ssh.

>Feel free to switch runtime link as needed.  It is pointing to ldm-5.2
>right now.

As a recap, I rebuilt LDM-6.0.2 (you finished the 'make install_setuids'
step as 'root'), and have been running it and 5.2-binary off and on
over the afternoon.  What we see is consistent with your machine receiving
its data faster, and, so, having to do more work processing more products
in a shorter period of time.  The only way to really test this assumption,
however, is to turn on the reporting of realtime stats (I did this)
and run the machine for extended periods of time (like more than a 
day) using both LDM-6.0.2 and LDM-5.2.  I will be doing this over the
weekend, and I will touch base with you on Monday about the results.

>It's likely something stupid on my part...but for the life of me I can't
>find what it is.

We don't think so.  We did "tune up" your pqact.conf file a little bit.
You can see the changes by doing a diff between pqact.conf and
pqact.conf.pathological.  The changes weren't biggies, and the effect
on pqact should have been the same for 6.0.2 as for 5.2.
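
For anyone who would rather script that comparison than run diff by hand,
here is a minimal sketch in Python. The helper names (config_diff,
read_lines) are hypothetical and not part of the LDM; the two file names
are simply the ones mentioned above.

```python
import difflib


def read_lines(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path) as fh:
        return fh.read().splitlines()


def config_diff(first_lines, second_lines,
                first_name="pqact.conf",
                second_name="pqact.conf.pathological"):
    """Return a unified diff between two configs as a list of lines.

    An empty list means the two inputs are identical.
    """
    return list(difflib.unified_diff(first_lines, second_lines,
                                     fromfile=first_name,
                                     tofile=second_name,
                                     lineterm=""))


# Example use from the directory holding both files:
#   for line in config_diff(read_lines("pqact.conf"),
#                           read_lines("pqact.conf.pathological")):
#       print(line)
```

Lines starting with '-' appear only in the first file and lines starting
with '+' appear only in the second.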

The other thing I noticed is the RES column of top for LDM processes
was significantly larger under 6.0.2 than 5.2.  The overall memory usage
as listed by top was the same, however.  Not sure what to make of this,
but it was interesting.
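
To put numbers on that RES observation rather than eyeballing top, one
option is to total the resident set size per LDM program from ps output
while each LDM version is running. The following is only a rough sketch:
it assumes a ps that accepts "-eo rss,comm" and reports RSS in kilobytes,
and the default process names (rpc.ldmd in particular) are assumptions
that should be checked against what top actually shows on the machine.

```python
import os
import subprocess
from collections import defaultdict


def rss_by_command(ps_text, names=("rpc.ldmd", "pqact")):
    """Sum the RSS column (kB) per command name from 'ps -eo rss,comm' text."""
    totals = defaultdict(int)
    for line in ps_text.splitlines():
        parts = line.split(None, 1)
        # Skip the header line, blank lines, and anything malformed.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        # Some systems print the full path in the command column.
        command = os.path.basename(parts[1].strip())
        if command in names:
            totals[command] += int(parts[0])
    return dict(totals)


def snapshot():
    """Run ps once and return the per-command RSS totals (assumed ps flags)."""
    result = subprocess.run(["ps", "-eo", "rss,comm"],
                            capture_output=True, text=True, check=True)
    return rss_by_command(result.stdout)
```

Calling snapshot() once under 6.0.2 and once under 5.2, with the same
feeds running, would give two sets of totals to compare directly.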

>This box has always performed slightly better than
>either of our two SPARC boxes..so it's really weird.

Here are some comments:

1) we notice that you are running dcgrib from scripts in ~ldm/scripts.
   Chiz says it would be much better to replace dcgrib invocations
   with dcgrib2

2) given that the RES portion of the LDM programs is significantly
   larger for LDM-6 than LDM-5, it is possible that your machine is
   a little memory starved

3) it may be the case that you are getting products faster with LDM-6
   and so processes have to work harder to work through the data.
   We should be able to evaluate this from the real time stats.

re: make install_setuids

>I will do that here in a few minutes.

Was done, thanks.

re: fixup top

>Okay, I did the install_setuids.  I have been meaning to fix the top
>thing I keep forgetting, thanks for reminding me.

Great, thanks.

I will touch base on Monday with my findings.

Tom
Unidata WWW Service              http://my.unidata.ucar.edu/content/support    
  
>From address@hidden Fri Mar 28 16:59:39 2003

re: using dcgrib versus using dcgrib2

>Okay, but those are for data that is ftp'ed.  I will do that though.
>One reason I used dcgrib is that I noticed dcgrib2 was about twice
>as slow at decoding the AVN 1x1 grid files that I ftp.

re: maybe your machine is memory starved

>The machine is full of RAM. I can't add more.  I have been trying to get
>a new machine but to no avail.  

>I sincerely hope that you tell me that the machine is just being overworked
>because of the more efficient data feed..maybe I can finally get a new machine.

>Thanks,
>Robert