
20000809: performance, LDM, and ADDE remote server issues on cacimbo



>From: "Thomas L. Mote" <address@hidden>
>Organization: University of Georgia
>Keywords: 200007172022.e6HKMuT11816 UGA McIDAS-X ADDE NOAAPORT GINI imagery

Tom,

Both of your email messages from today are included in this note.

>From address@hidden Wed Aug  9 14:55:40 2000

Tom,

>I saw you were on my system this afternoon, and you probably noticed a
>few changes.

Yes.  I got on this morning and saw that the GEMPAK decoders,
especially dchrly, were sucking up way too much CPU.  It looks like you
solved that problem.

>After I installed all of those system patches last night,
>I changed the swap space from my old drive to my new hard drive this
>afternoon. That seems to have helped performance.

I have to believe that this helped a lot!  Your new disk is
considerably faster than the old one, so it improves swapping
performance.

>(I actually checked
>on the prices for new RAM for the SPARC 20, and found the prices were
>outrageous. I doubt we'll be adding memory. I would probably go to a
>new system first!)

I thought that would be the case.  In an earlier reply (which I
abandoned) I made a comment that if you were going to spend money, you
might do well to consider purchasing a well-equipped PC.  We are using
a dual processor Pentium III with 1 GB of RAM and 72 GB of high-speed
SCSI-2 disk (IBM 10000 RPM drives) for ingesting, decoding, and ADDE
serving ALL data in the IDD and CONDUIT and all NIDS and NOWRAD data.
That machine is a dual 550 MHz model, and it hums along.  The only
problem is that it is somewhat I/O bound.  We will be switching from
software to hardware RAID in the next couple of weeks, so we will see
if that problem goes away.  The cost for such a system is approximately
$6500.

>I also found we were having some problems writing to the log files and
>to the gempak data directories.

I saw that.  I figured it had something to do with the group change for
the user 'ldm' sometime in January.

>I'm not sure why, since they were all
>group writable directories and all of the directories belonged to the
>"apps" group that is shared by gempak, mcidas and ldm. Nevertheless, I
>made sure all of the directories were owned by ldm, and this solved the
>problem.

I believe that what you accomplished was changing the directories to
carry the current group of the 'ldm' user, which got rid of the stale
group information.  Now I can change ROUTE.SYS back to mode 664 and
things should keep working.  I will test that right now...

Nope.  Back to the drawing board.  OK, here is what I did:

<login as 'ldm'>
cd /data/mcidasd
cp ROUTE.SYS ROUTE.SYS.LDM
cp SYSKEY.TAB SYSKEY.TAB.LDM
cp SCHEMA SCHEMA.LDM
rm ROUTE.SYS; mv ROUTE.SYS.LDM ROUTE.SYS
rm SYSKEY.TAB; mv SYSKEY.TAB.LDM SYSKEY.TAB
rm SCHEMA; mv SCHEMA.LDM SCHEMA

This left new copies of ROUTE.SYS, SYSKEY.TAB, and SCHEMA in
/data/mcidasd.  The new copies are _definitely_ owned by 'ldm' in the
group 'apps'.  I verified that I can now modify these files as
'mcidas', so all is as it should be.  This really _must_ have been
caused by an 'ldm' group change back in January.
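As an aside, the same fix can be done in place with chgrp/chmod rather
than the copy-and-rename dance.  Here is a sketch run in a scratch
directory, using the current user's primary group as a stand-in for the
shared 'apps' group:

```shell
# Sketch: repair group ownership and permissions in place.
# The file name mirrors /data/mcidasd/ROUTE.SYS; the real command on
# cacimbo would be:  chgrp apps ROUTE.SYS; chmod 664 ROUTE.SYS
workdir=$(mktemp -d)
touch "$workdir/ROUTE.SYS"       # stand-in for the real file
mygroup=$(id -gn)                # stand-in for the 'apps' group
chgrp "$mygroup" "$workdir/ROUTE.SYS"
chmod 664 "$workdir/ROUTE.SYS"   # rw for owner and group, r for others
ls -l "$workdir/ROUTE.SYS"
rm -r "$workdir"
```

Either approach ends with 'ldm'-owned, group-writable files; chgrp just
avoids breaking any hard links or open file handles.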

Hmm...  There is a strange file appearing in /data/mcidasd:

-rw-rw-r--   1 ldm      apps           0 Aug  9 19:12 #

This probably has something to do with a pqact.conf entry.  The McIDAS
NLDN entry had a trailing '#' in it:

# Decode transmitted, binary form as McIDAS MD file
NLDN    
^([0-9][0-9][0-9]|[0-9][0-9])([0-3][0-9][0-9])([0-2][0-9])([0-5][0-9])([0-5][0-9])
        PIPE
        -close /unidata/home/mcidas/ldm-mcidas/bin/nldn2md -v -d /data/mcidasd 
70 NLDN DIALPROD=LD \1\2 \3\400 DEV=CNN #

I removed the trailing '#' in pqact.conf, deleted /data/mcidasd/#, and
sent a HUP to pqact.  We'll see if this was, in fact, the problem...
Yup, this was apparently the cause of the weird file.
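For reference, the entry with the trailing '#' removed should look like
the following (note that pqact.conf fields must be separated by tabs,
not spaces):

# Decode transmitted, binary form as McIDAS MD file
NLDN    ^([0-9][0-9][0-9]|[0-9][0-9])([0-3][0-9][0-9])([0-2][0-9])([0-5][0-9])([0-5][0-9])
        PIPE
        -close /unidata/home/mcidas/ldm-mcidas/bin/nldn2md -v -d /data/mcidasd
        70 NLDN DIALPROD=LD \1\2 \3\400 DEV=CNN

The stray '#' was being passed to nldn2md as an extra argument, which
evidently ended up used as a file name.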

>I think that may have been the reason so many decoders were
>hanging around and chewing up CPU cycles and memory. The performance
>has been vastly better since then.

I agree.  The system is now quite usable.

>I downloaded the latest patch level for GEMPAK and installed it.

We saw you in the process of doing that.  Excellent.

>I also
>downloaded and built ldm 5.1.1. I tried throwing the runtime switch and
>starting it, but the "ldmadmin start" hung. I don't have time this
>afternoon to figure out why, so I went back to ldm 5.0.8. I'll have to
>create a new log file and try it again.

OK.

>If the performance stays this good (loads around 1 or so), I may
>actually try sending the GINI data across the LDM from the NOAAport
>system, as I mentioned before. That can wait for another day.

OK.  Just make sure that you have enough disk space.  As I mentioned
earlier (at least I think that I did), ADDE access to the GINI data
works right now in uncompressed mode.  Also, the McIDAS GRID files are
pretty big, so they will consume a lot of your space.  OK, I see that
you are only keeping one day of McIDAS GRID files online (from the
entry in ~ldm/decoders/bin/mcscour.sh which is run from cron).

>Let me know if you are able to get the compressed GINI files via ADDE.

We believe that at some point in the past you must have asked the UGA
network folks to open port 500 to the outside world.  This was probably
for a version of McIDAS that did not support compressed transfers.  You
need to make the same request for port 503 now.  I am convinced that
this is what is needed since I can do compressed ADDE transfers from
cacimbo to the remote ADDE server on cacimbo, but I cannot do so from
any machine here at the UPC.  The conclusion is that access to port 503
is blocked by a firewall.

>We're making progress!

Yes, I agree.  You may want to consider making a couple of other changes:

o first, you should request NLDN data from striker.atmos.albany.edu
  explicitly in ldmd.conf:

  change:

  request ANY ".*" striker.atmos.albany.edu

  to:

  request NLDN ".*" striker.atmos.albany.edu
 
o you should think about paring down your request for model data (also
  in ldmd.conf).  Right now, the same model data is sent over in
  multiple projections.  It seems to us that requesting only the
  projection you want would cut down on IDD traffic to your machine, on
  processing of the data, and on use of disk space.  All in all, this
  would be a very good thing to pursue.

>Thanks for your help.

You're welcome.  Good job on upgrading the machine today and switching
swap to a faster disk!

Tom

Earlier message with no comments from me (for the tracking system only):

On Wed, 09 Aug 2000 00:09:04 -0600 Unidata Support 
<address@hidden> wrote:

>I turned off the LDM while I was installing patches last 
>night to speed things up, since I was doing it from home.
>Keeping a modem connection open for a long enough time is 
>always tricky.
>
>I have installed a set of current recommended patches for 
>2.5.1. I have also made the change to remove ToolTalk.
>
>I installed top (/usr/local/bin/top) and set the ldm to 
>restart in the middle of the night as an "at" job. 
>
>I got on this morning and looked at the load -- very high, 
>up to 33!!! It looks like it's mostly decoders chewing up 
>CPU time, especially several dchrly's. All of the patches 
>were installed.
>
>HOWEVER, I stopped and immediately restarted the LDM, and 
>the load started falling to less than 5, even after 
>checking to see that the data feed was established, it 
>stays less than 5 for at least 15 minutes. There is 
>something going on here. I am running LDM 5-0-8. I just 
>noticed that there appears to be an ldm-5.0.10 and a 5.1. 
>Would this make any difference?  The GEMPAK decoders were 
>built in January and have not been updated. I will look to 
>see if there are any updates/patches.
>
>I noticed I am getting LDM stats but no logs. I'll have to 
>take a look later this morning and see why.
>
>BTW, just in case you want to take a peek, I have changed 
>the LDM password temporarily to the same password I gave 
>you for mcidas.
>
>A few issues regarding system performance:
>
>1. Is there a significant advantage to Solaris 7 (or 8) 
>over a heavily patched version of 2.5.1? If all of my NIS+ 
>configuration would upgrade easily, I would just install 
>Solaris 7 (server). Since this is not a 64bit system, I 
>wasn't sure of the advantages of the later versions of 
>Solaris.
>
>2. Is performance mostly an issue of RAM or of processor? 
>This is a dual processor SPARC 20 (two 50MHz SuperSPARCS) 
>with 128MB of RAM. I could certainly purchase more RAM, but 
>it isn't worth messing with the processors.
>       My options are:
>       a.) Buy some more RAM... not sure of how many slots 
>       are open on cacimbo.
>       b.) Move everything to a Sun Ultra 1 143 MHz with 
>       192MB RAM that I have in my lab
>       c.) Move everything to a RedHat LINUX box that is 
>       running the NOAAport ingest. It is a Pentium II
>       350MHz system that I can put plenty of RAM on.
>       The NOAAport system really uses very minimal
>       overhead ... it's fairly efficient.
>I don't use this system to do much else than serve data 
>and run the NIS+ network. All the NIS+ really does is 
>authenticate student accounts on the Solaris x86 systems in 
>the lab, so I don't have to create individual accounts on 
>each system.
>
>3. I have the OS and swap partition on an older, slower 
>SCSI hard disk and the data on the new, faster hard disk. I 
>know this is far from optimal, but it was MUCH easier. I 
>did have the foresight to save a root and swap partition 
>space on the new disk for later use, when I can dump the 
>slower drive. Would it help much to just move the swap 
>partition to the new disk and leave the OS on the older 
>disk?
>
>I know this is a bit involved, but I have been using this 
>same machine for the LDM since early 95. I'm now on my 
>fourth hard drive and replaced the motherboard a few years 
>ago.
>
>Thanks.
>
>Tom

To: "Thomas L. Mote" <address@hidden>
Subject: 20000809: 20000808: quick update before going to bed 
Date: Wed, 09 Aug 2000 16:23:26 -0600

Tom,

When upgrading to LDM 5.1.1 from LDM 5.0.x, you must make a new queue
since the old queue format is not compatible with the new program.
Similarly, the queue structure for 5.1.2beta is also different.
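Assuming the standard ldmadmin script, remaking the queue would go
roughly like this (check the subcommand names against your ldmadmin
version before running):

<login as 'ldm'>
ldmadmin stop        # shut down the running LDM cleanly
ldmadmin delqueue    # remove the old-format product queue
ldmadmin mkqueue     # create a fresh queue in the new format
ldmadmin start       # bring the LDM back up on the new queue

Note that deleting the queue discards any products not yet processed,
so it is best done at a quiet time.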

Steve Chiswell


>From address@hidden Wed Aug  9 18:14:40 2000
>Subject: Re: 20000809: performance, LDM, and ADDE remote server issues on 
>cacimbo

Tom:

OK. Last message for today.

Following Steve's note this afternoon, I created a new queue and was
able to switch over to LDM 5.1.1.

I also changed the NLDN request as you suggested. I'll have to spend
some time looking at the model output to see what we can dispense with.
I think we'll be OK on disk space for the time being. I have the extra
6GB in reserve, in addition to the 11GB I'm using. For now, the GOES
GINI data is also on a separate 6GB disk.

I sent a request to our computing center about the 503 port access for
ADDE compressed data transmission. We'll see what they say.

Regarding the system, it occurred to me that I have a dual 350MHz
Pentium II machine that was spec'd (at the time we bought it) for
maximum throughput. It has a fast bus, ultra wide SCSI, etc. That will
be available soon. It's probably the best prospect for a new LDM
server. I think we can make do with what we have for the near term.

Thanks for your help. 

Tom