
20030827: LDM-6 installation at UNCA (cont.)



>From:  ahuang <address@hidden>
>Organization:  UNCA
>Keywords:  200307301533.h6UFXPLd024978 LDM McIDAS upgrade

Hi Alex,

I am CCing Matt Rosier on this note in case he has been delegated
the responsibility of looking after the LDM and McIDAS installations
on UNCA machines.  Here goes:

It is time to update you on changes I made on storm2...

>Thank you for e-mail and your willingness to help us out, so you are not that 
>busy after all :-))  Are you sure it will only take half of an hour?

As you can see, it took me almost an entire month to get to all of the
updates that were needed on storm2, so I guess I was busier than I
originally implied.  I can tell you, however, that the LDM upgrade
did take less than half an hour.  In fact, I seem to remember that
upgrading from LDM-6.0.13 (which Leigh had installed) to LDM-6.0.14
and adding reporting of real-time statistics took on the order of
10 minutes.

>Your e-mail also gave the reason that why storm2 was 100% full and crashed, 
>since the data volume had exploded in July.  We had to clean up manually to 
>get storm2 back two days ago, so far it is running fine.

In going through the McIDAS upgrade to v2003, I found what was probably
the real reason that storm2 filled up.  It seems that the files being
created by McIDAS-XCD (the decoder component of the Unidata McIDAS
distribution) were not getting scoured.  Moreover, the output point
source data files (MDXX files) were all full and had been since
sometime in June.  This was causing the McIDAS-XCD decoding processes
to fail continually and burn CPU while accomplishing essentially nothing.

The culprit in the failure to scour XCD-produced data was that
~mcidas/util/mcscour.sh did not exist; in fact, the ~mcidas/util
directory itself did not exist.  Scouring was supposed to be done by
a cron entry for the user 'mcidas', and that entry failed every time
it ran.  Here is one of the emails that the cron daemon sent to
'mcidas' describing the problem:

  From address@hidden  Sat Jun  7 01:00:04 2003
  Date: Sat, 7 Jun 2003 01:00:01 -0400
  From: address@hidden (Cron Daemon)
  To: address@hidden
  Subject: Cron <mcidas@storm2> /home/mcidas/util/mcscour.sh
  X-Cron-Env: <SHELL=/bin/sh>
  X-Cron-Env: <HOME=/home/mcidas>
  X-Cron-Env: <PATH=/usr/bin:/bin>
  X-Cron-Env: <LOGNAME=mcidas>
  
  /bin/sh: line 1: /home/mcidas/util/mcscour.sh: No such file or directory

This tells us that the scouring failures had started no later than June 7.

I corrected the scouring failures by moving the scouring activities
to the 'ldm' account.  (This could have been done in the 'mcidas'
account, but I felt that consolidating and simplifying the McIDAS
installation was a good thing to do.)  To do this, I copied mcscour.sh
from the ~mcidas/workdata directory to ~ldm/decoders.  mcscour.sh, a
Bourne shell script, then had to be edited to set various Unix
environment variables used by McIDAS, and I added an entry to 'ldm's
crontab file to run the scouring every day at 01:00:

#
# Scour McIDAS data files
#
00 01 * * * decoders/mcscour.sh
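For readers unfamiliar with the idea, age-based scouring of flat files can be sketched in Bourne shell as shown here.  This is a generic illustration using find(1), not the contents of mcscour.sh (which is not shown here and may work quite differently); the function name scour_dir and its arguments are hypothetical.

```shell
#!/bin/sh
# scour_dir DIR PATTERN DAYS  (hypothetical helper, not part of McIDAS or the LDM)
# Deletes regular files under DIR whose names match PATTERN and whose
# modification time is older than DAYS days (find's -mtime +DAYS rounding
# applies).  find(1) descends into subdirectories, so keep DIR narrow.
scour_dir() {
    dir=$1
    pattern=$2
    days=$3
    [ -d "$dir" ] || { echo "scour_dir: no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -name "$pattern" -mtime +"$days" -exec rm -f {} +
}

# Example (hypothetical path and pattern):
#   scour_dir /data/example '*.dat' 7
```

Quoting the pattern keeps the shell from expanding it before find sees it.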

Since all of the McIDAS point data files (/data/mcidas/data/MDXX*)
were full or corrupted, I deleted them and set up McIDAS-XCD decoding again:

<as 'mcidas' done while the LDM was not running>

cd /data/mcidas/data
rm MDXX*
rm *.RA*
rm *.IDX
rm *.IDT

cd ~mcidas/workdata
tl.k                   <- to verify that XCDDATA was set; it was
batch.k XCD.BAT
batch.k XCDDEC.BAT

<review/update the McIDAS file REDIRECT entries in ~mcidas/workdata/LWPATH.NAM>

<as 'ldm'>
ldmadmin start                <- restart the LDM
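If the file cleanup above were ever scripted, one safeguard worth adding is to refuse to delete anything unless the cd into the data directory succeeds; typed by hand, a failed cd followed by "rm MDXX*" would delete matching files in whatever directory the shell happened to be in.  A minimal sketch, assuming the same file patterns used above; the function name is hypothetical, and the LDM still has to be stopped first.

```shell
#!/bin/sh
# reset_xcd_files DIR  (hypothetical helper)
# Removes the McIDAS-XCD point-source and index files listed above from DIR.
# The subshell keeps the cd local, and "|| exit 1" stops rm from ever running
# in the wrong directory if DIR is missing.  Stop the LDM before calling this.
reset_xcd_files() {
    (
        cd "$1" || exit 1
        # rm -f stays quiet when a pattern matches nothing.
        rm -f MDXX* *.RA* *.IDX *.IDT
    )
}

# Example, as 'mcidas' with the LDM stopped:
#   reset_xcd_files /data/mcidas/data
```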

>I know you can make the storm2 more reliable, so please go ahead and upgrade 
>LDM in storm2.

>I also wants students to use McIDAS and 
>GEMPAK using a generic student account that has been set up.

If I knew the name of the generic student account, I would log on as that
user and make sure that the account is set up correctly.  Is the
name of the generic account 'unca-mcidas'?

>I do hope that a few critical things can be maintained:
>
>1.     storm2 prints out DIFAX-like maps by one of three cron jobs, one for 
>vacation, one for weekends, and the third for weekdays.   These maps are 
>printed out by an IP printer (HP LJ5000).   This is very essential to our 
>instruction, so please don't change it.

I did not touch this setup, so it should not have been affected.

>2.     storm2 is the LDM server, and it also has McIDAS and GEMPAK (by ntl 
>command).  Other LINUX PC's have McIDAS and GEMPAK.   I now seem to have 
>problems to use ntl and mcidas commands on other PC's, and I am trying to 
>figure it out, because it was working before the departure of Leigh Orf in 
>July.  I may ask for your help later, it should be easy.

I can't speak to the GEMPAK problems, but I can speculate on the McIDAS
problems.  One problem I found on storm2 was the setup of the remote
ADDE server.  The v2002 remote server required the creation of three
files in the /etc/xinetd.d directory: mcserv, mcidasz, and mccompress.
These files existed and contained entries that worked correctly for
versions of RedHat Linux prior to 8.0 (i.e., things worked on 6.x and
7.x).  However, all three files contained one entry that RedHat 8.0
and 9.0 do not accept.  I corrected this problem, and the remote
server on storm2 started working again.  The McIDAS v2003 remote ADDE
server installation script takes care of this problem.  More on
v2003 below.
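A quick way to confirm that the remote server's service files are present after an install or uninstall is a loop over the three file names given above.  This hypothetical helper only checks that the files exist; it does not validate the entry that RedHat 8.0 and 9.0 object to, and later McIDAS releases may use different file names.

```shell
#!/bin/sh
# check_adde_xinetd DIR  (hypothetical helper)
# Reports any of the three ADDE remote-server service files missing from DIR
# (normally /etc/xinetd.d) and returns nonzero if one or more are absent.
check_adde_xinetd() {
    dir=$1
    status=0
    for svc in mcserv mcidasz mccompress; do
        if [ ! -f "$dir/$svc" ]; then
            echo "missing: $dir/$svc" >&2
            status=1
        fi
    done
    return $status
}

# Example:
#   check_adde_xinetd /etc/xinetd.d && echo "all three service files present"
```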

>3.     storm2 is setup to stop and start LDM while booting, in case of power 
>failure.   Leigh spent some hours on getting this to work, and I am not sure 
>about this, maybe new LDM can do a better job in closing LDM completely.

I just took a look at the startup script that Leigh put in place for the
LDM.  If pressed, I would modify this script to check the integrity of
the LDM queue before trying to start the LDM.  Right now, the script
simply starts the LDM with no checks.  We have observed that the LDM
queue can get damaged when there is a power failure or when the system
is rebooted without first stopping the LDM with:

<as 'ldm'>
ldmadmin stop

The queue is not guaranteed to be corrupted in these cases, but these
are the situations in which corruption is most likely.
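The boot-time change suggested above (check the queue, rebuild it only if the check fails, then start the LDM) could be structured as follows.  This is a sketch under stated assumptions: the integrity check is left as a placeholder because the right command depends on the LDM version, while ldmadmin delqueue, ldmadmin mkqueue, and ldmadmin start are the usual ldmadmin operations for removing, recreating, and starting.  Rebuilding discards whatever products are in the queue.  The function and stub names are hypothetical, and a real boot script would run these commands as 'ldm'.

```shell
#!/bin/sh
# start_ldm_checked CHECK REBUILD START  (hypothetical helper)
# Each argument names a command or shell function.  CHECK must return 0 when
# the product queue looks sound; if it does not, REBUILD is run (and must
# succeed) before START is attempted.
start_ldm_checked() {
    check_cmd=$1
    rebuild_cmd=$2
    start_cmd=$3
    if ! "$check_cmd"; then
        echo "LDM queue failed its integrity check; rebuilding it" >&2
        "$rebuild_cmd" || return 1
    fi
    "$start_cmd"
}

# Example wiring, run as 'ldm' (check_queue is a placeholder to be filled in;
# as written it always reports failure, so the queue is always rebuilt):
#   check_queue()   { false; }
#   rebuild_queue() { ldmadmin delqueue && ldmadmin mkqueue; }
#   start_ldm()     { ldmadmin start; }
#   start_ldm_checked check_queue rebuild_queue start_ldm
```

Keeping the check, rebuild, and start steps as separate arguments lets the placeholder check be swapped for a real one without touching the control flow.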

The other thing I did to the LDM configuration was to set up all ldm-mcidas
decoders to log to the ~ldm/logs/ldm-mcidas.log file, and to change the
crontab entry that rotates this file to keep 4 copies online instead of 2.
Moving the decoder logging out of the LDM log file, ~ldm/logs/ldmd.log,
keeps that file clean.  We really like to see decoder logging put in
a file other than ldmd.log so that it is easier to use ldmd.log
to diagnose LDM problems.  The change was a minor one, so there is
no need to worry about it :-).  For reference, I made the logging
changes in all of the pqact.conf files that are used on storm2:

pqact.conf
pqact.conf.vacation
pqact.conf.weekend
pqact.conf.weekday
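Keeping 4 log files online means that each rotation shifts the older copies down one slot and drops the oldest.  The sketch here shows that scheme for a file like ~ldm/logs/ldm-mcidas.log; it illustrates the idea and is not the LDM's own rotation utility, whose exact behavior is not described in this note.  The function name, and the convention that N counts the active log plus N-1 numbered copies, are assumptions.

```shell
#!/bin/sh
# rotate_log FILE N  (hypothetical helper)
# Keeps N files in total: the active FILE plus FILE.1 .. FILE.(N-1),
# where FILE.1 is the most recently rotated copy.
rotate_log() {
    file=$1
    keep=$2
    # Drop the oldest copy so the total never exceeds $keep.
    i=$((keep - 1))
    rm -f "$file.$i"
    # Shift FILE.(i-1) -> FILE.i, working from oldest to newest.
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$file.$prev" ]; then
            mv "$file.$prev" "$file.$i"
        fi
        i=$prev
    done
    # Move the active log aside (only if at least one old copy is kept).
    if [ "$keep" -gt 1 ] && [ -f "$file" ]; then
        mv "$file" "$file.1"
    fi
    # Start a fresh, empty active log.
    : > "$file"
}

# Example:
#   rotate_log ~/logs/ldm-mcidas.log 4
```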

I also copied two files, SATANNOT and SATBAND, to ~ldm/etc, and I adjusted
the pnga2area entries in all of the pqact.conf* files to use these files.

>4.     I would only like to keep 7 days of data, I hope 120 GB space in storm2
>  is enough, if not, 5 days of data are acceptable too.  I don't want any
>  maps and data archived.

I set up McIDAS-XCD to keep 7 days of decoded point source data.  I see
that it is not currently decoding model output data.  The entry in
~ldm/etc/pqact.conf that would allow the decoding is commented out, and
the entry in the McIDAS-XCD configuration file
~mcidas/workdata/DECINFO.DAT that would tell the XCD supervisor routine
to run grid decoding is not enabled.

Do you want McIDAS to decode model output data?

If yes, I would recommend keeping only one day of that data online.
Unlike GEMPAK grid files, McIDAS GRID files are not compressed, so the
GRID* files take up a LOT of room (up to 7 GB per day).
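To put the recommendation in numbers: at the upper bound of roughly 7 GB per day quoted above, seven days of McIDAS GRID files would need about 49 GB, just over 40% of storm2's 120 GB, versus 7 GB (under 6%) for a single day.  The small helper here makes it easy to redo the arithmetic if the daily volume or retention changes; it uses integer shell arithmetic, so percentages are rounded down, and it ignores the space needed by everything else on the disk.  The function name is hypothetical.

```shell
#!/bin/sh
# grid_budget GB_PER_DAY DAYS DISK_GB  (hypothetical helper)
# Prints "TOTAL_GB PERCENT_OF_DISK"; integer arithmetic rounds the percentage down.
grid_budget() {
    total=$(( $1 * $2 ))
    pct=$(( total * 100 / $3 ))
    echo "$total $pct"
}

grid_budget 7 7 120    # prints "49 40": seven days of GRID files
grid_budget 7 1 120    # prints "7 5": one day of GRID files
```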

>5.     I can be physically at storm2 for your instructions or something, if
>   you need me.

All of the work that was needed could be done remotely.

>You now have my blessing to perform the upgrade on storm2.  Thank you again, 
>and hope your summer has been wonderful.   My summer is OK, and UNCA starts 
>August 20.

I am sorry I couldn't complete the upgrade of the McIDAS side of things
until today, a full week after the fall UNCA session started.  I
can report, however, that things are now working nicely.

While on storm2, I decided to do more than upgrade the LDM and get
McIDAS working again:  I decided to upgrade McIDAS to the latest
version, v2003.  I also decided to upgrade the ldm-mcidas decoders
(the decoders for the image products in the Unidata-Wisconsin datastream)
to the latest release, ldm-mcidas-2003.

While upgrading McIDAS to v2003, I found that the development
environment on storm2 was incomplete.  Missing were yacc, curses,
the X development files, and the packages these depend on.  I installed
everything needed from RPMs to get the build to work.  Here is the
list of RPM files that I installed:

XFree86-devel-4.3.0-2.i386.rpm
byacc-1.9-25.i386.rpm
fontconfig-devel-2.1-9.i386.rpm
freetype-devel-2.1.3-6.i386.rpm
ncurses-devel-5.3-4.i386.rpm
pkgconfig-0.14.0-3.i386.rpm

After installing these, I was able to successfully build McIDAS-X/-XCD
v2003:

<as 'mcidas'>
cd mcidas2003/src
make all VENDOR=-g77          <- use gcc/g77 compiler combination
make install.all VENDOR=-g77

I did the installation part of the build/install with the LDM turned
off.

I then followed the upgrade steps listed in:

ABCs of Upgrading from a Previous Distribution
http://my.unidata.ucar.edu/content/software/mcidas/2003/mcx/upgrade_notes.html#upgrade_top

Based on the remote ADDE server problems I had previously, I decided
to uninstall and then reinstall the ADDE remote server.  This had
to be done as 'root':

<as 'root'>
cd ~mcidas
sh ./mcinet2002.sh uninstall mcadde
sh ./mcinet2003.sh install mcadde

I verified that the v2003 ADDE remote server is working by pointing my
McIDAS session at your server for the RTPTSRC, RTIMAGES, and CIMSS
datasets and loading images and loops, plotting and contouring a
variety of different data, and plotting soundings.  Everything
looks solid at the moment.

>Let me know if you have any questions.

Here is one thing I would change if given carte blanche.
Right now, the scripts run out of 'ldm's cron that create the maps
you referenced in item 1 above are located in the ~ldm/bin directory.
Since ~ldm/bin is a link to ~ldm/runtime/bin, and since this runtime
link gets changed for each new LDM installation, scripts left there
would no longer be found through ~ldm/bin after the next upgrade.  I
would _strongly_ recommend moving the shell scripts:

start_maps.csh
vacation_maps.csh
weekday_maps.csh
weekend_maps.csh

to a permanent directory like ~ldm/util.  If this move is done, the
crontab entries that run these scripts will need to be altered
slightly.  Please let me know if you would like me to make the
recommended changes.
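If the scripts are moved, the crontab change amounts to rewriting the path of each script.  A sketch using sed, assuming the existing entries refer to the scripts as bin/<name>.csh or ~ldm/bin/<name>.csh (the actual crontab lines are not shown here); entries that go through runtime/bin directly, and any paths inside the scripts themselves, would need checking by hand.  The function name is hypothetical, and the result should be reviewed before it is installed.

```shell
#!/bin/sh
# retarget_map_scripts  (hypothetical helper)
# Filters crontab text on stdin, changing bin/<script>.csh to util/<script>.csh
# for the four map scripts only; all other lines pass through unchanged.
retarget_map_scripts() {
    sed -e 's#bin/start_maps\.csh#util/start_maps.csh#g' \
        -e 's#bin/vacation_maps\.csh#util/vacation_maps.csh#g' \
        -e 's#bin/weekday_maps\.csh#util/weekday_maps.csh#g' \
        -e 's#bin/weekend_maps\.csh#util/weekend_maps.csh#g'
}

# Example, as 'ldm' (review ldm.cron.new before installing it):
#   mkdir -p ~/util
#   mv ~/bin/start_maps.csh ~/bin/vacation_maps.csh \
#      ~/bin/weekday_maps.csh ~/bin/weekend_maps.csh ~/util/
#   crontab -l > ldm.cron.old
#   retarget_map_scripts < ldm.cron.old > ldm.cron.new
#   crontab ldm.cron.new
```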

OK, I think that is enough for now.  Please let me know if you
would like me to finish off the items I listed above.

>Alex
>828-232-5157 (O) 828-258-0292 (H)

Cheers,

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+