
20010821: Gempak, Perl, and Gemplt errors



Sean,

Now that your satellite display problems are apparently fixed, 
I'll address the question below regarding semaphore locks etc.

When the LDM starts up, you can frequently get a backlog of
data from your upstream product queue. If you are EXEC'ing
processes based on the arrival of data, this could lead to
a large number of processes getting forked.

For these situations, I employ a file lock that prevents more than
one copy of a program from executing at a time. The other copies
of the script all wait until it's their turn.

As an example:

#!/bin/csh -f


<some preliminary stuff>

cd $WORKDIR

set LOCK=.inuse.$$
touch $LOCK

@ COUNT = 0
set TEST=`ls -rt .inuse.* | head -1`
while(($TEST != $LOCK)&&($COUNT < 61))
   sleep 10
   set TEST=`ls -rt .inuse.* | head -1`
   if($COUNT == 60) then
      echo "Please check `hostname` for process $TEST" | \
         /usr/bin/mailx -s "LDM process" chiz
      rm $LOCK
      exit 0
   endif
   @ COUNT = $COUNT + 1
end


<create some product>

#remove your lock

rm $LOCK



The idea is that every copy of the script will cd to $WORKDIR
and execute the "touch" command (create a 0 length file). This
is not CPU or IO intensive, and costs much less than the alternative
of dozens or more copies of a script executing at once.

After the lock file is touched, the script loops until the oldest
lock file reported by ls is its own, meaning every copy that
started earlier has finished. If the script has to
wait more than 60 loops of 10 seconds (10 minutes), then
it sends mail, removes its lock, and exits. This prevents your
system from getting into a state where it cannot fork any more
processes.
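
For anyone who wants the same queueing outside csh, here is a
minimal sketch of the idea in Python. It is not the script I run;
the function names (oldest_lock, wait_for_turn) and the parameters
max_tries and delay are made up for illustration, and only the
.inuse.<pid> naming is taken from the csh example above.

```python
import os
import time
from pathlib import Path


def oldest_lock(workdir):
    """Return the oldest .inuse.* file, like `ls -rt .inuse.* | head -1`."""
    entries = []
    for path in Path(workdir).glob(".inuse.*"):
        try:
            entries.append((path.stat().st_mtime_ns, path.name, path))
        except FileNotFoundError:
            continue  # another copy finished and removed its lock
    if not entries:
        return None
    # Oldest modification time first; the file name breaks ties.
    return min(entries, key=lambda e: e[:2])[2]


def wait_for_turn(workdir, max_tries=60, delay=10.0):
    """Create this process's lock and wait until it is the oldest one.

    Returns the lock path when it is our turn, or None after giving up
    (the lock is removed first so later copies are not blocked).
    """
    lock = Path(workdir) / f".inuse.{os.getpid()}"
    lock.touch()
    for _ in range(max_tries):
        if oldest_lock(workdir) == lock:
            return lock
        time.sleep(delay)
    lock.unlink()
    return None
```

A caller would do its product generation only when wait_for_turn
returns a path, and remove that file when it is done, preferably in
a try/finally so a failed run does not leave a stale lock behind.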

My NIDS gif generation uses a virtual X server (Xvfb) which
is display :1 of my server. Since it is not a command or login
server, color allocation is generally not a problem. The
X drawing is done entirely in a memory "screen". Limiting the
total number of processes running solves most problems with
forking a gplt process.

If you are using a single program, you can use the _gf version
of the program in $GEMEXE that links the gf driver directly to
the executable so there is no forking of the gplt process (saving 
more CPU resources).

The one possible problem with getting a gplt message queue open for
crons is that two scripts can both execute gempak programs,
and those programs will both check to see if a message queue
exists and either both try to connect to it, or both get
the same id for a new queue to be created.
Then, when the programs try to fork the gplt, one of
the programs gets blocked out or fails, and the message queue
either hangs or the gplt is orphaned.

Generally, when you run from a tty, each tty has its own
message queue ids, so there is no gplt problem there.
But, when invoked from a cron, they all have the same effective tty.
So, you should avoid running multiple scripts from cron that execute
at the same time. This will also reduce the likelihood that the
display will not have the requested colors. You can also use the
"gif" driver instead of the gf driver to eliminate the need for an
X server.
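
One way to keep overlapping cron jobs apart, sketched here in Python
rather than csh or perl, is an advisory lock taken with flock at the
top of each job. This is a different technique from the .inuse
files above, not something from my own scripts: a job that finds
the lock busy simply skips its run instead of queueing. The helper
names try_lock and unlock and the lock file path are hypothetical.
Perl has a built-in flock that can be used the same way.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns an open file descriptor that holds the lock, or None if
    another job already holds it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def unlock(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock file shared by every job that starts a gplt.
    fd = try_lock("/tmp/gempak_cron.lock")
    if fd is None:
        print("another job is running; skipping this run")
    else:
        try:
            pass  # run the gempak programs here
        finally:
            unlock(fd)
```

The kernel drops the lock when the process exits, even after a
crash, so there is no stale lock file to clean up by hand.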

Steve Chiswell
Unidata User Support








>From: Sean Daida <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200108220025.f7M0PZ102574

>Hi there (Chiz?),
>
>I don't know if you remember me.  I used to be a student employee at the
>NWS-HNL (1998?).  I asked you some questions about GEMPAK way back
>then.  Sorry I missed you at the Unidata training this summer.  I should
>have taken GEMPAK as well but thought I could only take two classes.
>That extra GEMPAK manual went to me.  =)  I recently got hired by Dr.
>Businger for the Mauna Kea Weather Center and the Met Web page -- your
>old stomping ground.
>
>My predecessor (Rick Knabb) used perl to drive all the product and web
>page generation.  I would like to enhance the system and make it more
>reliable, not rewrite the works.  In the time between Rick leaving and
>me coming on, I think some of the data formats changed and many things
>broke.  I don't know how well Rick got it working, but I suspect he left
>before he was satisfied with the site.  I am trying to figure out how
>things are supposed to go, while not knowing how reliable the existing
>system was to begin with.  However, I am making progress.
>
>The issue is the same old gemplt error that I have seen many references
>to in the archive.  I noticed a couple things.
>
>First, you make use of a lock environment variable in a csh script.  I
>was wondering if there was something similar to this in perl.  I can see
>using a physical file to set the lock.  I don't know how much this will
>slow down the entire system.  A shared environment variable accessible
>from within perl seems the best way.  Do you have any perl specific
>advice.  I looked through the archive and didn't find very much.
>
>The other thing I noticed is the use of a display other than 0.0 in the
>"setenv DISPLAY machine:0.0" statement.  I have used this command for
>years but I don't know the guts of it (am reading about it right now).
>I was wondering, how many displays do I have?  Will two processes using
>the same DISPLAY setting interfere with each other if they are both
>trying to setup a GPLT?  If I use a lock file, will I need a separate
>lock file for each display I use?
>
>I know how to kill the processes and clear the memory queues.  I have
>written really stupid scripts to do some of the work for me.  The
>unfortunate thing is that certain scripts fail in this way everytime
>they execute part way.  In these instances, the log file shows gemplt
>running along fine, and then I get the "Error in message send = 22...."
>error.  Also, one of our satellite scripts tends to die at the same time
>every day.  This leads me to believe I've got problems related to the
>above questions.
>
>For your information here's the basic setup.  We have two data feeds.
>One an LDM, the other, is a direct FTP link to the weather office
>downstairs.  They come in on different machines and both seem to be
>working reliably.  For the web page and forecast products, the data are
>grabbed from where the LDM or NAFTP script files them.  The pqact and
>the NWS script don't do any actual data processing for me, just filing.
>From there, the cronjob launches csh scripts, which in turn, launch the
>perl scripts.  It's the perl scripts which copies the data to the local
>machine and does all the image and product generation.  The products are
>ultimately used in cgi script web page.  When perl calls gempak for
>plotting (i.e. snprof, gdlist, gdmap, etc...) the parameters are first
>created and sent to a text file.  Then a system call is made using this
>file.  If you need anything more specific, just let me know.  I think
>Rick did a good job with the site.  Just a few things are catching.
>
>If you want to see what we've got going right now here are the links:
>vision.soest.hawaii.edu (yup...the name will never change).
>lumahai.soest.hawaii.edu
>hokukea.soest.hawaii.edu
>
>Thanks for your time and help.
>
>Hope to talk to you soon.
>
>Sincerely,
>Sean Daida
>address@hidden
>(808)956-4593
>
>