
20040914: 20040914: gempak processing getting wedged...



Gerry,

I'm not having trouble with VCP121 display in gpnexr2.

As I mentioned before, if you are kicking off a script from pqact
to generate GEMPAK graphics, I'd suggest using gpnexr2_gf or gpnexr2_gif,
which don't require a message queue, as long as you don't have other plots to
overlay that require the gplt process. That would save a lot of processes
from having to be created: gpnexr2 uses 3 processes (gpnexr2, gplt and
the device driver), while gpnexr2_g* are each just a single process.

The most common problem with initiating GEMPAK scripts that use message
queues from the LDM is a race condition: the system hands out the next unused
IPC number, but before the process makes the system call to request that
message queue, another process can be handed the same unused number.
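The IPC problem is a classic check-then-act race: "which number is free?" and "take it" are two separate steps, and another process can slip in between them. The usual cure is an operation that tests and claims in one atomic step. As a minimal POSIX sh illustration (not GEMPAK's own IPC code), `mkdir` behaves that way: it either creates the directory or fails because it already exists. The directory name `claim.d` and the worker count of 8 are arbitrary choices for the demo.

```sh
#!/bin/sh
# Sketch: mkdir as an atomic "test and claim". Eight background workers
# race to create the same directory; mkdir succeeds for exactly one of
# them, so there is no window between "is it free?" and "take it".

# try_claim DIR -- exit 0 only for the caller that actually created DIR
try_claim() {
    mkdir "$1" 2>/dev/null
}

# race_demo -- run the race in a scratch directory, print the winner count
race_demo() {
    workdir=$(mktemp -d) || return 1      # mktemp -d: common, though not POSIX
    i=1
    while [ "$i" -le 8 ]; do
        # a worker leaves a marker file only if its claim succeeded
        ( try_claim "$workdir/claim.d" && : > "$workdir/winner.$i" ) &
        i=$((i + 1))
    done
    wait                                  # let every worker finish
    winners=$(ls "$workdir" | grep -c '^winner\.')
    rm -rf "$workdir"
    echo "$winners"
}
```

Running `race_demo` prints `1`: however the workers are scheduled, only one of them gets the directory.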

Also, the most important thing I have mentioned in the past: I always
suggest placing a semaphore check in the script to prevent more than one
copy of a particular script from running at a time. That keeps the system
from getting clobbered when the LDM connects/reconnects and gets a slug of
data all at once.

For example, for a radar gif generation here, I use:

###############<snip>################################

#
# Make sure the NEXRAD file exists before setting lock
#
@ COUNT = 0
while (! -e $RAD/NIDS/${SITE}/${TYPE}/${FILENAME} )
   sleep 2
   @ COUNT = $COUNT + 1
   if($COUNT == 10) then
      echo "`hostname`: Could not find ${SITE}/${TYPE}/${FILENAME} for nids generation" | \
         /usr/bin/mailx -s NIDS chiz
      exit 0
   endif
end

#
# change to working directory and set lock...wait until older locks are removed
#
if ( ! -e $WEBDIR ) mkdir -p $WEBDIR
cd $WEBDIR

set LOCK=.inuse.$$
touch $LOCK

@ COUNT = 0
set TEST=`ls -rt .inuse.* | head -1`
set OFFENDING=$TEST
while(($TEST != $LOCK)&&($COUNT < 61))
   sleep 4
   set TEST=`ls -rt .inuse.* | head -1`
   if ( ( $COUNT == 50 ) && ( $TEST == $OFFENDING ) ) then
      # this lock has been around a really long time. Maybe it's toast.
      rm -f $OFFENDING
   endif
   if($COUNT == 60) then
      echo "Please check `hostname` on $TEST for nids generation $SITE $TYPE $DATTIM" | \
         /usr/bin/mailx -s NIDS chiz
      rm $LOCK
      exit 0
   endif
   @ COUNT = $COUNT + 1
end

# now make the gif
###########<snip>########################


Your $WEBDIR in the above could be unique for each site, since you have the
CPU for it. What I'm saying is that you don't want an hour's worth of data for
120 radars coming in a minute or two after the LDM starts, causing several
thousand plots to be kicked off at once. The above script will remove an old
lock if it doesn't seem to be going away (roughly 200 seconds into the wait),
which would happen if your machine rebooted in the middle of a plot. The
snippet will also make the plotting script quit automatically if it has to
wait too long (about 4 minutes), which acts as a catch-up mechanism, since
typically we want current plots.

The above boilerplate would work for any script where you define a WEBDIR
unique to that particular script.
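If it helps, here is a minimal POSIX sh sketch of the same boilerplate; it is not the script we run here. It swaps the `ls -rt .inuse.*` ordering for a lock directory, since `mkdir` either creates the directory or fails in one atomic step. The lock name `.lock.d`, the pid file, and the `LOCK_SLEEP`/`LOCK_STALE`/`LOCK_GIVEUP` knobs are hypothetical; their defaults mirror the 4-second polls, the stale check at 50, and the give-up at 60 in the csh. Unlike the csh version, waiters are not served in arrival order.

```sh
#!/bin/sh
# Sketch: POSIX sh take on the csh lock boilerplate, using a mkdir lock
# directory (hypothetical name .lock.d) inside each script's own WEBDIR.
LOCK_SLEEP=${LOCK_SLEEP:-4}    # seconds between polls, as in the csh
LOCK_STALE=${LOCK_STALE:-50}   # poll count at which an unchanged lock is broken
LOCK_GIVEUP=${LOCK_GIVEUP:-60} # poll count at which we stop waiting

# acquire_lock WEBDIR -- exit 0 holding WEBDIR/.lock.d, 1 if we gave up
acquire_lock() {
    lock="$1/.lock.d"
    mkdir -p "$1" || return 1
    count=0
    first_owner=""
    until mkdir "$lock" 2>/dev/null; do
        owner=$(cat "$lock/pid" 2>/dev/null)
        [ -z "$first_owner" ] && first_owner=$owner
        if [ "$count" -eq "$LOCK_STALE" ] && [ "$owner" = "$first_owner" ]; then
            # same holder the whole time: maybe it's toast, break the lock
            rm -rf "$lock"
        fi
        if [ "$count" -ge "$LOCK_GIVEUP" ]; then
            # waited too long; the data is old now, so quit (catch-up)
            echo "$(uname -n): gave up waiting for $lock" >&2
            return 1
        fi
        sleep "$LOCK_SLEEP"
        count=$((count + 1))
    done
    echo "$$" > "$lock/pid"    # record the holder for the stale check
    return 0
}

# release_lock WEBDIR -- drop the lock once the plot is done
release_lock() {
    rm -rf "$1/.lock.d"
}
```

A plotting script would call `acquire_lock "$WEBDIR" || exit 0` before plotting and `release_lock "$WEBDIR"` afterwards; with a separate WEBDIR per site, a KHGX backlog never blocks KFWS plots.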



Steve Chiswell





>From: Gerry Creager n5jxs <address@hidden>
>Organization: AATLT, Texas A&M University
>Keywords: 200409150143.i8F1hFnJ023451

>
>Unidata Support wrote:
>> Hi Gerry,
>> 
>> 
>>>I've been doing a fair bit of Level II image generation on gemdata2 
>>>recently, and I'm seeing a lot of interprocess communications hangups 
>>>now.  I've a set of questions.
>>>
>>>1.  We're using the transfer of the site volume scan from hidden to 
>>>"normal" (\E$) to fire the script that does image generation for that 
>>>site.  Is there a better trigger?
>> 
>> 
>> I don't think so.  The final quadrant of a full volume scan contains
>> the 'E' indicator for end.  This, in turn, kicks off the renaming
>> of the hidden file so it is viewable.  At that point, the data should
>> be ready to be processed in any way you want.
>
>OK.  I'm attaching the pqact in question.  Maybe there's something 
>obvious...
>
>You'll note I'm now firing these off to gemdata3 for about 1/2 of the 
>scripts.
>
>>>1a.  When data's missing, the script failure rate skyrockets.  Great. 
>>>Makes sense, but should that script even be firing?
>> 
>> 
>> Since it is -currently- possible that a piece of the volume scan is
>> missing (more on this below), kicking off a script that requires
>> every piece of the volume scan to exist to run correctly can be
>> problematic.  A more correct/complete/heavyweight approach would be
>> for the script to somehow determine if the volume scan has any
>> missing pieces and then do the GEMPAK processing only if it doesn't.
>> However, I have seen Chiz put up plots from volume scans that were
>> missing pieces, so maybe there is more to what is going on than
>> the simple view.
>
>Is this, by chance, associated with the VCP121 issues and range folding?
>
>>>2.  We're now seeing high load averages and resource exhaustion. Might 
>>>this be playing into the mix?
>> 
>> 
>> It could, yes.
>> 
>> 
>>>The system's a single cpu celeron at 2 
>>>GHz with 512 MB RAM.  I'm thinking a change to 1 GB (because I've got a 
>>>system with that in place available for this) might be the key.  Or at 
>>>least a help.
>> 
>> 
>> What kind of load averages are you seeing?  I would have thought that
>> a 2 GHz machine with 512 MB of RAM should be sufficient.
>
>7,8 in that region...
>
>>>It's still early in Texas... and in DC by my body's clock... so I've not 
>>>heard a status on bigbird yet.
>> 
>> 
>> I am unable to ping, ldmping, or ssh bigbird at this time, so it looks
>> like it is still down.
>
>Still FSCKing.
>
>Mail doesn't work in the SURA meetings.  Their wireless doesn't allow 
>VPN passage.  If you've need of contacting me, please call the cellphone 
>at 979.229.5301.
>
>thanks!!!
>gerry
>-- 
>Gerry Creager -- address@hidden
>Texas Mesonet -- AATLT, Texas A&M University   
>Cell: 979.229.5301 Office: 979.458.4020
>FAX:  979.847.8578 Pager:  979.228.0173
>Office: 903A Eller Bldg, TAMU, College Station, TX 77843
>
>[Attachment: pqact.gempak_craft.tamu]
>
># CRAFT stored as raw bz2 for GEMPAK
>#
># file the raw data to a temporary file beginning with "." so that autoupdate
># GUIs don't get ugly partial volume plots
>CRAFT  ^L2-BZIP2/(KABX|KAMA|KBRO|KCRP|KDFX|KDYX|KEPZ|KEWX|KFDR|KFDX|KFWS|KGRK|KHDX|KHGX|KINX|KLBB|KLCH|KLKZ|KMAF|KPOE|KSHV|KSJT|KTLX|KVNX|KBYX|KAMX|KTBW|KMLB|KJAX|KMOB|KSRX|KTLH)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9])
>       FILE    data/gempak/nexrad/craft_all/\2/\1/.\1_\2_\3
>#
># Done to move file after last record is received "/E" to prevent
># autoupdate from seeing partially received files (dccraft_move is a shell script
># copied from $NAWIPS/bin/scripts)
>CRAFT  ^L2-BZIP2/(KABX|KAMA|KBRO|KCRP|KDFX|KDYX|KEPZ|KEWX|KFDR|KFDX|KFWS|KGRK|KHDX|KHGX|KINX|KLBB|KLCH|KLKZ|KMAF|KPOE|KSHV|KSJT|KTLX|KVNX|KBYX|KAMX|KTBW|KMLB|KJAX|KMOB|KSRX|KTLH)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    util/dccraft_move data/gempak/nexrad/craft_all/\2/\1/.\1_\2_\3 data/gempak/nexrad/craft_all/\2/\1/\1_\2_\3
>#
>#
># CRAFT stored uncompressed (not needed for GEMPAK 5.7.2p2 and later)
>#CRAFT ^L2-BZIP2/(....)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9])
>#      PIPE    decoders/dcnexr2 -s \1 -d /dev/null data/gempak/nexrad/craft_uncompressed/\1/\1_\2_\3
>#
>###Add specific decoders for Level II data for each of the various sites
>CRAFT  ^L2-BZIP2/KABX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KABX.sh &> /dev/null
>#      EXEC    scripts/KABX.sh
>CRAFT  ^L2-BZIP2/KAMA/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KAMA.sh
>CRAFT  ^L2-BZIP2/KFWS/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KFWS.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KHGX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KHGX.sh
>CRAFT  ^L2-BZIP2/KLBB/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KLBB.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KBRO/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KBRO.sh
>CRAFT  ^L2-BZIP2/KCRP/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KCRP.sh &> /dev/null
>#KDFX is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KDFX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KDFX.sh
>#KDYX is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KDYX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KDYX.sh
>CRAFT  ^L2-BZIP2/KEPZ/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KEPZ.sh
>CRAFT  ^L2-BZIP2/KEWX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KEWX.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KFDR/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KFDR.sh
>#KFDX is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KFDX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KFDX.sh
>#KGRK is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KGRK/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KGRK.sh
>#KHDX is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KHDX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KHDX.sh
>CRAFT  ^L2-BZIP2/KINX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KINX.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KLCH/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KLCH.sh
>CRAFT  ^L2-BZIP2/KLKZ/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KLKZ.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KMAF/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KMAF.sh
>#KPOE is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KPOE/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KPOE.sh
>CRAFT  ^L2-BZIP2/KSHV/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KSHV.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KSJT/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KSJT.sh
>CRAFT  ^L2-BZIP2/KTLX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KTLX.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KVNX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KVNX.sh
>#KSRX is a DoD site and not in the net
>#CRAFT ^L2-BZIP2/KSRX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>#      EXEC    scripts/KSRX.sh
>###Hurricane Frances added 09/02/04
>CRAFT  ^L2-BZIP2/KBYX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KBYX.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KAMX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KAMX.sh
>CRAFT  ^L2-BZIP2/KTBW/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KTBW.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KMLB/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KMLB.sh
>CRAFT  ^L2-BZIP2/KJAX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KJAX.sh &> /dev/null
>CRAFT  ^L2-BZIP2/KMOB/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    scripts/KMOB.sh
>CRAFT  ^L2-BZIP2/KTLH/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$
>       EXEC    ssh address@hidden scripts/KTLH.sh &> /dev/null
>###end
>
>
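Two points in the quoted exchange lend themselves to a quick sketch: which product IDs the per-site `.*/E$` trigger in the attached pqact file fires on, and checking a volume scan for missing pieces before doing the GEMPAK processing. This is a minimal POSIX sh sketch, not part of LDM or GEMPAK. pqact patterns are extended regular expressions, so `grep -E` applies the same matching rules. The sample product IDs are hypothetical, as is the assumption that a volume's pieces are numbered 1..LAST (with LAST taken, say, from the piece that carried the 'E') and that the script can list the numbers it received.

```sh
#!/bin/sh
# Sketch: (1) which product IDs would fire the per-site "/E" trigger from
# the attached pqact file, and (2) a completeness check to run before
# GEMPAK. Sample IDs and the piece numbering are hypothetical.

# Same pattern as the KHGX entry in the attached pqact file.
TRIGGER='^L2-BZIP2/KHGX/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9])([0-2][0-9][0-5][0-9])([0-9][0-9]).*/E$'

# fires ID -- exit 0 if the trigger would run the script for ID
fires() {
    printf '%s\n' "$1" | grep -Eq "$TRIGGER"
}

# volume_time ID -- print the date/time pieces joined as \1_\2_\3
# (assumes a 4-letter site ID, as in every entry of the pqact file)
volume_time() {
    printf '%s\n' "$1" |
        sed -E 's#^L2-BZIP2/[A-Z]{4}/([0-9]{8})([0-9]{4})([0-9]{2}).*#\1_\2_\3#'
}

# missing_pieces LAST -- read received piece numbers (one per line) on
# stdin and print every number in 1..LAST that never arrived
missing_pieces() {
    awk -v last="$1" '
        NF { seen[$1 + 0] = 1 }
        END {
            for (i = 1; i <= last; i++)
                if (!(i in seen)) print i
        }'
}

# volume_complete LAST -- exit 0 only when no piece is missing
volume_complete() {
    [ -z "$(missing_pieces "$1")" ]
}
```

For example, `printf '1\n2\n4\n' | missing_pieces 4` prints `3`, and a plotting script could run `volume_complete "$LAST" || exit 0` on its list of received pieces before starting GEMPAK.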
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

