[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: "pbuf_flush: time elapsed" problem



Justin,

>Date: Fri, 16 Sep 2005 09:48:18 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: address@hidden
>Subject: Re: "pbuf_flush: time elapsed" problem

The above message contained the following:

> I added the logging to the script that we are seeing the most 
> "pbuf_flush Timed out" from.  Yesterday we also broke apart the pqact 
> processing, previously we only had one pqact for all incoming data.  We 
> created a second entry in ldmd.conf for just CRAFT data and the existing 
> entry was modified to act on all data type except CRAFT but we still saw 
> the same errors overnight.  Below is another log entry:
> 
> ---
> Sep 16 03:02:21 b2n1 eldm4[585824]:  7504142 20050916030208.060    FSL5 
> 000  FSL.CompressedNetCDF.MADIS.mesonet4.20050916_0100.gz
> Sep 16 03:02:21 b2n1 pqact[909556]:  7504142 20050916030208.060    FSL5 
> 000  FSL.CompressedNetCDF.MADIS.mesonet4.20050916_0100.gz
> Sep 16 03:02:21 b2n1 pqact[909556]:                pipe: -close 
> /usr/ldm/ncep_bin/run_dcmeso.sh 
> /dcomdev/us007003/ldmdata/obs/surface/mesonet 20050916_0100.gz
> Sep 16 03:04:21 b2n1 pqact[909556]: pbuf_flush 2: time elapsed 120.000055
> Sep 16 03:04:21 b2n1 pqact[909556]: pbuf_flush (2) Timed out
> Sep 16 03:04:21 b2n1 pqact[909556]: pipe_put: 
> -close/usr/ldm/ncep_bin/run_dcmeso.sh/dcomdev/us007003/ldmdata/obs/surface/mesonet20050916_0100.gz
>  
> write error
> ---
> 
> But there is nothing in the dcmeso.log file related to this time:
> 
> ---
> END ---- PID= 1151078  ---- Thu Sep 15 23:02:08 EDT 2005 
> ----------------------------
> exit 0
> + exit 0
> echo "START -- PID=" $$ " ----" `date` "----------------------------"
> + date
> + echo START -- PID= 684064  ---- Thu Sep 15 23:04:21 EDT 2005 
> ----------------------------
> START -- PID= 684064  ---- Thu Sep 15 23:04:21 EDT 2005 
> ----------------------------
>                                                                               
>                                                                               
>           
>                      
> gzip -d > ${1}/$$.${2}
> + gzip -d
> + 1> /dcomdev/us007003/ldmdata/obs/surface/mesonet/684064.20050916_0200.gz
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> ----
> 
> The END time of the previous run_dcmeso.sh instance is 03:02:08Z and the 
> next doesn't start until 03:04:21Z which is right after the "Timed out" 
> error but the file being acted on is 20050916_0200.gz where the "Timed 
> out" was for 20050916_0100.gz.  The "START" line is never added when the 
> the pipe: -close for 03:02:21 is run, it's as if the log is never 
> touched, looking like the script never ran.
> 
> any ideas?

Not yet.  I'm going to ponder this today.

Incidentally, you might want to invoke the newlog(1) utility via
crontab(1) to keep the logfile "dcmeso.log" from growing indefinitely.

> We are running this version of LDM on AIX 5.2, is there any other system 
> information that you would find useful?

I'll let you know.

> I have attached run_dcmeso.sh, it is slightly different from the script 
> that I was talking about yesterday but it is doing the same major steps, 
> unzipping a file and sending it off to a decoder.
> 
> Thanks for the help,
> 
> Justin
[snip]
> #!/bin/sh -vx
> #
> #  This script is EXECed directly by DBNet in order to run the
> #  dcmeso decoder on the data file given in the first argument.
> #
> #    Usage: ./run_dcmeso.sh <mesonet_filename>
> #
> #  Once this is done, the data file itself is then compressed
> #  within its native directory for more efficient short-term
> #  storage.
> #
> 
> # The gzip line must be the first, noncomment line in this script
> # so that stdin is processed correctly
> exec >> /usr/ldm/logs/dcmeso.log 2>&1
> echo "START -- PID=" $$ " ----" `date` "----------------------------"
> 
> gzip -d > ${1}/$$.${2}
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> hhmm=`date -u +%H%M` 
> decoderfilename=${madisfilename}.${hhmm}
> mv ${1}/$$.${2} ${decoderfilename}
> 
> . /ioddev/dbndev/.profile
> 
> #export MADIS_STATIC=/nwprod/lib/sorc/madis/static
> export MADIS_STATIC=$DCDROOT/lib/sorc/madis-2.5/static
> export MADIS_DATA=/dcomdev/us007003/ldmdata
> 
> ln -sf ${decoderfilename} ${madisfilename}
> 
> nice $DCDROOT/bin/decod_dcmeso -v 2 -d 
> /dcomdev/us007003/decoder_logs/decod_dcmeso.log  ${decoderfilename} 
> /dcomdev/us007003/bufrtab.255
> 
> rm -f ${madisfilename}
> 
> #
> #  Compress the decoder input file within its native directory,
> #  in order to conserve disk space for these large files!
> #
> 
> gzip ${decoderfilename}
> 
> #
> #  Explicitly set the script return code to 0, in order to prevent
> #  the "compress" return code from becoming the script return code
> #  (and thereby prevent DBNet from re-running the script for this
> #  particular data file if there is a problem with the compress!)
> #
> 
> echo "END ---- PID=" $$ " ----" `date` "----------------------------"
> exit 0

Regards,
Steve Emmerson


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.