
"pbuf_flush: time elapsed" problem (was: Problem with LDM 6.3.0)



Justin,

>Date: Thu, 15 Sep 2005 08:22:48 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: address@hidden
>Subject: Problem with LDM 6.3.0

The above message contained the following:

> We have recently installed version 6.3.0 of LDM and are seeing 
> occasional errors with two of our PIPE processes.  I have included an 
> excerpt from the ldmd.log of one of the errors:
> 
> ---
> Sep 14 21:42:17 b2n1 eldm4[1171480]:   452967 20050914214215.998    PCWS 
> 000  FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]:   452967 20050914214215.998    PCWS 
> 000  FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]:                pipe: -close 
> /home/decdev/bin/run_dctamd.sh 
> /dcomdev/us007003/ldmdata/obs/upperair/tamdar 20050914_2100.gz
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush 2: time elapsed 120.000054
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush (2) Timed out
> Sep 14 21:44:17 b2n1 pqact[1511588]: pipe_put: 
> -close/home/decdev/bin/run_dctamd.sh/dcomdev/us007003/ldmdata/obs/upperair/tamdar20050914_2100.gz
>  
> write error

The error messages above mean that the pqact(1) process was unable to
flush the pipe to the script /home/decdev/bin/run_dctamd.sh.  The pipe
was open, but the script did not read from it within the allotted
timeout (120 seconds, according to the "time elapsed" value in the log).
The command in the script that reads from the pipe is

    gzip -d > ${1}/$$.${2}

It's possible (though unlikely) that the gzip(1) process encountered a
problem with the data-product that caused it to terminate reading from
the standard input stream.
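
One way to check the data-product hypothesis, assuming a copy of the
product has also been filed to disk (the log excerpt shows a "file:"
action for the same product), is to test that copy with gzip -t.  The
following is only a sketch; the helper name check_gz is hypothetical
and is not part of the LDM or of the script.  The file is fed on
standard input so that gzip does not reject a name that lacks a .gz
suffix.

```shell
#!/bin/sh
# Sketch only: check_gz is a hypothetical helper, not part of the LDM.
check_gz() {
    # gzip -t tests the integrity of the compressed stream without
    # writing any decompressed output.  Reading from standard input
    # avoids gzip's "unknown suffix" check on the file name.
    if gzip -t < "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "BAD: $1"
        return 1
    fi
}
```

A product that passes this check would point away from a corrupt
data-product and toward the script or the system as the cause.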

In any case, a definitive diagnosis is impossible unless a mechanism for
reporting errors is added to the script.  I suggest adding the command

    exec >> $HOME/logs/run_dctamd.log 2>&1

to the top of the script to help determine the cause of the problem.
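
As a rough illustration of what that redirection does, here is a
minimal sketch that wraps the decompression step in a function so it
can be tried in isolation.  The names LOGFILE and decompress_logged are
hypothetical; in the real script the exec line would simply go at the
top.  The redirection affects only standard output and standard error,
so gzip(1) still reads the product from standard input.

```shell
#!/bin/sh
# Sketch only: LOGFILE and decompress_logged are hypothetical names.
LOGFILE="${LOGFILE:-$HOME/logs/run_dctamd.log}"

decompress_logged() {
    # $1 = output directory, $2 = product name (as in run_dctamd.sh)
    (
        mkdir -p "$(dirname "$LOGFILE")"
        # From here on, stdout and stderr of this subshell are appended
        # to the log.  Standard input is left untouched.
        exec >> "$LOGFILE" 2>&1
        echo "$(date -u '+%Y%m%d %H:%M:%S') pid=$$ start: $1 $2"
        # gzip's own stdout is redirected to the output file; any error
        # message it prints on stderr lands in the log.
        gzip -d > "$1/$$.$2"
        status=$?
        echo "$(date -u '+%Y%m%d %H:%M:%S') pid=$$ gzip exit status: $status"
        exit "$status"
    )
}
```

Because the script is started with "sh -vx", the shell's trace output
(which goes to standard error) would also end up in the log, showing
which command was running when the script stalled.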

Please contact me if you have any questions or discover something.

> Sep 14 21:44:17 b2n1 pqact[1511588]:                file: 
> /dcomdev/us007003/ldmdata/test/acars.20050914_2100.gz_214215
> ---
> 
> Throughout the day we receive hundreds of these acars messages but only 
> a couple will result in a time out and then the write error.  After this 
> error occurs the script that was acted on by LDM remains in the process 
> table and has to be purged with a kill -9.  We are also receiving this 
> feed to a different system but we are not seeing these errors.  On that 
> system the only difference is the version of LDM, 6.0.15, the pqact.conf 
> and script are the same for this datatype.
> 
> We tried version 6.4.1 and the same errors occurred, we also recompiled 
> 6.3.0 and increased DEFAULT_PIPE_TIMEO to 120 in pqact.c
> 
> #define DEFAULT_PIPE_TIMEO 120
> 
> again the errors still occurred.
> 
> I have attached the /home/decdev/bin/run_dctamd.sh script, it basically 
> unzips the stdin and puts the resulting data into a decoder.
> 
> Any ideas?
> 
> Thanks,
> 
> Justin Cooke
> NCEP Central Operations
> 
> [Attached script: run_dctamd.sh]
> 
> #!/bin/sh -vx
> 
> #
> #  This script is EXECed directly by DBNet in order to run the
> #  dctamd decoder on the data file given in the first argument.
> #
> #    Usage: ./run_dctamd.sh <tamdar_filename>
> #
> #  Once this is done, the data file itself is then compressed
> #  within its native directory for more efficient short-term
> #  storage.
> #
> 
> # The gzip line must be the first, noncomment line in this script
> # so that stdin is processed correctly
> 
> gzip -d > ${1}/$$.${2}
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> hhmm=`date -u +%H%M`
> decoderfilename=${madisfilename}.${hhmm}
> mv ${1}/$$.${2} ${decoderfilename}
> 
> . /ioddev/dbndev/.profile
> 
> export MADIS_STATIC=$DCDROOT/lib/sorc/madis-2.5/static
> export MADIS_DATA=/dcomdev/us007003/ldmdata
> 
> ln -sf ${decoderfilename} ${madisfilename}
> 
> nice $DCDROOT/bin/decod_dctamd -v 2 \
>   -d /dcomdev/us007003/decoder_logs/decod_dctamd.log \
>   ${decoderfilename} /dcomdev/us007003/bufrtab.004
> 
> rm -f ${madisfilename}
> 
> #
> #  Compress the decoder input file within its native directory,
> #  in order to conserve disk space for these large files!
> #
> 
> gzip ${decoderfilename}
> 
> #
> #  Explicitly set the script return code to 0, in order to prevent
> #  the "compress" return code from becoming the script return code
> #  (and thereby prevent DBNet from re-running the script for this
> #  particular data file if there is a problem with the compress!)
> #
> 
> exit 0
> 

Regards,
Steve Emmerson

