[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[LDM #XNJ-465270]: pbuf_flush error



Hi,

> At about 1430Z each day, we see these pbuf_flush warnings followed by
> pbuf_flush errors followed by the failure of a lightning decoder process.
> Can you help us diagnose and fix this?
> 
> 20181023T022812.117713Z pqact[43725] WARN pbuf.c:137:pbuf_flush() Write of 
> 4096 bytes to decoder took 8 seconds
> 20181023T142612.497446Z pqact[43725] WARN pbuf.c:137:pbuf_flush() Write of 
> 4096 bytes to decoder took 6 seconds
> 20181023T142714.004126Z pqact[43725] WARN pbuf.c:137:pbuf_flush() Write of 
> 4096 bytes to decoder took 7 seconds
> 20181023T143206.920529Z pqact[43725] ERROR pbuf.c:150:pbuf_flush() 
> write(19,,4096) to decoder timed-out (60 s)
> 20181023T143206.920576Z pqact[43725] ERROR filel.c:1932:pipe_put() Couldn't 
> write 131347-byte product to pipe
> 20181023T143206.920580Z pqact[43725] ERROR filel.c:2117:pipe_out() Couldn't 
> write product data to pipe
> 20181023T143206.920583Z pqact[43725] ERROR filel.c:2180:pipe_prodput() 
> Couldn't pipe product to decoder "-flush ltgdecode -e -p 3600 -L -N -o 
> /data/obs/lw/ltg/lr/%Y%m%d_%H.ltg_lr"
> 20181023T143206.920586Z pqact[43725] ERROR filel.c:425:fl_removeAndFree() 
> Deleting failed PIPE entry: pid=129522, cmd="-flush ltgdecode -e -p 3600 -L 
> -N -o /data/obs/lw/ltg/lr/%Y%m%d_%H.ltg_lr"
> 20181023T143206.920603Z pqact[43725] ERROR palt.c:1234:prodAction() Couldn't 
> process product: feedtype=HDS, pattern="SFPA41", action=pipe, args="-flush 
> ltgdecode -e -p 3600 -L -N -o
> /data/obs/lw/ltg/lr/%%Y%%m%%d_%%H.ltg_lr"

The error messages above indicate that a pqact(1) process couldn't write any 
data from an LDM data-product to a decoder within the timeout duration of 60 
seconds. This is most likely caused by the decoder, ltgdecode, not reading the 
data product in a timely manner. This can happen if the decoder is slow or 
doesn't read the entire data-product. Slow decoders should initially read the 
entire data-product into a temporary space (e.g., a file) and then process the 
data-product. Such decoders should probably also have the "-close" option 
specified in their PIPE entry in case they are invoked multiple times in rapid 
succession.

The remaining error messages indicate the same problem.

Does the ltgdecode program read the entire data-product?

Is that program slow? What's it written in?

> 20181023T143308.572510Z pqact[43725] ERROR pbuf.c:150:pbuf_flush()
> write(17,,4096) to decoder timed-out (60 s)
> 20181023T143308.572555Z pqact[43725] ERROR filel.c:1932:pipe_put() Couldn't
> write 134323-byte product to pipe
> 20181023T143308.572559Z pqact[43725] ERROR filel.c:2117:pipe_out() Couldn't
> write product data to pipe
> 20181023T143308.572562Z pqact[43725] ERROR filel.c:2180:pipe_prodput()
> Couldn't pipe product to decoder "-flush ltgdecode -e -p 300 -A -N -o
> /data/obs/ascii/ltg/lr/%Y%m%d_%H%M.ltg_lr"
> 20181023T143308.572565Z pqact[43725] ERROR filel.c:425:fl_removeAndFree()
> Deleting failed PIPE entry: pid=123416, cmd="-flush ltgdecode -e -p 300 -A
> -N -o /data/obs/ascii/ltg/lr/%Y%m%d_%H%M.ltg_lr"
> 20181023T143308.572585Z pqact[43725] ERROR palt.c:1234:prodAction()
> Couldn't process product: feedtype=WMO, pattern="SFPA41", action=pipe,
> args="-flush ltgdecode -e -p 300 -A -N -o
> /data/obs/ascii/ltg/lr/%%Y%%m%%d_%%H%%M.ltg_lr"

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: XNJ-465270
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.