
[LDM #XNJ-465270]: pbuf_flush error



Hello,

> I hand tested the "ltgdecode" program manually using a perl program to keep
> an open process pipe to process 60 files from 1400Z to 1500Z (10/23 today),
> (one for each minute of the hour).  These were the same files that were
> processed in the logs that XXX referenced;  Specifically the SFPA41
> Vaisala GLD360 files from NOAAPORT.  The "ltgdecode" program processed all
> 60 encrypted files in around 5 seconds without hesitancy.  So, I don't
> believe this program is holding up the product in the LDM queue for 60
> seconds.   There are other things happening concurrently in the system to
> create the queue delay, but I don't know  how the "-flush" reacts to
> incoming streams of data, as opposed to "-close" which sends an EOF
> signal.  The source code is written with one big  while (!feof(stdin))
> loop, with SIGNAL interrupt processing to bail out of the loop, embedded
> within.

OK.
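
For what it's worth, the difference a decoder sees between data that is only flushed and a pipe that is closed can be shown with an ordinary pipe. This is only a minimal sketch in Python using os.pipe(), not code from pqact(1) or ltgdecode(1); the product names are made up. Flushed data arrives while the pipe stays open; the end-of-file condition (an empty read) appears only after the writing end is closed.

```python
import os

def demo():
    """Return what a reader sees: two deliveries, then EOF."""
    r, w = os.pipe()
    seen = []

    # Analogous to "-flush": data is delivered, the pipe stays open.
    os.write(w, b"product-1")
    seen.append(os.read(r, 1024))   # data, no EOF

    os.write(w, b"product-2")
    seen.append(os.read(r, 1024))   # more data, still no EOF

    # Analogous to "-close": the writer closes its end of the pipe.
    os.close(w)
    seen.append(os.read(r, 1024))   # empty bytes object means EOF

    os.close(r)
    return seen

events = demo()
```

A loop of the form while (!feof(stdin)) therefore just keeps waiting between products under "-flush"; it ends only when the pipe is actually closed.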

> NCO updated the ltgdecode program in such a way that you can't run files
> through it one at a time without it seg-faulting on the hourly-bin files.
> In other words, with LDM you can't use "PIPE -close" without the program
> having issues.   Instead, you have to keep the process open and stream data
> to it, using the "-flush" command.  The problem with this is that if you
> restart LDM, it restarts the program, which then crashes on subsequent
> updates to the hourly-bin files, so the files fail to update.
> The flush timeout mechanism also causes the program to seg-fault and
> fail to update the hourly-bin files.  The problem fixes itself at the top
> of the next hour, with the creation of new hourly files.

Whoa! This description of ltgdecode(1) violates the requirements of an LDM 
decoder. An LDM decoder *must* be able to be restarted because pqact(1), when 
it needs a file descriptor but none are available, will close the least 
recently used one in order to process another data-product. More information 
can be found at 
<https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/pqact.conf.html#output-file%20limit>.
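
As a rough illustration of what "able to be restarted" can mean in practice, here is a minimal sketch in Python. It is not how ltgdecode(1) works: the function name decode_stream, the chunk size, and the choice to open the output in append mode are all assumptions made for the example. The idea is that EOF on the input is treated as a normal exit, and a later invocation adds to the existing hourly file instead of failing on it.

```python
import io
import os
import tempfile

def decode_stream(stream, out_path, chunk_size=4096):
    """Copy bytes from stream to out_path until EOF; return bytes written.

    Hypothetical stand-in for a decoder; real decoding is omitted.
    """
    total = 0
    # Append mode (an assumption): a restarted process extends the file.
    with open(out_path, "ab") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:          # EOF: the pipe was closed; exit cleanly
                break
            out.write(chunk)
            total += len(chunk)
    return total

# Simulate two invocations writing to the same hourly file.
path = os.path.join(tempfile.mkdtemp(), "hourly.bin")
first = decode_stream(io.BytesIO(b"strike-1\n"), path)
second = decode_stream(io.BytesIO(b"strike-2\n"), path)  # after a "restart"
with open(path, "rb") as f:
    contents = f.read()
```

In a real decoder the stream would be standard input (sys.stdin.buffer in Python) rather than an in-memory buffer.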

> So it is invoked in the LDM pqact file this way, which keeps the process
> running indefinitely.

Not necessarily. See the previous paragraph.
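
To make the point concrete, here is a toy model in Python of a bounded table of open pipes with least-recently-used eviction. It is only a sketch of the behavior described above, not pqact(1) source code; the class name PipeTable, the limit of 2, and the decoder names other than ltgdecode are invented for the example.

```python
from collections import OrderedDict

class PipeTable:
    """Toy model: at most `limit` pipes open, least recently used closed first."""

    def __init__(self, limit):
        self.limit = limit
        self.open = OrderedDict()  # decoder -> products sent since it was opened
        self.closed = []           # decoders whose pipe was closed (they see EOF)

    def send(self, key):
        if key in self.open:
            self.open.move_to_end(key)            # mark as most recently used
        else:
            if len(self.open) >= self.limit:
                victim, _ = self.open.popitem(last=False)  # evict the LRU entry
                self.closed.append(victim)
            self.open[key] = 0                    # (re)start the decoder
        self.open[key] += 1

table = PipeTable(limit=2)
for key in ["ltgdecode", "decoder_b", "decoder_c", "ltgdecode"]:
    table.send(key)
```

In this run the ltgdecode pipe is closed when decoder_c needs a slot, and the next SFPA41 product starts a fresh ltgdecode process, which is why the decoder has to cope with being restarted in the middle of an hour.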

> HDS     SFPA41
> PIPE    -flush ltgdecode -e -p 3600 -L -N -o /data/obs/lw/ltg/lr/%%Y%%m%%d_%%H.ltg_lr
> 
> XXX XXX indicated in an e-mail that they plan to release a new version
> of ltgdecode associated with the 7.6 N-AWIPS release.  However, I
> believe this version will allow us to use the "-close" command instead of
> "-flush".  If that is the case, maybe the 60 sec queue issue goes away.
> Will have to wait and see.  Release will be in the December 2018/January
> 2019 time-frame.

That would be good. Better sooner than later.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: XNJ-465270
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.