
20011211: SDI ingest



Jerry,

I believe you are the contact person for SDI related questions.
If this is not correct, please let me know.

I've been looking at some errors in our NOAAPORT/SDI ingestion
and have tracked down a possible bug in the "inge" software.

In splitting the data stream into the 2 separate FIFOs for
ingestion with pqing, I have found that we are seeing
occasional products apparently truncated on the binary (HDS) FIFO,
while the missing bytes are being written to the text product
(DDPLUS) FIFO.

For example, our 2 pqing processes log the error messages:

pqing [DDPLUS|IDS]     Lone ETX error  29         -01  777700      111614

concurrent with

pqing [HDS]            YDUC98 KWBE 111200 PAA: no end of product
                       Not a WMO GRIB format message.      85106         003  
YDUC98 KWBE 111200 PAA



The pqing reading from the text (DDPLUS|IDS) FIFO is complaining that it found a
lone ETX character. The binary (HDS) pqing is complaining that it did not find
the expected end of GRIB "7777" sequence in the appropriate place.
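
To make the binary-side check concrete, here is a minimal Python sketch of a
GRIB end-of-message test. It assumes GRIB edition 1, where octets 5-7 of the
indicator section give the total message length. The function name
grib1_end_ok and the sample product are hypothetical, and this is not pqing's
actual code:

```python
def grib1_end_ok(product: bytes) -> bool:
    """Return True if '7777' sits where the GRIB edition 1 length says it should."""
    start = product.find(b"GRIB")
    if start < 0 or len(product) < start + 8:
        return False  # no complete GRIB indicator section in this product
    # Octets 5-7 of the indicator section hold the total message length.
    length = int.from_bytes(product[start + 4:start + 7], "big")
    if length < 12:
        return False  # shorter than indicator + end sections: not plausible
    end = start + length
    # A truncated product yields a short or empty slice here, so this is False.
    return product[end - 4:end] == b"7777"


# Hypothetical 20-byte GRIB message wrapped in a WMO-style envelope.
msg = b"GRIB" + (20).to_bytes(3, "big") + b"\x01" + bytes(8) + b"7777"
whole = b"\x01\r\r\n692\r\r\nZDUK98 KWBE 071800 PAA\r\r\n" + msg + b"\r\r\n\x03"
print(grib1_end_ok(whole))        # complete product
print(grib1_end_ok(whole[:-10]))  # same product with its last 10 bytes missing
```

A product whose trailing bytes were written elsewhere fails this test because
the bytes at the expected end position are not "7777".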

Further checking using the "pqing -r rawfile" option to store the raw incoming
data from each FIFO shows that the (approximately) last 29 bytes of the GRIB
product in the binary FIFO are missing. And, concurrently, they are appearing
in the text FIFO.


Here is a specific instance from Dec 7:

ldmd.log entries for ZDUK98 KWBE 071800 PAA (sequence number 692)
Dec 07 21:07:44 desi.unidata.ucar.edu pqing[6841]: scan_wmo: length 9 too short
Dec 07 21:07:44 desi.unidata.ucar.edu pqing[6841]: Lone ETX error 14         -01  777700      072107
Dec 07 21:07:44 desi.unidata.ucar.edu pqing[6842]: ZDUK98 KWBE 071800 PAA: no end of product
Dec 07 21:07:44 desi.unidata.ucar.edu pqing[6842]: Not a WMO GRIB format message.      84720         692  ZDUK98 KWBE 071800 PAA

In the pqing rawfile that comes from the text FIFO, starting at octal offset
30332615, you see the 29 bytes, beginning with \0 006 L 231 and ending with
7 7 7 7 \r \r \n 003:

30332420       F   R   I   D   A   Y       L   E   S   S       T   H   A
30332440   N       1   5       M   P   H   .  \r  \r  \n  \r  \r  \n   .
30332460   O   U   T   L   O   O   K       8       T   O       1   4
30332500   D   A   Y   .   .   .  \r  \r  \n   T   E   M   P   E   R   A
30332520   T   U   R   E       A   B   O   V   E       N   O   R   M   A
30332540   L   .       P   R   E   C   I   P   I   T   A   T   I   O   N
30332560       N   E   A   R       N   O   R   M   A   L   .  \r  \r  \n
30332600  \r  \r  \n  \r  \r  \n  \r  \r  \n  \r  \r  \n 003  \0 006   L
30332620 231  \0   d 001 223      \0  \0   2   d  \0 003      \0  \0  \0
30332640  \0  \0   7   7   7   7  \r  \r  \n 003 001  \r  \r  \n   7   0
30332660   1      \r  \r  \n   S   X   U   S   7   0       K   W   A   L
30332700       0   7   2   1   0   4  \r  \r  \n 036   7   5   0   0   8


I have obtained the complete ZDUK98 KWBE product from an independent
ingestion of NOAAPORT output, which I have attached to this message. There you
can see that the product does in fact end with the 29 bytes that are found
in the text FIFO (see below, starting at octal address 0242213):

0242120   2   d 310  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0242140  \0  \0  \0  \0  \0  \0  \0 311 223   &   L 231   2   d 310  \0
0242160  \0  \0  \0  \0  \0  \0  \0  \0  \0 031  \0  \0  \0  \0 006   L
0242200 200   2   d 310  \0 006   L 231   2   d 310  \0 006   L 231  \0
0242220   d 001 223      \0  \0   2   d  \0 003      \0  \0  \0  \0  \0
0242240   7   7   7   7  \r  \r  \n 003
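
This kind of comparison can also be done programmatically by searching the
text-FIFO rawfile for the longest tail of the complete product. The following
Python sketch uses a hypothetical helper, diverted_tail, and small stand-in
byte strings rather than the real files; only the 29 bytes and their immediate
neighbors are taken from the dumps above:

```python
def diverted_tail(complete: bytes, other: bytes,
                  min_tail: int = 8, max_tail: int = 256):
    """Return (length, offset) of the longest tail of `complete` found in `other`.

    Returns None if no tail of at least `min_tail` bytes is present.
    """
    for n in range(min(max_tail, len(complete)), min_tail - 1, -1):
        offset = other.find(complete[-n:])
        if offset >= 0:
            return n, offset
    return None


# The 29 bytes from the dumps above, written as octal values where unprintable.
TAIL = bytes([0o000, 0o006, ord("L"), 0o231, 0o000, ord("d"), 0o001, 0o223,
              ord(" "), 0o000, 0o000, ord("2"), ord("d"), 0o000, 0o003, ord(" "),
              0, 0, 0, 0, 0]) + b"7777\r\r\n\x03"

# Stand-ins for the real files (assumption: shortened for illustration).
complete = b"2d\xc8" + TAIL                                   # end of the full product
text_raw = b"NORMAL.\r\r\n\x03" + TAIL + b"\x01\r\r\n701 \r\r\n"  # text FIFO excerpt

print(diverted_tail(complete, text_raw))
```

The min_tail parameter is there so that a trivially short match, such as a
single ETX byte, is not reported as a diverted tail.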

The appearance of these bytes on the text FIFO causes the error messages when
the 9 bytes from 30332623 (SOH, 001) to 30332633 (ETX, 003) are found (too short
to be a WMO product), followed by the unexpected ETX (003) at 30332651. Both of
these error messages are shown in the pqing log for process 6841 above. At the
same instant, the binary pqing process 6842 is complaining that it didn't find
the GRIB special characters 7777 in the expected place in the binary stream,
since those bytes were omitted from the binary FIFO. Instead, the binary pqing
keeps reading until it finds a \r \r \n ETX sequence (from the next product),
and the GRIB length check results in the log message. For other products that
are not GRIB, there is no such check available, and so these corrupt products
may be inserted into the outgoing IDD data stream.
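
The two text-side errors can be reproduced from the 29 stray bytes with a very
simplified SOH/ETX scan. This Python sketch is only an illustration: the name
scan_frames and the event tuples are hypothetical, and pqing's real scanner is
more involved than this:

```python
SOH, ETX = 0x01, 0x03


def scan_frames(data: bytes):
    """Report SOH..ETX frames and any ETX seen with no SOH pending."""
    events = []
    start = None  # index of the pending SOH, if any
    for i, b in enumerate(data):
        if b == SOH and start is None:
            start = i
        elif b == ETX:
            if start is None:
                events.append(("lone_etx", i))
            else:
                events.append(("frame", start, i - start + 1))
                start = None
    return events


# The 29 bytes found on the text FIFO, from the dump above.
STRAY = bytes([0o000, 0o006, ord("L"), 0o231, 0o000, ord("d"), 0o001, 0o223,
               ord(" "), 0o000, 0o000, ord("2"), ord("d"), 0o000, 0o003, ord(" "),
               0, 0, 0, 0, 0]) + b"7777\r\r\n\x03"

for event in scan_frames(STRAY):
    print(event)
```

On these bytes the scan reports one 9-byte frame starting at index 6, which is
too short to be a WMO product, and then a lone ETX at index 28.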

Since I don't have access to how the SDI code is behaving in splitting the input
NOAAPORT stream into the 2 separate FIFOs, I can't tell where that error
is occurring, but it seems clear to me that the SDI inge process is obtaining
the complete NOAAPORT product, and the problem lies in the byte length that
the code is using to write the data into the appropriate FIFOs.

It seems that this is not a random occurrence; rather, several products
in particular are likely to trigger the fault. For example, using the
ZDUK98 KWBE header I mentioned above, you see that the problem
occurs with each model run (there are actually 3 different ETA products
with the ZDUK98 header, which are distinguishable by the data within the GRIB
message):

/local/ldm/logs/ldmd.log:Dec 11 04:09:33 desi.unidata.ucar.edu pqing[28129]:
ZDUK98 KWBE 110000 PAA: no end of product
/local/ldm/logs/ldmd.log:Dec 11 04:09:33 desi.unidata.ucar.edu pqing[28129]: Not
a WMO GRIB format message.     166195         751  ZDUK98 KWBE 110000 PAA
/local/ldm/logs/ldmd.log:Dec 11 09:08:17 desi.unidata.ucar.edu pqing[28129]:
ZDUK98 KWBE 110600 PAA: no end of product
/local/ldm/logs/ldmd.log:Dec 11 09:08:17 desi.unidata.ucar.edu pqing[28129]: Not
a WMO GRIB format message.      88773         162  ZDUK98 KWBE 110600 PAA
/local/ldm/logs/ldmd.log:Dec 11 16:16:11 desi.unidata.ucar.edu pqing[28129]:
ZDUK98 KWBE 111200 PAA: no end of product
/local/ldm/logs/ldmd.log:Dec 11 16:16:11 desi.unidata.ucar.edu pqing[28129]: Not
a WMO GRIB format message.      93347         050  ZDUK98 KWBE 111200 PAA
/local/ldm/logs/ldmd.log.1:Dec 10 22:27:52 desi.unidata.ucar.edu pqing[28129]:
ZDUK98 KWBE 101800 PAA: no end of product
/local/ldm/logs/ldmd.log.1:Dec 10 22:27:52 desi.unidata.ucar.edu pqing[28129]:
Not a WMO GRIB format message.      92543         742  ZDUK98 KWBE 101800 PAA
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:17 desi.unidata.ucar.edu pqing[8039]:
ZDUK98 KWBE 100000 PAA: no end of product
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:17 desi.unidata.ucar.edu pqing[8039]:
Not a WMO GRIB format message.      85563         306  ZDUK98 KWBE 100000 PAA
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]:
ZDUK98 KWBE 100000 PAA: no end of product
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]:
Not a WMO GRIB format message.      87240         591  ZDUK98 KWBE 100000 PAA
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]:
ZDUK98 KWBE 100000 PAA: no end of product
/local/ldm/logs/ldmd.log.4:Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]:
Not a WMO GRIB format message.      88443         644  ZDUK98 KWBE 100000 PAA
/local/ldm/logs/ldmd.log.4:Dec 10 09:09:46 desi.unidata.ucar.edu pqing[8039]:
ZDUK98 KWBE 100600 PAA: no end of product
/local/ldm/logs/ldmd.log.4:Dec 10 09:09:46 desi.unidata.ucar.edu pqing[8039]:
Not a WMO GRIB format message.      95711         255  ZDUK98 KWBE 100600 PAA
/local/ldm/logs/ldmd.log.4:Dec 10 16:39:03 desi.unidata.ucar.edu pqing[8039]:
ZDUK98 KWBE 101200 PAA: no end of product
/local/ldm/logs/ldmd.log.4:Dec 10 16:39:03 desi.unidata.ucar.edu pqing[8039]:
Not a WMO GRIB format message.      84716         614  ZDUK98 KWBE 101200 PAA
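
A tally like the one above can be produced with a short script. This Python
sketch assumes each log entry is on a single line in the log file (the entries
above are wrapped for mail); the pattern and the name count_truncations are
hypothetical:

```python
import re
from collections import Counter

# Matches e.g. "pqing[8039]: ZDUK98 KWBE 100000 PAA: no end of product"
PATTERN = re.compile(
    r"pqing\[\d+\]:\s+(\S+ \S+ \d{6}(?: \S+)?): no end of product"
)


def count_truncations(lines):
    """Count 'no end of product' messages per WMO header."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log excerpts above, unwrapped.
sample = [
    "Dec 10 04:06:17 desi.unidata.ucar.edu pqing[8039]: ZDUK98 KWBE 100000 PAA: no end of product",
    "Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]: ZDUK98 KWBE 100000 PAA: no end of product",
    "Dec 10 04:06:34 desi.unidata.ucar.edu pqing[8039]: ZDUK98 KWBE 100000 PAA: no end of product",
    "Dec 10 09:09:46 desi.unidata.ucar.edu pqing[8039]: ZDUK98 KWBE 100600 PAA: no end of product",
]
for header, n in count_truncations(sample).items():
    print(n, header)
```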

By attaching the correct product in question, I am hoping that
it will help you in locating and fixing the problem, which would appear to be
within the inge process.

I can provide the complete logs and rawfiles of the above if necessary.

Steve Chiswell
Unidata User Support

Attachment: zduk98_kwbe_071800_paa.692
Description: ZDUK98