
20030806: 20030806: 20030723: Problem with LDM/NOAAport ingestor



Kevin,

The card we have is from PTI (Performance Technologies), running
in a PC under Solaris x86 and in a Sun SPARC.

Of course, we will be testing the 55 Mbit DVB-S too.

Steve Chiswell


>From: "Kevin R. Tyle" <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200308062210.h76MAwLd028645

>Hi Steve,
>
>Thanks for the tip.  I'll see what computing the MD5 on the fly
>does.
>
>What kind of card do you use?  What OS?
>
>--Kevin
>
>______________________________________________________________________
>Kevin Tyle, Systems Administrator               **********************
>Dept. of Earth & Atmospheric Sciences           address@hidden
>University at Albany, ES-235                    518-442-4571 (voice)
>1400 Washington Avenue                          518-442-5825 (fax)
>Albany, NY 12222                                **********************
>______________________________________________________________________
>
>On Wed, 6 Aug 2003, Unidata Support wrote:
>
>>
>> Kevin,
>>
>> The card we are using has either 8MB or 32MB of RAM on board,
>> so it's a little difficult to compare performance.
>>
>> The big performance hit is calculating the MD5 for large products.
>> If you compute the MD5 along the way as you receive each 5KB part,
>> then you will avoid the big delay in calling the MD5 computation after you
>> have the entire product (really important with 26MB images, but even
>> 100KB products in the NWSTG channel would benefit).
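The incremental approach described above can be sketched in Python as follows. This is a minimal illustration using the standard hashlib module, not the LDM's own C MD5 routines; the function names and the block sizes in the demo are assumptions made for the example.

```python
import hashlib

def md5_incremental(blocks):
    """Fold each block into the digest as it arrives."""
    ctx = hashlib.md5()
    for block in blocks:
        ctx.update(block)   # hashing cost is spread across the arriving blocks
    return ctx.digest()     # 16-byte signature, available right after the last block

def md5_at_end(blocks):
    """Buffer the whole product, then hash it in a single call."""
    return hashlib.md5(b"".join(blocks)).digest()

if __name__ == "__main__":
    # Hypothetical product: 20 blocks of 5000 bytes standing in for received parts.
    parts = [bytes([i % 256]) * 5000 for i in range(20)]
    assert md5_incremental(parts) == md5_at_end(parts)
```

Both functions produce the same signature; the difference is only when the work is done, which is what matters for a reader that must get back to the card quickly.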
>>
>> Steve Chiswell
>>
>>
>> >From: "Kevin R. Tyle" <address@hidden>
>> >Organization: UCAR/Unidata
>> >Keywords: 200308061422.h76EM4Ld008336
>>
>> >Hi Steve,
>> >
>> >An update:  I rewrote the data ingestor so it reads in each
>> >frame, and sends it in the form of the ldm product structure
>> >directly to the ldm product queue using the structure as input
>> >to the pq_insert function.  I still see the same frame loss
>> >crop up, especially once the queue fills up and the "self-cleaning"
>> >process begins.  It appears that the time required for pq_insert
>> >to return delays things just long enough for the program to miss
>> >frames when the next call to "recvfrom" retrieves the next frame
>> >from the card, although the timestamps when pq_insert is enabled
>> >do not seem to show much significant change.
>> >
>> >Anything else you could recommend I check?  I have an email in
>> >to Cyclades asking if I am accessing the buffer on the card.
>> >The RAM buffer is 256K, or about 64 frames.
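As a rough check of what those figures imply, the arithmetic can be written out as below. The 1.544 Mbit/s value is the nominal T-1 line rate and is an assumption about the worst case; the real channel rate may be lower, which would give more slack.

```python
T1_BYTES_PER_SEC = 1_544_000 / 8     # nominal T-1 line rate, in bytes per second (assumed worst case)
CARD_BUFFER_BYTES = 256 * 1024       # "256K" from the message above
FRAMES_IN_BUFFER = 64                # "about 64 frames"

def implied_frame_bytes():
    """Average frame size implied by the two figures quoted above."""
    return CARD_BUFFER_BYTES / FRAMES_IN_BUFFER

def buffer_seconds():
    """How long the card buffer could absorb data at the full line rate."""
    return CARD_BUFFER_BYTES / T1_BYTES_PER_SEC

if __name__ == "__main__":
    print(implied_frame_bytes(), "bytes per frame")
    print(round(buffer_seconds(), 2), "seconds of buffering")
```

If the card buffer is actually in use, a stall in the reader would need to last over a second at the full line rate before frames are lost.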
>> >
>> >Thanks . . .
>> >
>> >Kevin
>> >On Fri, 25 Jul 2003, Kevin R. Tyle wrote:
>> >
>> >> Hi Steve,
>> >>
>> >> comments below . . .
>> >>
>> >> On Thu, 24 Jul 2003, Unidata Support wrote:
>> >>
>> >> >
>> >> > Kevin,
>> >> >
>> >> > The 4 channel system I wrote reads the incoming data, computes the
>> >> > MD5 checksum as the data streams in, and then inserts into the queue
>> >> > directly.
>> >> > This avoids other processes, named pipes, and the like. It also
>> >> > allows the MD5 to be computed as the data blocks arrive, rather than
>> >> > waiting for the entire product to arrive and then have
>> >> > pqing compute the checksum...which is much more important with
>> >> > 26MB satellite images. Also, your PC card probably has a RAM
>> >> > buffer on it- so if necessary, your card will provide the
>> >> > buffer space.
>> >> >
>> >> > Some points here you may want to consider:
>> >> >
>> >> > 1) You didn't say so, but I'm assuming you are using pqing to read from
>> >> > your named pipes. It sounds like your named pipe would have to be full
>> >> > in order to drop something. Is your program checking for this
>> >> > condition? How do you handle it, or do things get dropped on the floor?
>> >> >
>> >>
>> >> Yep, three separate instances of pqing are launched when the ldm starts,
>> >> and they read from the DDPLUS, HDS, and NMC3 named pipes.  At this point,
>> >> I'm not checking for a full pipe (how is this done, anyway?), but it
>> >> does seem that when the data is written to the pipe, it all gets ingested
>> >> and properly handled by pqact. The check for sequential HDLC frame
>> >> numbers is done when the HDLC frame is read in from the Cyclades card.  Based on
>> >> debugging output, it appears that the frames are missed entirely and thus
>> >> never get a chance to be processed and written to the pipe.  Running the
>> >> ingestor without the LDM, with or without output to a FIFO, shows no
>> >> frame loss.
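On the question of how to check for a full pipe: one common approach is to put the write end in non-blocking mode, so that a write to a full pipe fails immediately instead of stalling the writer. The sketch below shows this in Python; the helper names are hypothetical, and it uses an anonymous pipe for the demo. For a named pipe the equivalent would be opening it with `os.O_WRONLY | os.O_NONBLOCK`.

```python
import os

def try_write(fd, data):
    """Write to a non-blocking descriptor.

    Returns the number of bytes accepted, or 0 if the pipe is full.
    """
    try:
        return os.write(fd, data)
    except BlockingIOError:
        # EAGAIN: the reader has not drained the pipe yet.
        return 0

def fill_until_full(fd, chunk):
    """Keep writing until the pipe refuses data; return total bytes accepted."""
    total = 0
    while True:
        n = try_write(fd, chunk)
        if n == 0:
            return total
        total += n

if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(w, False)          # writes now fail fast instead of stalling
    capacity = fill_until_full(w, b"x" * 1024)
    print("pipe accepted", capacity, "bytes before reporting full")
```

A writer that sees 0 here knows the reader has fallen behind and can buffer, log, or count the loss rather than silently waiting.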
>> >>
>> >> > 2) You generally don't want a program to depend on something else
>> >> > without buffering. In your approach, you are losing the benefit of the
>> >> > on-board memory of the card. Your buffer in the pipe is probably limited.
>> >> > One alternative is to have your program write to a cyclical file
>> >> > (buffer), and have a separate process read from the cyclical file and
>> >> > feed the FIFO, but you would still need to check for write errors.
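The buffering idea can be illustrated with a small bounded buffer that sits between the process reading the card and the process feeding the FIFO. This is an in-memory stand-in for the cyclical file mentioned above, not an implementation of it; the class name and its methods are hypothetical.

```python
from collections import deque

class FrameBuffer:
    """Bounded FIFO of frames between the card reader and the queue writer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = deque()
        self.dropped = 0

    def put(self, frame):
        """Store a frame; return False (and count it) if the buffer is full."""
        if len(self.frames) >= self.capacity:
            self.dropped += 1
            return False
        self.frames.append(frame)
        return True

    def get(self):
        """Return the oldest frame, or None if the buffer is empty."""
        return self.frames.popleft() if self.frames else None

if __name__ == "__main__":
    buf = FrameBuffer(capacity=2)
    for frame in (b"frame-1", b"frame-2", b"frame-3"):
        if not buf.put(frame):
            print("buffer full, frame dropped")
    print(buf.get(), buf.dropped)
```

The point of the explicit return value and drop counter is that an overflow becomes a reported event rather than a silent loss.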
>> >> >
>> >> >
>> >>
>> >> I am going to follow what I think you did by modifying our program to
>> >> directly access and write to the queue, eliminating the use of named
>> >> pipes.  My plan to do this is to incorporate the relevant parts of
>> >> pqing into the program, and we'll see how it goes.
>> >>
>> >> I believe the size of a named pipe is limited to 4096 bytes at the
>> >> kernel level.  Although each HDLC frame appears to be less than
>> >> that (at least on the NWSTG channel), a full product clearly
>> >> will exceed this often.  Although if the problem was with the pipe,
>> >> wouldn't I see the products not make it through the LDM?  I'm not
>> >> fully aware of how the named pipes work, but maybe things are delayed
>> >> while the LDM reads data out of the pipe just long enough for the
>> >> main program to lose incoming data from the card.
>> >>
>> >> Thanks for the advice . . .
>> >>
>> >> --Kevin
>> >>
>> >>
>> >>
>> >> > The LDM queue cleaning is generally efficient; pqexpire is much more
>> >> > costly since it has to search the queue. A fast machine should not be
>> >> > noticing that overhead. You probably want a larger queue anyhow, since
>> >> > a T-1 is capable of exceeding 400MB an hour.
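The 400MB-an-hour figure can be checked with a little arithmetic. The sketch below assumes the nominal T-1 line rate of 1.544 Mbit/s and ignores framing overhead; the queue size is the sum of the nbytes and maxext columns in the pqmon output quoted further down.

```python
T1_BITS_PER_SEC = 1_544_000      # nominal T-1 line rate (framing overhead ignored)
QUEUE_BYTES = 400_003_072        # 397335184 + 2667888 from the pqmon output

def bytes_per_hour(bits_per_sec):
    """Upper bound on data delivered in one hour at the given line rate."""
    return bits_per_sec / 8 * 3600

def hours_to_fill(queue_bytes, bits_per_sec):
    """Time for a queue of the given size to fill at the full line rate."""
    return queue_bytes / bytes_per_hour(bits_per_sec)

if __name__ == "__main__":
    print(bytes_per_hour(T1_BITS_PER_SEC) / 1e6, "MB per hour at full rate")
    print(round(hours_to_fill(QUEUE_BYTES, T1_BITS_PER_SEC) * 60), "minutes to fill the queue")
```

At the full line rate a queue of this size would hold well under an hour of data, which is the reason for suggesting a larger one.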
>> >> >
>> >> > Steve Chiswell
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > >From: "Kevin R. Tyle" <address@hidden>
>> >> > >Organization: UCAR/Unidata
>> >> > >Keywords: 200307232218.h6NMI8Ld008737
>> >> >
>> >> > >Hi,
>> >> > >
>> >> > >First, this question pertains to work that I am doing as a
>> >> > >consultant for a non-Unidata member (MESO, Inc.), so I understand this
>> >> > >might not be the right place to send this, but hey, it's an
>> >> > >interesting problem.
>> >> > >
>> >> > >MESO basically did a "do-it-yourself" installation of a NOAAport
>> >> > >system.  Besides the appropriate satellite dish/EFR-54 Receiver system,
>> >> > >we use a 2.6 GHz dual CPU Intel P-4 that is running RH 8.0.  A Cyclades
>> >> > >PC300 card is used on the PC to receive the data from the receiver.
>> >> > >Presently, we are only ingesting data on the NCEP/NWSTG channel.
>> >> > >The PC has three 36 GB SCSI disks and uses the EXT3 logging
>> >> > >filesystem (although I have experimented placing the LDM
>> >> > >product queue on its own disk separate from the rest
>> >> > >of the LDM-related files, on a non-logging ext2 filesystem.)
>> >> > >
>> >> > >Our ingest program receives frames from the card, strips out the
>> >> > >extraneous headers, and basically puts everything into an
>> >> > >LDM-friendly format.  Depending on the WMO ID, products are
>> >> > >separated into DDPLUS, HDS, and NMC3 feeds.  The data is output
>> >> > >into three named pipes, corresponding to the three data feeds.
>> >> > >The LDM then reads from these named pipes.
>> >> > >
>> >> > >Basically, when I start the LDM, everything goes well, for a time.
>> >> > >All frames are received (we check for sequential frame #'s and
>> >> > >product ID's).  But, after a certain period of time, say an
>> >> > >hour or so, we begin to lose frames.  Sometimes a couple, sometimes
>> >> > >about 10 or so.  And once it starts, it's basically useless until
>> >> > >the ingestor and LDM are restarted.  If I run the ingestor without
>> >> > >the LDM (e.g., just cat'ing the named pipes into /dev/null), no
>> >> > >frame skipping occurs.
>> >> > >
>> >> > >I knew I was onto something when I found that when I remade the
>> >> > >queue, things would always work well for an hour or so.  I began
>> >> > >to suspect that when the queue reached its full size, we started
>> >> > >to see the frame loss.
>> >> > >
>> >> > >Here is an example from today.  I started the ingestor at
>> >> > >1845 UTC.  All goes well for about 90 minutes.  Then, I get
>> >> > >this in the output from the ingestor:
>> >> > >
>> >> > >WMOID = SPAK32, Cat. = 1,LDM sqnm = 688, feed = DDS,Product ID # = 940200
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 634, Current Frame ID = 635
>> >> > >
>> >> > >WMOID = UANT01, Cat. = 7,LDM sqnm = 689, feed = DDS,Product ID # = 940201
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 635, Current Frame ID = 636
>> >> > >
>> >> > >WMOID = SDUS23, Cat. = 1,LDM sqnm = 690, feed = RAD,Product ID # = 940202
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 636, Current Frame ID = 643
>> >> > >
>> >> > >*** BREAK IN FRAME # SEQUENCE!! ***
>> >> > >
>> >> > >WMOID = SDUS22, Cat. = 1,LDM sqnm = 691, feed = RAD,Product ID # = 940203
>> >> > >
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 643, Current Frame ID = 644
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 644, Current Frame ID = 645
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 645, Current Frame ID = 646
>> >> > >030723/20:22:45
>> >> > >
>> >> > >Previous Frame ID = 646, Current Frame ID = 647
>> >> > >
>> >> > >WMOID = SDUS51, Cat. = 1,LDM sqnm = 692, feed = RAD,Product ID # = 940205
>> >> > >
>> >> > >*** BREAK IN PRODUCT NUMBER SEQUENCE!! ***
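The sequence check behind these messages can be sketched as a small helper that reports how many frames were skipped between two consecutive reads. The function name and the optional `modulus` parameter (a wraparound point for the counter) are hypothetical; the log does not show how the real counter wraps.

```python
def missing_frames(prev_id, cur_id, modulus=None):
    """Number of frames skipped between two consecutive reads.

    modulus is a hypothetical wraparound point for the counter; None means
    the counter is treated as unbounded.
    """
    gap = cur_id - prev_id - 1
    if modulus is not None:
        gap %= modulus
    return gap

if __name__ == "__main__":
    # Frame IDs taken from the log above.
    for prev_id, cur_id in [(634, 635), (635, 636), (636, 643), (643, 644)]:
        lost = missing_frames(prev_id, cur_id)
        if lost:
            print(f"*** BREAK IN FRAME # SEQUENCE: {lost} frame(s) lost ***")
```

Applied to the break in the log (636 followed by 643), the gap is six frames.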
>> >> > >
>> >> > >Now look at the pqmon output from about that time:
>> >> > >
>> >> > >Jul 23 20:22:29 lightning2 pqmon[15276]:  36025     1   61630   397335184
>> >> > >59835        4     37820   2667888 3169
>> >> > >Jul 23 20:23:29 lightning2 pqmon[15276]:  36018     1   61637   398231640
>> >> > >59835        4     37820   1771432 3169
>> >> > >Jul 23 20:24:29 lightning2 pqmon[15276]:  36027     1   61628   399151056
>> >> > >59835        4     37820    852016 3169
>> >> > >Jul 23 20:25:29 lightning2 pqmon[15276]:  36238     1   61417   399592472
>> >> > >59835        4     37820    410600 3168
>> >> > >Jul 23 20:26:29 lightning2 pqmon[15276]:  36216     1   61439   399991480
>> >> > >59835        4     37820     11592 3163
>> >> > >Jul 23 20:27:29 lightning2 pqmon[15276]:  36186     1   61469   399994920
>> >> > >59835        4     37820      8152 3147
>> >> > >Jul 23 20:28:29 lightning2 pqmon[15276]:  36057     1   61598   399999864
>> >> > >59835        4     37820      3208 3137
>> >> > >Jul 23 20:29:30 lightning2 pqmon[15276]:  35646     1   62009   399980208
>> >> > >59835        4     37820     22864 3124
>> >> > >Jul 23 20:30:30 lightning2 pqmon[15276]:  35283     1   62372   399994480
>> >> > >59835        4     37820      8592 3112
>> >> > >Jul 23 20:31:30 lightning2 pqmon[15276]:  35462     1   62193   400000696
>> >> > >59835        4     37820      2376 3120
>> >> > >Jul 23 20:32:30 lightning2 pqmon[15276]:  34906     1   62749   400000632
>> >> > >59835        4     37820      2440 3077
>> >> > >Jul 23 20:33:30 lightning2 pqmon[15276]:  34858     1   62797   399999192
>> >> > >59835        4     37820      3880 3057
>> >> > >Jul 23 20:34:30 lightning2 pqmon[15276]:  34290     1   63365   399996784
>> >> > >59835        4     37820      6288 2959
>> >> > >Jul 23 20:35:30 lightning2 pqmon[15276]:  33682     1   63973   399997352
>> >> > >59835        4     37820      5720 2885
>> >> > >Jul 23 20:36:30 lightning2 pqmon[15276]:  33596     1   64059   399998024
>> >> > >59835        4     37820      5048 2861
>> >> > >Jul 23 20:37:30 lightning2 pqmon[15276]:  32904     1   64751   399992952
>> >> > >59835        4     37820     10120 2805
>> >> > >Jul 23 20:38:30 lightning2 pqmon[15276]:  32380     1   65275   399999784
>> >> > >59835        4     37820      3288 2708
>> >> > >Jul 23 20:39:30 lightning2 pqmon[15276]:  32487     1   65168   399989456
>> >> > >59835        4     37820     13616 2706
>> >> > >Jul 23 20:40:30 lightning2 pqmon[15276]:  32657     1   64998   400001480
>> >> > >59835        4     37820      1592 2717
>> >> > >Jul 23 20:41:30 lightning2 pqmon[15276]:  32764     0   64892   400003072
>> >> > >59835        4     37820         0 2737
>> >> > >Jul 23 20:42:30 lightning2 pqmon[15276]:  32956     1   64699   399985120
>> >> > >59835        4     37820     17952 2752
>> >> > >
>> >> > >The queue is just about filled up by 20:22, and that's when we see the
>> >> > >problems start.
>> >> > >
>> >> > >I experimented with running pqexpire, running it @ 30 second intervals
>> >> > >to keep only the last 30 minutes of data.  That cleared the
>> >> > >queue, but I then found that each time pqexpire ran corresponded almost
>> >> > >to the second to frame loss errors in the ingestor program.
>> >> > >
>> >> > >So it seems to me that the product queue cleanup process, whether it
>> >> > >is run "automatically" in the modern LDM, or "the old way" using
>> >> > >pqexpire, slows up pqing reading from the named pipes just enough
>> >> > >so it can't keep up with the main ingestor program.  By the time
>> >> > >the data is read from the pipe, some frames have lost their
>> >> > >"window of opportunity" to get ingested.
>> >> > >
>> >> > >Any ideas as to how I might be able to solve this problem
>> >> > >would be much appreciated.  I am sure that this has to have
>> >> > >been done before by the outfits that use a Linux box to
>> >> > >ingest data via the LDM.
>> >> > >
>> >> > >For what it's worth, we have the same problem on a much older
>> >> > >PIII 600 MHz system running RH 6.1.
>> >> > >
>> >> > >Many thanks . . .
>> >> > >
>> >> > >--Kevin
>> >> > >
>> >> > >
>> >> >
>> >> >
>> >>
>> >
>>
>> ****************************************************************************
>> Unidata User Support                                    UCAR Unidata Program
>> (303)497-8643                                                  P.O. Box 3000
>> address@hidden                                   Boulder, CO 80307
>> ----------------------------------------------------------------------------
>> Unidata WWW Service              http://my.unidata.ucar.edu/content/support 
>> ****************************************************************************
>>
>