[#DYF-938766]: Problem in TIGGE protocol for missing data

Subject: [#DYF-938766]: Problem in TIGGE protocol for missing data
Date: Mon, 12 Feb 2007 11:03:14 -0700
Baudouin,

Here are some suggestions, based on my experience with CONDUIT and other
inventory-type messages, for deciding when to start processing once the
"done" file arrives:

1) You could combine your requests for the "done" file and grib.*0 into one:
   request (grib.*0|done)

This would not guarantee that the "done" file arrives after the other
products ending in 1 through 9, but it would bring its arrival time closer
to when the other streams are finishing.
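
To see which product names a combined pattern like the one in suggestion 1
would select, here is a small sketch that uses Python's re module as a
stand-in for LDM's regular-expression matching. It assumes the pattern is
searched for anywhere in the product name, so a trailing "$" is added to
anchor it at the end of the name (the shorthand request line leaves that
out). The product names are the example names from Baudouin's message, and
the function name is made up for illustration.

```python
import re

# Assumption: the pattern is searched for anywhere in the product name,
# so "$" is added to anchor the match at the end of the name.
COMBINED = re.compile(r"(grib.*0|done)$")

def selected(names, pattern=COMBINED):
    """Return the product names that the request pattern would pick up."""
    return [name for name in names if pattern.search(name)]

if __name__ == "__main__":
    products = ["grib0001", "grib0010", "grib9990", "grib9999", "done"]
    print(selected(products))
```

Without the "$", a name such as grib0001 would also be selected by a search,
because it contains "grib0".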

2) I routinely kick off processing either on receipt of data, through the
EXEC action of LDM, or from a cron entry that checks for receipt of data.
In both cases, I generally create a .timestamp.$$ file (where $$ is the
process ID), sleep for a specified amount of time, and then check the data
directory in question to see whether any data has arrived since I set the
time on the timestamp file with the "touch" command. The "find -newer"
command is useful here, since it lists the files that are newer than the
timestamp file.

If additional data has arrived in the directory (or among the file names
listed in the inventory) since the timestamp file was last touched, I touch
the timestamp file again and continue the sleep loop for another interval.
This way, once either of these conditions holds:
   all data in the "done" inventory has been received
or
   no additional expected data has arrived in the past 5 minutes (for example)

the script can continue past the loop and request the missing data if
necessary.
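
The timestamp-and-sleep loop described above might look roughly like this.
It is a Python sketch, not the C shell scripts mentioned in this message.
The function names, the flat data directory, and the idea of passing the
inventory as a list of expected file names are all assumptions made for
illustration.

```python
import os
import time

def touch(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)

def newer_than(directory, stamp):
    """Names of files modified after the timestamp file (like find -newer)."""
    cutoff = os.stat(stamp).st_mtime
    stamp_name = os.path.basename(stamp)
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file()
        and entry.name != stamp_name
        and entry.stat().st_mtime > cutoff
    )

def wait_for_data(directory, expected, interval=300.0, sleep=time.sleep):
    """Loop until all expected names are present or one interval is quiet.

    Returns the sorted list of expected names that are still missing.
    """
    stamp = os.path.join(directory, ".timestamp.%d" % os.getpid())
    touch(stamp)
    try:
        while True:
            sleep(interval)
            present = set(os.listdir(directory))
            missing = sorted(set(expected) - present)
            if not missing:
                return []          # condition 1: inventory complete
            if not newer_than(directory, stamp):
                return missing     # condition 2: nothing new this interval
            touch(stamp)           # data is still arriving; wait again
    finally:
        os.remove(stamp)
```

A caller would pass the names listed in the "done" inventory as expected and
request whatever comes back in the returned list.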

Note that when using the EXEC line from LDM, it's best to do something
simple and exit (such as touching a status file) so that LDM processing
isn't waiting on something to return. I generally let a periodic cron entry
do the hard work of checking the inventory, waiting for completion, etc.,
if it sees that a status file exists that it needs to process.
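
The split between a quick EXEC step and a periodic cron step can be sketched
as two small functions. This is a Python illustration only. The ".status"
suffix, the directory layout, and the function names are hypothetical and
not part of LDM.

```python
import os

STATUS_SUFFIX = ".status"   # hypothetical naming convention

def mark_pending(status_dir, name):
    """Quick step for the EXEC action: drop an empty status file and return."""
    path = os.path.join(status_dir, name + STATUS_SUFFIX)
    with open(path, "a"):
        pass
    return path

def process_pending(status_dir, handler):
    """Periodic step for cron: run handler once per status file found."""
    handled = []
    for entry in sorted(os.listdir(status_dir)):
        if not entry.endswith(STATUS_SUFFIX):
            continue
        name = entry[: -len(STATUS_SUFFIX)]
        handler(name)       # the slow work happens here, outside LDM
        os.remove(os.path.join(status_dir, entry))
        handled.append(name)
    return handled
```

The first function returns immediately, so nothing started from LDM blocks.
The second is where the inventory check and the waiting would go.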

I have some parts of C shell scripts that I can send you if you want to
look at this type of data processing.

Steve Chiswell
Unidata User Support





> Steve,
> 
> the problem is the following. In the upstream ldm we do:
> 
> pqinsert grib0001
> pqinsert grib0002
> ..
> pqinsert grib9999
> pqinsert done
> 
> in the downstream ldm, we subscribe to
> 
> REQUEST  grib.*0
> REQUEST  grib.*1
> REQUEST  grib.*2
> ..
> REQUEST  grib.*9
> REQUEST done
> 
> So this creates 11 parallel streams: 10 for the data, 1 for the "done"
> message. Because each grib is large (~400K), and the line is slow, the
> "done" file will "overtake" the last few hundred gribs, because it has a
> dedicated connection just for itself.
> Therefore, although the "done" is pushed after all the gribs, it is
> received before the last few hundred gribs.
> 
> In our link with China, the problem is exacerbated because we are
> testing with few large gribs and a slow line. In this case the "done"
> file is received before any data ....
> 
> Baudouin
> 
> 
> Unidata IDD TIGGE Support wrote:
> > Manuel,
> >
> > [You might receive this email twice]
> >
> >
> >> We have discovered a problem in the TIGGE protocol
> >> (http://tigge.ecmwf.int/ldm_protocol.html) while exchanging data with
> >> CMA. At the moment, CMA sends very few products (1920), which
> >> exacerbates this problem:
> >>
> >> Since we have different REQUEST lines (=parallel transfers) for data and
> >> for 'protocol' messages, e.g.:
> >> REQUEST ANY "z_tigge_c_babj.*\.grib:.*(00|20|40|60|80)$"
> >> tigge-ldm.cma.gov.cn
> >> REQUEST ANY "z_tigge_c_babj.*\.grib:.*(01|21|41|61|81)$"
> >> tigge-ldm.cma.gov.cn
> >> ...
> >> REQUEST ANY "z_tigge_c_babj.*\.(manifest|done)$"   tigge-ldm.cma.gov.cn
> >>
> >> It seems that after CMA pqinserts data and .done, the file .done arrives
> >> much faster. This triggers the process to check missing fields at our
> >> end. We send the file .missing asking for non-received products to be
> >> resent. Then CMA resends the data files and the .done, which again
> >> arrives much faster than the data. This ping-pong works while we receive
> >> data in between 2 .missing notifications. But if we don't receive any
> >> data, the list of missing fields would not be sent since it contains the
> >> same list of products, ie, it has the same MD5.
> >> We have exercised this problem and this is the list of .done
> >> notifications received (note that we had to force resending a .missing):
> >>
> >
> > While I'm not completely sure I understand the details of what you're
> > doing, I think you can solve your problem by adding a timestamp line
> > to the data-product that names the "missing" data.  This would make its
> > MD5 signature unique.
> >
> > Is this possible?
> >
> >
> > Regards,
> > Steve Emmerson
> >
> > Ticket Details
> > ===================
> > Ticket ID: DYF-938766
> > Department: Support IDD TIGGE
> > Priority: High
> > Status: On Hold
> >
> >
> 
> 
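
The suggestion quoted above, adding a timestamp line to the product that
names the "missing" data so that its MD5 differs on every send, can be
illustrated with a short Python sketch. The "# generated" line format and
the function names are assumptions. Whether the .missing file format
tolerates such an extra line would need to be checked against the TIGGE
protocol.

```python
import hashlib
import time

def missing_product(names, now=None):
    """Build the body of a "missing" list with a leading timestamp line."""
    stamp = time.time() if now is None else now
    # Hypothetical first line; its only purpose is to vary the content.
    lines = ["# generated %.6f" % stamp] + list(names)
    return ("\n".join(lines) + "\n").encode("ascii")

def md5_hex(data):
    """Hex MD5 digest of the product body."""
    return hashlib.md5(data).hexdigest()
```

Two products built at different times list the same names but hash
differently, so the second one is no longer identical to the first.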


Ticket Details
===================
Ticket ID: DYF-938766
Department: Support IDD TIGGE
Priority: High
Status: Closed