
[LDM #JXK-104385]: using PIPE with CONDUIT data

Hi Deb,

> Hey thanks for the prompt response.

No worries.  I apologize for not being able to get back to your inquiry
yesterday, some other things came up that consumed the afternoon.

> I think I am not speaking LDM well.

No worries.

> When you refer to "full GRIB2 message", I'm assuming that a full GRIB2
> message is:
> Jun 11 21:49:46 pqutil INFO:      42796 20180611214930.210 CONDUIT 320
> data/nccf/com/gfs/prod/gfs.2018061118/gfs.t18z.pgrb2.1p00.f057
> !grib2/ncep/GFS/#000/201806111800F057/PRES/0 - CCBL! 00
> 0320

Yes.  The process running at NCEP (the source of the CONDUIT feed)
carves up the output from the model into individual GRIB2 messages
and inserts those into an LDM queue and then they are distributed
to toplevel IDD relays for further distribution to end-users.

> Our ingest (with the appropriate tabs of course)
> ^data/nccf/com/gfs/prod/gfs.([12][0-9])([0-9][0-9])([01][0-9])([0-3][0-9])([0-2][0-9])/(gfs.*pgrb2.1p00.)f([0-9]*[02468])
> !(.*)!
> FILE    -close -log
> /data/gfs/gfs1.00deg/test/\1\2\3\4/gfs.\1\2\3\4\5.pgrb2.1p00.f\7
> stores all 1p00.f057 messages into a single file called
> gfs.2018061118.pgrb2.1p00.f057 which many others use.

OK.  This is the first time that you mentioned that the objective
was to store all of the GRIB2 messages for a model output time step
in a single file.  Your inquiry makes a lot more sense to me now! :-)

> Currently, those users have scripts that run at a time when it is assumed
> that all of the data for 18Z 1p00 f057 has been written to
> gfs.2018061118.pgrb2.1p00.f057.
> Obviously that assumption causes alot of reprocessing when data are missed
> for some reason.

Or, when the script is run before the full set of GRIB2 messages from the
time step have been received.

> So they asked if I could streamline that process to run
> their scripts via LDM EXEC once the entire
> gfs.2018061118.pgrb2.1p00.f057 ingest
> is complete.  And from my understanding, the answer is no.

The answer is actually yes and no, but the yes option will require more
complexity in your LDM processing.

You are correct in thinking that there is no a priori way to tell if all
of the GRIB2 messages have been received ** UNLESS ** you have a list of
all of the GRIB2 messages that were part of the original output of the model.
Luckily, this is possible with CONDUIT (but not for GRIB2 messages received
in the NGRID feed that originates from a NOAAPort downlink) since a manifest
file (a file that has a listing of all GRIB2 messages that were successfully
inserted into the originating LDM queue) is sent right after the last GRIB2
message is inserted into the originating LDM queue.  This means that you
should be able to:

- use the arrival of the manifest file as an indication that the great
  majority of GRIB2 messages have been received

  Why not as an indication that all GRIB2 messages have been received?
  The route that each LDM product (a GRIB2 message in this case) takes
  before arriving at your LDM is not fixed, so products can be received
  in an order that differs from the order of insertion into the
  originating LDM's queue.  This, in turn, means that you could receive
  the manifest file before the last GRIB2 message(s) are received.  The
  difference in time between the receipts, however, should be "small",
  so you could set up an action for the receipt of the manifest that
  waits for some time (exactly how much is hard to say, but a few
  minutes should be sufficient) and then either kicks off the end-user
  processing or sends a notification that end-user processing can run.

- compare an inventory of GRIB2 messages received against the list in the
  manifest file

  This would allow you to _know_ if you received all of the products
  (GRIB2 messages) that were made available via inserts into the LDM
  queue at the originating site.
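
  As a rough illustration of the comparison, here is a minimal Python
  sketch.  It ASSUMES that the manifest lists one Product ID per line,
  which you should check against a real manifest from your feed, and the
  function names and the shortened Product IDs are hypothetical:

```python
def parse_manifest(text):
    """Return the set of Product IDs listed in a manifest.

    ASSUMPTION: one Product ID per non-blank line.  Check a real
    manifest from your feed and adjust the parsing to match.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_products(manifest_ids, received_ids):
    """Return, sorted, the manifest Product IDs that were not received."""
    return sorted(set(manifest_ids) - set(received_ids))


# Hypothetical Product IDs, shortened for illustration only.
manifest_text = "gfs.f057 PRES\ngfs.f057 TMP\ngfs.f057 HGT\n"
received = ["gfs.f057 TMP", "gfs.f057 PRES"]

missing = missing_products(parse_manifest(manifest_text), received)
print(missing)
```

  An empty result would mean that everything listed in the manifest was
  received; anything else names the products that are still outstanding.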

> Hope that makes some sense.

Yes, it makes good sense to me now.

OK, so your next question is undoubtedly what the Product IDs for
the manifest files look like.

You need to know this since it is the Product IDs that the extended
regular expressions in your pattern-action file actions must match.
The key piece here is that the word 'status' (no quotes) is included
in the Product ID for each manifest file.  To see which manifest files
you are receiving in your CONDUIT feed, run the following:

<as 'ldm' on your machine that is receiving the CONDUIT feed>

notifyme -vl- -f CONDUIT -p status -o 10000000

Example output:

20180614T160025.877556Z notifyme[1902] INFO notifyme.c:222:notifymeprog_5()     
 60300 20180614155943.449779 CONDUIT 417  
.status.data/nccf/com/gfs/prod/gfs.2018061412/gfs.t12z.pgrb2.0p50.f087 000417
20180614T160026.941924Z notifyme[1902] INFO notifyme.c:222:notifymeprog_5()     
 60303 20180614155943.503915 CONDUIT 417  
.status.data/nccf/com/gfs/prod/gfs.2018061412/gfs.t12z.pgrb2.0p25.f090 000417

The Product IDs in this listing snippet are:

.status.data/nccf/com/gfs/prod/gfs.2018061412/gfs.t12z.pgrb2.0p25.f090 000417
.status.data/nccf/com/gfs/prod/gfs.2018061412/gfs.t12z.pgrb2.0p50.f087 000417
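
Before putting a pattern into a pattern-action file, it can help to try
it against Product IDs like the ones above.  The following Python sketch
does that; the pattern and the function name are my own illustration
(not something shipped with the LDM), and the pattern is written so that
it should also be acceptable as a POSIX extended regular expression:

```python
import re

# Hypothetical pattern for GFS manifest ("status") Product IDs.
# Groups: date (YYYYMMDD), cycle (HH), resolution, forecast hour.
STATUS_RE = re.compile(
    r"^\.status\.data/nccf/com/gfs/prod/"
    r"gfs\.([0-9]{8})([0-9]{2})/"
    r"gfs\.t[0-9]{2}z\.pgrb2\.([0-9]p[0-9]{2})\.f([0-9]{3}) [0-9]+$"
)


def parse_status_id(product_id):
    """Return the parts of a manifest Product ID, or None if no match."""
    m = STATUS_RE.match(product_id)
    if m is None:
        return None
    date, cycle, resolution, forecast_hour = m.groups()
    return {
        "date": date,
        "cycle": cycle,
        "resolution": resolution,
        "forecast_hour": forecast_hour,
    }


# Product ID copied from the notifyme output above.
example = (".status.data/nccf/com/gfs/prod/gfs.2018061412/"
           "gfs.t12z.pgrb2.0p50.f087 000417")
print(parse_status_id(example))
```

Product IDs for the GRIB2 messages themselves do not begin with
'.status.', so this pattern will not match them.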


- the LDM 'notifyme' utility is, in my opinion, THE most useful LDM command
  for end users 

  It not only can be used to show you what you are receiving/have received,
  it can be used to show you what your upstream feed host is receiving/has
  received.  It can also be used to test extended regular expressions that
  you may want to use in the LDM configuration file (~ldm/ldmd.conf) REQUEST
  lines and in pattern-action file actions.

So, your job (if you choose to accept it Mission Impossible ;-) is

- figure out which manifest files you want to use in your processing

- write a pattern-action file action that does something with the
  manifest file(s) that you are interested in

  For example: compare what was received for the model output time
  step with what was inserted into the LDM queue at NCEP; send
  notifications to users that the data is ready for processing; etc.
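
  A script run by such an action could, for instance, re-check at
  intervals until nothing is missing or a grace period runs out, and
  only then notify users.  Here is a minimal Python sketch of that
  waiting logic.  The function name, the 180-second grace period and
  the 15-second polling interval are all assumptions for illustration,
  and 'find_missing' stands in for whatever check you implement:

```python
import time


def wait_until_complete(find_missing, grace_seconds=180.0, poll_seconds=15.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until nothing is missing or the grace period ends.

    find_missing: hypothetical callable returning the Product IDs that
    are listed in the manifest but have not been received yet.
    Returns the Product IDs still missing (an empty list when complete).
    """
    deadline = clock() + grace_seconds
    missing = find_missing()
    while missing and clock() < deadline:
        sleep(poll_seconds)
        missing = find_missing()
    return list(missing)


# With nothing missing, the call returns immediately.
print(wait_until_complete(lambda: []))
```

  If the returned list is empty, the notification can say that the data
  is ready; otherwise it can list what never arrived.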

One last comment in clarification of something I said above:

The CONDUIT feed is the only feed that contains a manifest file,
and this is possible since the process that is creating LDM products
from model output (i.e., carving up the large model output file
into individual GRIB2 messages) is also creating the manifest files.
This is not possible for feeds that originate from NOAAPort since
we have no way of knowing what the list of products actually sent
in the SBM is.

I hope that this all makes sense.  If it doesn't, please let us
know.


Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
Unidata HomePage                       http://www.unidata.ucar.edu

Ticket Details
Ticket ID: JXK-104385
Department: Support LDM
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.
