
[LDM #LFI-210024]: LDM ingesting some products ~30 minutes late and missing others.



Gregory,

> I've looked at the LDM support forums for the "pq_sequence" entry and
> nothing jumped out.  So I'm requesting your suggestions as to what may
> cause ~30 minute delays in receipt of products, or missing products.  (For
> the missing products I'm guessing there won't be anything in the ldmd.log
> file; I never see them written to the local file system, but I do see them
> inserted into the upstream LDM queue via notifyme.)
> 
> SPC is ingesting MRMS data from the NWS/NCEP/NCO IDP MRMS LDM server.  Many
> of the products are ingesting in a timely fashion (e.g. "nids"), but the
> grib2 products we are interested in are very late or missing.  I have
> multiple requests in the ldmd.conf for the datasets.  The local server
> running LDM has an accurate clock synchronized with NIST.
> 
> I'm doing a notifyme on the upstream IDP MRMS server and watching the
> products being inserted into the queue in a timely manner (i.e. based on
> the notifyme timestamp and the filename timestamp).  I'm also logging
> verbose to the local ldmd.log file.  An example of a greatly delayed
> product:
> 
> 
> notifyme -vl- -h mrms-ldmout.ncep.noaa.gov -f EXP -p ".*MESH_Max_60min_00.50.*"
> 
> Apr 19 18:33:34 notifyme[19128] INFO:    24826 20170419183334.044     EXP 000  /nfsdata/realtime/outgoing/grib2/CONUS/MRMS_MESH_Max_60min_00.50_20170419-183231.grib2.gz
> 
> From the ldmd.log file, the product arrives 1404 seconds later (~23
> minutes): 18:33Z + 23 minutes = 18:56Z.
> 
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG: pq_sequence(): time(insert)-time(create): 1404.5958 s
> Apr 19 18:56:58 rhesrv12 pqact[5138] INFO: ad0069f184e9a91e1ca343abee3fc86b    24826 20170419183334.044     EXP 000  /nfsdata/realtime/outgoing/grib2/CONUS/MRMS_MESH_Max_60min_00.50_20170419-183231.grib2.gz
> Apr 19 18:56:58 rhesrv12 pqact[5138] INFO:                pipe: -close#011w2gunzip.sh /NAWIPSBAK/ldmbakng/data/radar/grib2/mrms/conus/MESH_Max_60min_00.50_20170419-183231.grib2
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:     pipe_open: 6 14465
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:     pipe_prodput: 6 /nfsdata/realtime/outgoing/grib2/CONUS/MRMS_MESH_Max_60min_00.50_20170419-183231.grib2.gz
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:     pipe_put: 6
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6   4096 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote 4096 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6    250
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:          pbuf_flush: wrote  250 bytes
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:     pipe_close: 6, 14465
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:     pipe_sync: 6
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG:         pbuf_flush fd 6      0 block
> Apr 19 18:56:58 rhesrv12 pqact[5138] DEBUG: End of Queue
> Apr 19 18:56:59 rhesrv12 pqact[5138] DEBUG: Delay: 0.0001 sec
> Apr 19 18:56:59 rhesrv12 pqact[5138] DEBUG: pq_sequence(): time(insert)-time(create): 1404.9679 s

The delay between a downstream notifyme(1) indicating that a data-product
exists in an upstream LDM's product-queue and the insertion of that product
into the downstream product-queue can be caused by 1) the upstream LDM
responsible for transmitting the product having fallen far behind; or 2) the
same data-product arriving from a different upstream LDM site and the
downstream queue being too small to reject it as a duplicate, because the
previously-received copy has already been deleted from the queue.
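
One way to see which case applies (a sketch only; "localhost" below stands
for the host running your downstream LDM) is to run a second notifyme
against your own LDM and compare its timestamps with the upstream ones for
the same product.  That shows how late the product actually arrives in your
local queue, independent of pqact:

    # In one terminal: when the product is inserted into the upstream queue
    notifyme -vl- -h mrms-ldmout.ncep.noaa.gov -f EXP -p ".*MESH_Max_60min_00.50.*"

    # In another terminal: when the same product is inserted into the local queue
    notifyme -vl- -h localhost -f EXP -p ".*MESH_Max_60min_00.50.*"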

> Does CaSe sensitivity matter in the request lines?

Yes.
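
The pattern in a REQUEST entry is an extended regular expression, and ERE
matching is case-sensitive.  For example (illustrative only, reusing the
pattern, feed type, and host from your notifyme above):

    # Matches the MRMS products shown above:
    REQUEST EXP ".*MESH_Max_60min_00.50.*" mrms-ldmout.ncep.noaa.gov
    # Matches nothing, because the case differs:
    REQUEST EXP ".*mesh_max_60min_00.50.*" mrms-ldmout.ncep.noaa.gov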

> NCEP/NCO did say this afternoon that another customer was having delays and
> it 'ended up "blowing up" his ldm queues', and then the customer's latency
> problems were solved.  I'm not sure what "blowing up" an LDM queue looks
> like; I didn't ask.
> 
> pqmon output of the local LDM queue is:
> 
> [ldmbak@rhesrv12 ~/logs]$ pqmon -q /export/ldmbak/data/ldm.pq
> Apr 19 19:25:39 pqmon NOTE: Starting Up (2104)
> Apr 19 19:25:39 pqmon NOTE: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
> Apr 19 19:25:39 pqmon NOTE:   1653     8  242479   983316984     20980       36    223159   6448944  539
> Apr 19 19:25:39 pqmon NOTE: Exiting
> [ldmbak@rhesrv12 ~/logs]$
> 
> The age of the oldest product in the queue doesn't look that great: 539
> seconds.  The local queue size is 1000M.

The age of the oldest product in the queue *should always* be greater than
the value of the "/server/max-latency" parameter in the LDM registry;
otherwise, the LDM won't be able to reject a duplicate product that arrives
later than the age of the oldest product but before the maximum latency.
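
To compare the two values on your system, something like the following
should work on a standard installation (regutil(1) prints LDM registry
parameters; the last column of pqmon(1) output is the age of the oldest
product in seconds):

    regutil /server/max-latency           # maximum acceptable product latency, in seconds
    pqmon -q /export/ldmbak/data/ldm.pq   # the "age" column should exceed the value above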

Assuming you're using the default 1 hour maximum latency, your queue is way
too small.
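
If you decide to enlarge the queue, the usual sequence is roughly the
following (a sketch; "4000M" is only an example value, so pick a size large
enough that the age reported by pqmon stays above the maximum latency):

    ldmadmin stop
    regutil -s 4000M /queue/size    # example size only
    ldmadmin delqueue
    ldmadmin mkqueue
    ldmadmin start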

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: LFI-210024
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.