
[LDM #XZH-434392]: Files of different sizes (but same name) de-duping?



Hi Anne,

re:
> I saw something odd today. Two files from 2 different systems are
> de-duping, even though their sizes are totally different.
> 
> Here is the first file coming into server #1:
> 20180928T191351.525396Z 10.XX.XX.22[50524] INFO
> DownHelp.c:163:dh_saveDataProduct()  *412930408 *20180928191239.446464 EXP
> 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Then server #1 sends to server #2:
> 20180928T191408.756221Z grbvirt-cprk.ncep.noaa.gov[416] INFO
> DownHelp.c:163:dh_saveDataProduct()  *412930408 *20180928191239.446464 EXP
> 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Then a few minutes later, server #2 gets a new file with the same filename
> but about 3x larger from a different upstream server, and labels it as a
> duplicate:
> 20180928T191634.971464Z 10.XX.XX.42[415] INFO
> DownHelp.c:198:dh_saveDataProduct() hereis: duplicate: *1412286372
> *20180928191306.610828
> EXP 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Is there something in the LDM that might be causing these to de-dupe even
> though they're definitely different sizes?

Yes.  If the site(s) that are inserting the GOES-16 products into their LDM
queues for distribution to downstreams are using, for instance, the name of the
product (which looks like the name of the file created by something like
CSPP GEO) to create the LDM/IDD product MD5 signature, then the signature will
be the same even when the contents of two products with the same name
are different.

We, in fact, use the fully qualified file name created by CSPP GEO on our
GOES-16 and GOES-17 GRB ingest systems to calculate the MD5 signatures, and
not the contents of the products.  We do this to keep a site from processing
the same image more than once, since our distributing LDMs will reject
the second product received as being a duplicate.  This duplicate product
detection and rejection is done solely using the products' MD5 signatures.
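
To make the distinction concrete, here is a minimal Python sketch of the two
ways a signature could be derived, plus a toy queue that rejects any product
whose signature it has already seen.  This is an illustration only and is not
the LDM's actual code: the function names and the ProductQueue class are
hypothetical, and the byte strings stand in for real image data.

```python
import hashlib


def signature_from_name(product_name: str) -> str:
    """MD5 of the product name only (hypothetical helper)."""
    return hashlib.md5(product_name.encode("utf-8")).hexdigest()


def signature_from_contents(data: bytes) -> str:
    """MD5 of the product contents (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


class ProductQueue:
    """Toy stand-in for a product queue: rejects repeated signatures."""

    def __init__(self) -> None:
        self._seen = set()

    def insert(self, signature: str) -> bool:
        """Return True if accepted, False if rejected as a duplicate."""
        if signature in self._seen:
            return False
        self._seen.add(signature)
        return True


name = ("OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_"
        "e20182711911134_c20182711911169.nc")
small = b"\x00" * 1000   # placeholder for the smaller product
large = b"\x00" * 3000   # placeholder for the ~3x larger product

# Name-based signatures: the second product is rejected although it is larger.
by_name = ProductQueue()
first_by_name = by_name.insert(signature_from_name(name))
second_by_name = by_name.insert(signature_from_name(name))

# Content-based signatures: both products are accepted.
by_contents = ProductQueue()
first_by_contents = by_contents.insert(signature_from_contents(small))
second_by_contents = by_contents.insert(signature_from_contents(large))
```

With name-based signatures the size of the second product never enters into
the comparison, which matches the "duplicate" log entry you saw.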

FYI:

Our GOES-16/17 ingest setup is undoubtedly different from the one being used
at NOAA:

- we run one instance of CSPP GEO for the stream coming directly from our
  GOES-16 Quorum GRB 200 receiver, and we run a second instance of CSPP GEO
  that is being fed a stream that is being put together at/by UW/SSEC and is
  a blend of the good packets from both our and their downlinks

  This approach, which was pioneered by SSEC, eliminates the twice per
  year outages that WILL be caused by/at every downlink during
  periods of solar interference (SI), and goes a LONG way toward eliminating
  errors caused by local, terrestrial interference (TI).  It totally
  eliminates local TI if the local TI at one site is uncorrelated with
  the local TI at the other site.

  This setup has been running in an experimental mode at SSEC and Unidata
  since the early spring of 2018, and the result has been an almost complete
  elimination of "bad" packets in the blended feed.  I.e., our images are
  essentially perfect.

I'm telling you about the feed blending experimental development for two
reasons:

- the first relates directly to the question you posed in your inquiry:

  The images created by CSPP GEO from the stream coming directly from
  our GRB 200 receiver are more prone to having errors (which manifest
  themselves in missing tiles) than are the images created on the 
  ingest machine getting the blended feed.  If we were to use the contents
  of the images to calculate the MD5 signature for products inserted into
  the IDD, then end users would end up getting two products for each
  coverage since images from both ingest systems are fed into the IDD.
  In order to maximize the probability that images from the blended feed
  make it into the IDD in preference to the ones from our direct feed,
  we also favor the images produced from the blended feed by delaying
  the images from our local feed by 20 seconds. This procedure has
  gone a long way towards ensuring that sites feeding GOES-16 (and GOES-17,
  but that is still not generally available) from the IDD get near perfect
  to perfect images.

- SSEC is looking to get some funding to finish their development and
  documentation of their blending code, and I believe that they are
  very interested in getting additional sites with GOES-16 and GOES-17
  downlinks participating in the blending experiment.

  It is my opinion (one that I talked to Carissa about at length while
  she was in Boulder recently) that getting NOAA to participate in the
  blending experiment would really help in the effort to harden the
  code that is doing the blending, and in adding additional redundancy
  to the blending effort.

  Final comment: the need to blend the output from multiple GOES-R/S
  ingest sites will become even more important in the future when
  LTE cell services expand into frequencies that are close to
  those being used for GRB downlinks.
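
Returning briefly to the 20 second delay described in the first point above:
since both ingest systems produce the same signature for a given image,
whichever copy is inserted first is the one that gets relayed, and the later
copy is rejected as a duplicate.  The following Python sketch shows that
"first arrival wins" logic under simplified assumptions.  The feed labels
"local" and "blended", the function name, and the times are all made up for
illustration; only the 20 second delay comes from our setup.

```python
def winning_feed(ready_times, local_delay=20.0):
    """Return the feed whose copy would be inserted first.

    ready_times: dict mapping a feed label ("local" or "blended") to the
    time in seconds at which that feed's image is ready.  The "local"
    copy is held back by local_delay seconds before insertion.
    """
    insert_times = {
        feed: t + (local_delay if feed == "local" else 0.0)
        for feed, t in ready_times.items()
    }
    # The copy with the earliest insertion time is accepted; the other
    # would be rejected as a duplicate because the signatures match.
    return min(insert_times, key=insert_times.get)


# Blended image ready 10 s after the local one: the delay lets it win.
delayed = winning_feed({"local": 100.0, "blended": 110.0})

# Without the delay, the local (more error-prone) image would win.
undelayed = winning_feed({"local": 100.0, "blended": 110.0}, local_delay=0.0)

# If the blended image is more than 20 s late, the local image still gets out.
late = winning_feed({"local": 100.0, "blended": 130.0})
```

The delay therefore biases the outcome toward the blended feed without
blocking the local image when the blended one is late or missing.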

I realize that the above rambled an awful lot, but I hope it was useful
nonetheless.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: XZH-434392
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.


