
[LDM #XZH-434392]: Files of different sizes (but same name) de-duping?



Anne,

> I saw something odd today. Two files from 2 different systems are
> de-duping, even though their sizes are totally different.
> 
> Here is the first file coming into server #1:
> 20180928T191351.525396Z 10.XX.XX.22[50524] INFO
> DownHelp.c:163:dh_saveDataProduct()  *412930408 *20180928191239.446464 EXP
> 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Then server #1 sends to server #2:
> 20180928T191408.756221Z grbvirt-cprk.ncep.noaa.gov[416] INFO
> DownHelp.c:163:dh_saveDataProduct()  *412930408 *20180928191239.446464 EXP
> 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Then a few minutes later, server #2 gets a new file with the same filename
> but about 3x larger from a different upstream server, and labels it as a
> duplicate:
> 20180928T191634.971464Z 10.XX.XX.42[415] INFO
> DownHelp.c:198:dh_saveDataProduct() hereis: duplicate: *1412286372
> *20180928191306.610828
> EXP 000
> OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_e20182711911134_c20182711911169.nc
> 
> Is there something in the LDM that might be causing these to de-dupe even
> though they're definitely different sizes?

It sounds like whatever created those LDM data products (pqinsert(1)?) computed 
the MD5 checksum from the product-identifier (i.e., the name) rather than from 
the data. As a consequence, the product-queue library refused to insert the 
second data-product because it had the same MD5 checksum as a data-product 
already in the product-queue.
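
To illustrate the effect, here is a minimal Python sketch. It is not LDM code: 
ToyQueue, signature_from_name(), and signature_from_data() are hypothetical 
names, and the byte strings are made-up stand-ins for the two files. It only 
assumes that a queue rejects any product whose MD5 signature it has already 
seen.

```python
import hashlib


def signature_from_name(product_id: str, data: bytes) -> bytes:
    """Hypothetical signer: hashes only the product-identifier; data is ignored."""
    return hashlib.md5(product_id.encode("utf-8")).digest()


def signature_from_data(product_id: str, data: bytes) -> bytes:
    """Hypothetical signer: hashes only the data; the identifier is ignored."""
    return hashlib.md5(data).digest()


class ToyQueue:
    """Toy stand-in for a product-queue that rejects duplicate signatures."""

    def __init__(self, signer):
        self._signer = signer
        self._seen = set()

    def insert(self, product_id: str, data: bytes) -> bool:
        """Return True if the product was inserted, False if it was a duplicate."""
        sig = self._signer(product_id, data)
        if sig in self._seen:
            return False
        self._seen.add(sig)
        return True


NAME = ("OR_ABI-L1b-RadF-M3C02_G16_s20182711900367_"
        "e20182711911134_c20182711911169.nc")
small = b"\x00" * 1000   # made-up stand-in for the smaller file
large = b"\x01" * 3000   # made-up stand-in for the file about 3x larger

by_name = ToyQueue(signature_from_name)
name_results = (by_name.insert(NAME, small), by_name.insert(NAME, large))

by_data = ToyQueue(signature_from_data)
data_results = (by_data.insert(NAME, small), by_data.insert(NAME, large))

print("signature from name:", name_results)
print("signature from data:", data_results)
```

With the name-based signer the second insert is rejected even though the data 
differ in both size and content; with the data-based signer both are accepted.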

If one is going to use the product-identifier to compute the MD5 checksum, then 
one must ensure that the product-identifier is unique.

Assuming the data-product came from the GRB, I suspect that the two places 
that created it had vastly different signal reception, which would explain 
the difference in size.

If this is a GRB reception problem, then a solution does exist and we'd be 
happy to tell you about it. Contact us if you're interested.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: XZH-434392
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.