Re: [netcdfgroup] Content-Based Checksums of a netCDF dataset

A small note. Since the goal is equality testing rather than security,
you should be able to get by with CRC32 or CRC64 checksums.
SHA256 is overkill.
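
As a rough illustration of the point above, here is a minimal sketch of equality testing with CRC32 using only the Python standard library. The helper name `crc32_of_values` and the choice to pack every value as a little-endian float64 are assumptions made for this example; they are not part of any netCDF tool.

```python
import struct
import zlib


def crc32_of_values(values):
    """Return a running CRC32 over floats packed as little-endian float64.

    Hypothetical helper: the fixed packing makes the checksum depend on the
    decoded values only, not on how they happened to be stored on disk.
    """
    crc = 0
    for v in values:
        # zlib.crc32 accepts the previous checksum as its second argument,
        # so the values can be fed in one at a time.
        crc = zlib.crc32(struct.pack("<d", v), crc)
    return crc


a = [1.0, 2.5, -3.0]
b = [1.0, 2.5, -3.0]  # same content, e.g. read back from another file
c = [1.0, 2.5, -3.5]  # one value changed

print(crc32_of_values(a) == crc32_of_values(b))
print(crc32_of_values(a) == crc32_of_values(c))
```

A 32-bit checksum guards against accidental differences, not against deliberate tampering, which matches the equality-testing goal stated above.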
=Dennis Heimbigner
 Unidata


On 8/24/2017 12:00 PM, Willi Rath wrote:
Hi all,

I'd like to find a way to verify the contents of a given netCDF dataset across different representations on disk.  (Think of the data set being defined by its CDL code and different representations on disk being realised by different choices of format, deflation, chunking, etc. but with identical CDL.)
There are tools that compare the contents of two netCDF files: cdo's 
diff or nccmp. These tools do, however, rely on both files being present 
on the same file system and at the same time.  A hash-based approach 
calculating checksums from the contents rather than the binary 
representation of the data set would be a nice solution to the problem.
I've tried to collect all attempts made at verifying netCDF 
files in: https://github.com/willirath/netcdf-hash (The most successful 
of these centered on the possibility of including the functionality 
in `ncks` and led to a pair of tools for calculating and verifying 
MD5 checksums of netCDF files that are stored within the files.)
There is also a demo outlining an approach that digests different 
representations of the same netCDF data set into a sha256 hash and 
stores the hex value of this hash in global attributes in the respective 
files.
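
The idea of hashing content rather than the on-disk bytes could be sketched roughly as follows. This is not the code from the repository above: the in-memory dictionary layout, the function names `content_digest` and `verify`, and the attribute name `content_sha256` are all assumptions made for this illustration, and reading the values out of an actual netCDF file is left out.

```python
import hashlib
import json
import struct

# Hypothetical name of the global attribute that stores the digest.
DIGEST_ATTR = "content_sha256"


def content_digest(dataset):
    """Return a sha256 hex digest over names, attributes, and decoded values."""
    h = hashlib.sha256()
    # Skip the stored digest itself so that verification is repeatable.
    attrs = {k: v for k, v in dataset["attrs"].items() if k != DIGEST_ATTR}
    h.update(json.dumps(attrs, sort_keys=True).encode("utf-8"))
    # Visit variables in sorted order so definition order does not matter.
    for name in sorted(dataset["variables"]):
        var = dataset["variables"][name]
        header = {"name": name, "dims": var["dims"],
                  "shape": var["shape"], "attrs": var["attrs"]}
        h.update(json.dumps(header, sort_keys=True).encode("utf-8"))
        # Fixed width and byte order: chunking or deflation cannot leak in.
        for v in var["values"]:
            h.update(struct.pack("<d", v))
    return h.hexdigest()


def verify(dataset):
    """Compare the stored digest (if any) with a freshly computed one."""
    return dataset["attrs"].get(DIGEST_ATTR) == content_digest(dataset)


temp = {"dims": ["time", "x"], "shape": [2, 2], "attrs": {"units": "K"},
        "values": [280.0, 281.5, 279.25, 283.0]}
ds = {"attrs": {"title": "demo"}, "variables": {"temp": temp}}
ds["attrs"][DIGEST_ATTR] = content_digest(ds)
print(verify(ds))

temp["values"][0] = 0.0  # alter the content after the digest was stored
print(verify(ds))
```

Because only decoded values and metadata are hashed, two files with identical CDL but different format, deflation, or chunking would yield the same digest under this scheme, and each file can be checked on its own without the other being present.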
I'd be very grateful for any pointers to additional ideas (or perhaps 
existing tools) that solve the problem of netCDF-content verification, 
and for any suggestions or remarks.
Cheers
Willi


