[netCDF #DLW-993301]: NetCDF4 Format Validation and Data Integrity Check



Hi Philip,

> I went through the support request archive and found that the Fletcher32
> checksum is applied to data only, not attributes. The archive also noted
> that other algorithms could be applied to a portion of the data, with the
> resulting checksum stored as an attribute. Have you come across anyone
> applying different algorithms to data or attributes, or to an entire file?


> I have also read that the HDF Group implemented MD5 checksums and PGP
> signatures in 2005. Do you know what has actually been implemented for HDF5
> data integrity and authenticity checks? Are there any plans to implement
> public-key encryption and/or signing for the netCDF-4 environment?

At the layer of file collections, I know of two approaches:

  - The git version control system, which computes a SHA-1 hash of
    every file and directory and uses the resulting hashes as the
    basis of a content-addressable storage system.  For example,
    all the netCDF test files in Unidata's GitHub repository are
    stored under their SHA-1 hashes, so none of them can be modified
    without making the change evident.

  - The ZFS file system

      http://en.wikipedia.org/wiki/ZFS#Data_integrity

    which focuses on data integrity, using checksums (Fletcher-based
    by default, or optionally SHA-256) throughout the entire file
    system tree.
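As a rough illustration of git's content addressing, the sketch below computes the SHA-1 object ID that git assigns to a file's contents (a "blob"). The hash covers a short header (`blob <size>` followed by a NUL byte) and then the raw bytes, so changing even one byte yields a different ID. This should match what `git hash-object` reports for repositories using the default SHA-1 object format (repositories using git's newer SHA-256 format would differ); the helper name `git_blob_id` is our own.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


original = b"netCDF test data\n"
modified = b"netCDF test dat4\n"  # a single changed byte

print(git_blob_id(original))
print(git_blob_id(modified))  # a completely different ID
```

Because directory (tree) objects are hashed from the IDs of their entries, and commits from the ID of their top-level tree, a changed file also changes the IDs of every enclosing directory and of the commit.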

In addition, data integrity is handled at the TCP layer for network
transport.

I think implementing integrity checks at the layers used by the above
approaches is a better design than storing whole-file checksums or
hashes in netCDF or HDF5 files.  If a netCDF file had a global
"_FileHash" attribute that stored a checksum for the entire file
except for that attribute, then the file could easily be modified and
the "_FileHash" attribute updated to match the new contents.  The
modified file would still pass a consistency check between contents
and hash, so the modification would go undetected.
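The weakness described above can be sketched in a few lines. The model is deliberately simplified and hypothetical: a "file" is a Python dict of attributes plus a bytes payload, and `_FileHash` is a SHA-256 digest of every attribute except itself (serialized with `json`, sorted keys) followed by the data. This is not how netCDF stores anything; it only shows that whoever can edit the contents can also recompute the embedded hash.

```python
import hashlib
import json


def content_digest(attrs: dict, data: bytes) -> str:
    """SHA-256 over every attribute except _FileHash, followed by the data."""
    covered = {k: v for k, v in attrs.items() if k != "_FileHash"}
    h = hashlib.sha256()
    h.update(json.dumps(covered, sort_keys=True).encode("utf-8"))
    h.update(data)
    return h.hexdigest()


def seal(attrs: dict, data: bytes) -> dict:
    """Return a copy of attrs whose _FileHash matches attrs and data."""
    return {**attrs, "_FileHash": content_digest(attrs, data)}


def passes_check(attrs: dict, data: bytes) -> bool:
    """The only check an embedded hash allows: does it match the contents?"""
    return attrs.get("_FileHash") == content_digest(attrs, data)


data = b"\x01\x02\x03"
attrs = seal({"title": "surface temperature", "units": "K"}, data)
print(passes_check(attrs, data))  # True

# Naive tampering (data changed, hash left alone) is caught ...
tampered_data = b"\x09\x02\x03"
print(passes_check(attrs, tampered_data))  # False

# ... but a modifier who simply re-seals the file goes unnoticed.
tampered_attrs = seal({**attrs, "units": "degC"}, tampered_data)
print(passes_check(tampered_attrs, tampered_data))  # True
```

The check only becomes meaningful when the reference hash is kept outside the file, as in the git and ZFS approaches above.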

The HDF5 Fletcher checksum for data is a well-implemented solution to
chunk-at-a-time integrity, which supports efficient update of both
data and integrity checksums: a write touches only the affected chunks
and their checksums.  If instead the checksum were on the entire file
and had to be kept consistent with the file contents, every write
would require re-reading and rehashing the whole file.  Each write
would then slow down in proportion to the file size, and the total
cost of writing a file incrementally would grow as the square of its
size!
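To make the cost argument concrete, here is a hypothetical sketch that counts how many bytes must be checksummed while a file is built up one chunk at a time, under the two schemes. It uses a generic Fletcher-32 (16-bit little-endian words, odd lengths zero-padded) as a stand-in; it is not claimed to match HDF5's Fletcher-32 filter bit for bit, and the chunk size and helper names are illustrative assumptions.

```python
def fletcher32(data: bytes) -> int:
    """Generic Fletcher-32 over little-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # zero-pad an odd trailing byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


CHUNK = 1024  # hypothetical chunk size in bytes


def append_chunkwise(n_chunks: int) -> int:
    """Append chunks, checksumming only each new chunk; return bytes read."""
    chunk_sums = []
    work = 0
    for k in range(n_chunks):
        chunk = bytes([k % 256]) * CHUNK
        chunk_sums.append(fletcher32(chunk))  # only the new chunk is read
        work += len(chunk)
    return work


def append_whole_file(n_chunks: int) -> int:
    """Append chunks, recomputing one whole-file checksum after each write."""
    data = bytearray()
    work = 0
    for k in range(n_chunks):
        data += bytes([k % 256]) * CHUNK
        file_sum = fletcher32(bytes(data))  # the entire file is re-read
        work += len(data)
    return work


for n in (10, 20, 40):
    print(n, append_chunkwise(n), append_whole_file(n))
```

Doubling the number of appended chunks doubles the chunk-wise total (`CHUNK * n` bytes) but roughly quadruples the whole-file total (`CHUNK * n * (n + 1) / 2` bytes).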

The above considerations are the reasons we aren't planning to add
file-level integrity hashes to netCDF-4, but I'm open-minded if you
have any compelling counterarguments or use cases ...

--Russ

> address@hidden> wrote:
> 
> >
> > Philip Lee - NOAA Affiliate,
> >
> > Your Ticket has been received, and a Unidata staff member will review it
> > and reply accordingly. Listed below are details of this new Ticket. Please
> > make sure the Ticket ID remains in the Subject: line on all correspondence
> > related to this Ticket.
> >
> >     Ticket ID: DLW-993301
> >     Subject: NetCDF4 Format Validation and Data Integrity Check
> >     Department: Support netCDF
> >     Priority: Normal
> >     Status: Open
> >
> >
> >
> > The NetCDF libraries are developed at the Unidata Program Center,
> > in Boulder, Colorado, funded primarily by the National Science Foundation.
> >
> > All support requests are handled by the development team. No dedicated
> > support staff are funded at this time. For this reason we cannot guarantee
> > response times, nor that we can resolve every support issue, although we
> > do our best to respond within 72 hours.
> >
> > It is in the nature of support requests that the same question is asked
> > many times. We urge you to search the support archives for material
> > relating to your support request:
> >
> > http://www.unidata.ucar.edu/search.jsp?support&netcdf
> >
> > If you are having trouble building netCDF, please take a look at the
> > "Building NetCDF" page:
> >
> > http://www.unidata.ucar.edu/software/netcdf/docs/building.html
> >
> > or the (unfortunately somewhat out-of-date) NetCDF Build Troubleshooter
> > page:
> >
> > http://www.unidata.ucar.edu/software/netcdf/docs/troubleshoot.html
> >
> > Windows users should see the FAQ list:
> >
> > http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#windows_netcdf4_2
> >
> > Complete documentation (including a tutorial, and sample programs in C,
> > Fortran, Java, and other programming languages) can be found on the
> > netCDF Documentation page:
> >
> > http://www.unidata.ucar.edu/software/netcdf/docs/
> > http://www.unidata.ucar.edu/software/netcdf/examples/programs/
> >
> > If you resolve your issue through one of these methods, please send a
> > reply to this email, letting us know that you no longer need support.
> > This will help us spend more time on netCDF development.
> >
> > Best regards,
> >
> > Unidata User Support
> >
> >
> 
> 
> Cheers,
> Philip S. Lee
> NSOF #1215
> 301-817-4407 (W)
> 202-674-5104 (C)
> address@hidden
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: DLW-993301
Department: Support netCDF
Priority: Normal
Status: Closed