Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS

> But netCDF will still open files without this attribute, right?
yes.

On 4/21/2016 1:56 PM, Ed Hartnett wrote:
Howdy Dennis!

It sounds like a good solution.

But netCDF will still open files without this attribute, right?

The ability of netCDF-4 to open HDF5 files which were written by HDF5 is an important feature. It means that users can use tools that were written for netCDF on their existing HDF5 files.

Thanks,
Ed


On Thu, Apr 21, 2016 at 3:30 PM, dmh@xxxxxxxx wrote:

    I am in the process of adding a global attribute in the root group
    that captures both the netcdf library version and the hdf5 library
    version whenever a netcdf file is created. The current form is
    _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
    where version is the version of the _NCProperties attribute and the
    others are library versions, e.g. 1.8.18 or 4.4.1-rc1.
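A minimal C sketch of how a reader might pull one field out of an attribute value in the `key=value|key=value` form above. It assumes only that layout (`|` between fields, `=` between key and value); the helper name `ncprops_get` is hypothetical and not part of the netCDF API, and reading the attribute from the file is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of `key` from an _NCProperties-style
 * string ("k1=v1|k2=v2|...") into `out` (NUL-terminated, truncated to fit).
 * Returns 1 if the key was found, 0 otherwise. */
static int ncprops_get(const char *props, const char *key,
                       char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *field = props;

    while (field != NULL && *field != '\0') {
        const char *end = strchr(field, '|');            /* end of this field */
        size_t fieldlen = end ? (size_t)(end - field) : strlen(field);

        /* Match "key=" exactly at the start of the field. */
        if (fieldlen > keylen && strncmp(field, key, keylen) == 0
            && field[keylen] == '=') {
            size_t vlen = fieldlen - keylen - 1;
            if (outlen == 0)
                return 0;
            if (vlen >= outlen)
                vlen = outlen - 1;                        /* truncate safely */
            memcpy(out, field + keylen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        field = end ? end + 1 : NULL;                     /* next field */
    }
    return 0;
}
```

For example, given the (assumed) value "version=1|netcdflibversion=4.4.1-rc1|hdflibversion=1.8.18", asking for "hdflibversion" copies "1.8.18" into the output buffer. Matching only at the start of a field keeps "version" from matching inside "netcdflibversion".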
    Issues:
    1. I am open to suggestions about changing the format or adding
       info to it.
    2. Of course this attribute will not exist in files written using
       older versions of the netcdf library, but at least the process
       will have begun.
    3. This technically does not address the original issue because
       there exist hdf5 files not written by netcdf that are still
       compatible with and can be read by netcdf. I am not sure whether
       this case is important.
    =Dennis Heimbigner
       Unidata



    On 4/21/2016 9:33 AM, Pedro Vicente wrote:

        DETECTING HDF5 VERSUS NETCDF GENERATED FILES
        REQUEST FOR COMMENTS
        AUTHOR: Pedro Vicente

        AUDIENCE:
        1) HDF, netcdf developers,
        Ed Hartnett
        Kent Yang
        2) HDF, netcdf users, that replied to this thread
        Miller, Mark C.
        John Shalf
        3) netcdf tools developers
        Mary Haley  , NCL
        4) HDF, netcdf managers and sponsors
        David Pearah  , CEO HDF Group
        Ward Fisher, UCAR
        Marinelli, Daniel J. , Richard Ullmman, Christopher Lynnes, NASA
        5) [CF-metadata] list
        After this thread started 2 months ago, there was an
        announcement on the [CF-metadata] mailing list about
        "a meeting to discuss current and future netCDF-CF efforts and
        directions. The meeting will be held on 24-26 May 2016 in
        Boulder, CO, USA at the UCAR Center Green facility."
        This would be a good topic to put on the agenda, maybe?
        THE PROBLEM:
        Currently it is impossible to detect if an HDF5 file was
        generated by the HDF5 API or by the netCDF API.
        See previous email about the reasons why.
        WHY THIS MATTERS:
        Software applications that need to handle both netCDF and HDF5
        files cannot decide which API to use.
        This includes popular visualization tools like IDL, Matlab,
        NCL, HDF Explorer.
        SOLUTIONS PROPOSED: 2
        SOLUTION 1: Add a flag to HDF5 source
        The HDF5 format specification, listed here
        https://www.hdfgroup.org/HDF5/doc/H5.format.html
        describes a sequence of bytes in the file layout that have
        special meaning for the HDF5 API. It is common practice, when
        designing a data format, to leave some fields "reserved for
        future use".
        This solution makes use of one of these empty "reserved for
        future use" spaces to save a byte (for example) that holds an
        enumerator of "HDF5 compatible formats".
        An "HDF5 compatible format" is a data format that uses the
        HDF5 API at a lower level (usually hidden from the user of the
        upper API) and provides its own API.
        This category can be further divided into 2 formats:
        1) A "pure HDF5 compatible format". Example, NeXus
        http://www.nexusformat.org/
        NeXus just writes some metadata (attributes) on top of the
        HDF5 API, that has some special meaning for the NeXus community
        2) A "non pure HDF5 compatible format". Example, netCDF
        Here, the format adds some extra feature besides HDF5. In the
        case of netCDF, these are shared dimensions between variables.
        This sub-division between 1) and 2) is irrelevant for the
        problem and solution in question.
        The solution consists of writing a different enumerator value
        in the "reserved for future use" space. For example:
        Value decimal 0 (current value): This file was generated by
        the HDF5 API (meaning the HDF5 only API)
        Value decimal 1: This file was generated by the netCDF API
        (using HDF5)
        Value decimal 2: This file was generated by <put here another
        HDF5 based format>
        and so on
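To make Solution 1 concrete, here is a C sketch of the enumerator and of how a reader could dispatch on it. Everything in it is hypothetical: the HDF5 format specification defines no such field today, so `PRODUCER_BYTE_OFFSET` and the enum names are placeholders for whatever the HDF Group and the other format's organization would agree on.

```c
#include <stddef.h>

/* Hypothetical registry of "HDF5 compatible formats" (Solution 1).
 * Values mirror the list above; nothing like this exists in HDF5 today. */
typedef enum {
    PRODUCER_HDF5   = 0,  /* written by the HDF5-only API (current value) */
    PRODUCER_NETCDF = 1,  /* written by the netCDF API (using HDF5)       */
    PRODUCER_OTHER  = 2   /* next registered HDF5-based format, and so on */
} hdf5_producer;

/* Placeholder offset of the (currently reserved) byte; NOT a real field. */
#define PRODUCER_BYTE_OFFSET 15

/* Read the producer byte from an in-memory copy of the file's first bytes.
 * Returns -1 if the buffer is too short to contain it. */
static int read_producer(const unsigned char *buf, size_t len)
{
    if (len <= PRODUCER_BYTE_OFFSET)
        return -1;
    return buf[PRODUCER_BYTE_OFFSET];
}

/* Map the byte to a human-readable name; unregistered values are "unknown". */
static const char *producer_name(int value)
{
    switch (value) {
    case PRODUCER_HDF5:   return "HDF5";
    case PRODUCER_NETCDF: return "netCDF";
    case PRODUCER_OTHER:  return "other HDF5-based format";
    default:              return "unknown";
    }
}
```

Because existing files would carry 0 in a reserved byte, they would read as "HDF5", which is why the scheme only helps for files written after adoption.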
        The advantage of this solution is that the process involves 2
        parties: the HDF Group and the other format's organization.
        This allows the HDF Group to "keep track" of new HDF5-based
        formats, and makes it possible to make the other format
        "HDF5 certified".
        SOLUTION 2: Add some metadata to the other API on top of HDF5
        This is what NeXus uses.
        A NeXus file on creation writes several attributes on the root
        group, like "NeXus_version" and other numeric data.
        This is done using the public HDF5 API calls.
        The solution for netCDF consists of the same approach: just
        write some specific attributes, and provide a special netCDF API
        to write/read them.
        This solution only requires the work of one party (the netCDF
        group).
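A sketch of the reader side of Solution 2: given the names of the attributes found on the root group, pick the API to use. Collecting the names is left out so the example stays self-contained; with the HDF5 C API one would typically gather them with `H5Aiterate2` on the root group, or test individual names with `H5Aexists`. The marker names come from this thread (`NeXus_version`, and the `_NCProperties` attribute Dennis proposes); the table, enum, and `choose_api` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { API_HDF5, API_NETCDF, API_NEXUS } reader_api;

/* Hypothetical table of convention attributes that higher-level APIs
 * built on HDF5 write on the root group. */
static const struct {
    const char *attr;
    reader_api  api;
} root_markers[] = {
    { "_NCProperties", API_NETCDF },
    { "NeXus_version", API_NEXUS  },
};

/* Decide which API to use from the root-group attribute names.
 * Falls back to plain HDF5 when no known marker is present. */
static reader_api choose_api(const char *const *names, size_t n)
{
    size_t n_markers = sizeof root_markers / sizeof root_markers[0];

    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n_markers; k++)
            if (strcmp(names[i], root_markers[k].attr) == 0)
                return root_markers[k].api;
    return API_HDF5;
}
```

The appeal of a table is that each new HDF5-based format only adds a row, which matches the "one party" cost of Solution 2.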
        END OF RFC
        In reply to people that commented in the thread
        @John Shalf
        >>Perhaps NetCDF (and other higher-level APIs that are built
        on top of HDF5) should include an attribute attached
        >>to the root group that identifies the name and version of
        the API that created the file?  (adopt this as a convention)
        Yes, that's one way to do it: Solution 2 above.
        @Mark Miller
        >>>Hmmm. Is there any big reason NOT to try to read a netCDF
        produced HDF5 file with the native HDF5 library if someone so
        chooses?
        It's possible to read a netCDF file using HDF5, yes.
        There are 2 things that you will miss doing this:
        1) the ability to inquire about shared netCDF dimensions.
        2) the ability to read remotely with OPeNDAP.
        Reading with HDF5 also exposes metadata that is supposed to be
        private to netCDF. See below
        >>>> And, attempting to read an HDF5 file produced by Silo
        using just the HDF5 library (e.g. w/o Silo) is a major pain.
        This I don't understand. Why not read the Silo file with the
        Silo API?
        That's the whole purpose of this issue: each higher-level API on
        top of HDF5 should be able to detect "itself".
        I am not familiar with Silo, but if Silo cannot do this, then
        you have the same design flaw that netCDF has.

        >>> In a cursory look over the libsrc4 sources in netCDF
        distro, I see a few things that might give a hint a file was
        created with netCDF. . .
        >>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to
        the root group named "_nc3_strict". So, the existence of an
        attribute on the root group by that name would suggest the
        HDF5 file was generated by netCDF.
        I think this is done only for files that use the "old" netCDF-3
        (classic) model.
        >>>>> Also, I tested a simple case of nc_open, nc_def_dim,
        etc. nc_close to see what it produced.
        >>>> It appears to produce datasets for each 'dimension'
        defined with two attributes named "CLASS" and "NAME".
        This is because netCDF uses the HDF5 Dimension Scales API
        internally to keep track of shared dimensions. These are
        internal attributes
        of Dimension Scales. This approach would not work because an
        HDF5 only file with Dimension Scales would have the same
        attributes.
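For files written before any convention, the observations above support only a weak heuristic. The C sketch below assumes the caller has already collected the root-group attribute names and counted the datasets that carry the Dimension Scales `CLASS` attribute; `guess_legacy` and its enum are hypothetical. `_nc3_strict` counts as evidence for netCDF, while dimension-scale attributes alone yield "ambiguous", since an HDF5-only file using Dimension Scales looks the same.

```c
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the legacy (pre-convention) heuristic. */
typedef enum {
    GUESS_NETCDF,          /* a netCDF-specific marker was found          */
    GUESS_AMBIGUOUS,       /* dimension scales only: netCDF-4 or HDF5+DS  */
    GUESS_NO_NETCDF_HINTS  /* nothing netCDF-like, yet netCDF-4 can still
                              write files without named dimensions        */
} legacy_guess;

/* Hypothetical heuristic. `root_attrs` are the root-group attribute names;
 * `n_dimension_scales` is how many datasets carry the Dimension Scales
 * CLASS attribute. Both are assumed to be gathered by the caller. */
static legacy_guess guess_legacy(const char *const *root_attrs, size_t n_attrs,
                                 size_t n_dimension_scales)
{
    for (size_t i = 0; i < n_attrs; i++)
        if (strcmp(root_attrs[i], "_nc3_strict") == 0)
            return GUESS_NETCDF;     /* classic-model marker on the root group */

    if (n_dimension_scales > 0)
        return GUESS_AMBIGUOUS;      /* CLASS/NAME also occur in HDF5-only files */

    return GUESS_NO_NETCDF_HINTS;
}
```

Note that the third outcome deliberately avoids saying "HDF5": a netCDF-4 file with no named dimensions and no classic-model flag would land there too.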

        >>>> I like John's suggestion here.
        >>>>>But, any code you add to any applications now will work
        *only* for files that were produced post-adoption of this
        convention.
        Yes. There are 2 actions to take here:
        1) fix the issue for the future
        2) try to find some retroactive workaround that makes it
        possible now to differentiate HDF5 and netCDF files made before
        the convention is adopted
        See below.

        >>>> In VisIt, we support >140 format readers. Over 20 of
        those are different variants of HDF5 files (H5part, Xdmf,
        Pixie, Silo, Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.)
        >>>>When opening a file, how does VisIt figure out which
        plugin to use? In particular, how do we avoid one poorly
        written reader plugin (which may be the wrong one for a given
        file) from preventing the correct one from being found. It's
        kinda a hard problem.

        Yes, that's the problem we are trying to solve. I have to say,
        that is quite a list of HDF5-based formats there.
        >>>> Some of our discussion is captured here. . .
        http://www.visitusers.org/index.php?title=Database_Format_Detection
        I'll check it out, thank you for the suggestions.
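One common way to structure plugin selection, sketched here in C under our own assumptions (this is not VisIt's actual design): give every reader a cheap, side-effect-free probe and try the most specific formats first, leaving the generic HDF5 reader last so it cannot shadow a more specific one. The `file_info` struct, the plugin table, and the probe logic are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what a dispatcher learned from a quick look at
 * the file: whether the HDF5 signature is present, plus one root marker. */
typedef struct {
    int         is_hdf5;      /* HDF5 signature found                  */
    const char *root_marker;  /* e.g. "_NCProperties", or NULL if none */
} file_info;

typedef struct {
    const char *name;
    int (*probe)(const file_info *fi);  /* returns 1 if it can read the file */
} reader_plugin;

static int probe_netcdf(const file_info *fi)
{
    return fi->is_hdf5 && fi->root_marker
        && strcmp(fi->root_marker, "_NCProperties") == 0;
}

static int probe_nexus(const file_info *fi)
{
    return fi->is_hdf5 && fi->root_marker
        && strcmp(fi->root_marker, "NeXus_version") == 0;
}

static int probe_hdf5(const file_info *fi) { return fi->is_hdf5; }

/* Most specific readers first; the generic HDF5 reader goes last. */
static const reader_plugin plugins[] = {
    { "netCDF", probe_netcdf },
    { "NeXus",  probe_nexus  },
    { "HDF5",   probe_hdf5   },
};

/* Return the first plugin whose probe accepts the file, or NULL. */
static const reader_plugin *pick_plugin(const file_info *fi)
{
    for (size_t i = 0; i < sizeof plugins / sizeof plugins[0]; i++)
        if (plugins[i].probe(fi))
            return &plugins[i];
    return NULL;
}
```

Keeping probes separate from the readers means a buggy reader can only fail when actually reading, not prevent a better-matching plugin from being chosen.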
        @Ed Hartnett
        >>>I must admit that when putting netCDF-4 together I never
        considered that someone might want to tell the difference
        between a "native" HDF5 file and a netCDF-4/HDF5 file.
        >>>>>Well, you can't think of everything.
        This is a major design flaw.
        If you are in the business of designing data file formats, one
        of the things you have to consider is how to make it possible
        to distinguish your format from other formats.

        >>> I agree that it is not possible to canonically tell the
        difference. The netCDF-4 API does use some special attributes
        to track named dimensions,
        >>>>and to tell whether classic mode should be enforced. But
        it can easily produce files without any named dimensions, etc.
        >>>So I don't think there is any easy way to tell.
        I remember you wrote that code together with Kent Yang from
        the HDF Group.
        At the time I was with the HDF Group, but unfortunately I did
        not follow closely what you were doing.
        I don't remember any design document being circulated that
        explains the internals of the "how to" make the netCDF
        (classic) model of shared dimensions
        use the hierarchical group model of HDF5.
        I know this was done using the HDF5 Dimension Scales (that I
        wrote), but is there any design document that explains it?
        Maybe just some internal email exchange between you and Kent Yang?
        Kent, how are you?
        Do you remember having any design document that explains this?
        Maybe something like a unique private attribute that is
        written somewhere in the netCDF file?

        @Mary Haley, NCL
        NCL is a widely used tool that handles both netCDF and HDF5
        Mary, how are you?
        How does NCL deal with the case of reading both pure HDF5
        files and netCDF files that use HDF5?
        Would you be interested in joining a community based effort to
        deal with this, in case this is an issue for you?

        @David Pearah  , CEO HDF Group
        I volunteer to participate in the effort of this RFC together
        with the HDF Group (and netCDF Group).
        Maybe we could make a "task force" between HDF Group, netCDF
        Group and any volunteer (such as tools developers that happen
        to be in these mail lists)?
        The "task force" would have 2 tasks:
        1) make an HDF5-based convention for the future and
        2) try to retroactively salvage the current design issue of netCDF
        My phone is 217-898-9356, you are welcome
        to call in anytime.
        ----------------------
        Pedro Vicente
        pedro.vicente@xxxxxxxxxxxxxxxxxx
        <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>
        <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx
        <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>>
        https://twitter.com/_pedro__vicente
        http://www.space-research.org/

            ----- Original Message -----
            *From:* Miller, Mark C. <miller86@xxxxxxxx>
            *To:* HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
            *Cc:* netcdfgroup@xxxxxxxxxxxxxxxx ; Ward Fisher
            <wfisher@xxxxxxxx>
            *Sent:* Wednesday, March 02, 2016 7:07 PM
            *Subject:* Re: [Hdf-forum] Detecting netCDF versus HDF5

            I like John's suggestion here.

            But, any code you add to any applications now will work
        *only* for
            files that were produced post-adoption of this convention.

            There are probably a bazillion files out there at this
        point that
            don't follow that convention and you probably still want your
            applications to be able to read them.

            In VisIt, we support >140 format readers. Over 20 of those are
            different variants of HDF5 files (H5part, Xdmf, Pixie, Silo,
            Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When
        opening a
            file, how does VisIt figure out which plugin to use? In
            particular, how do we avoid one poorly written reader plugin
            (which may be the wrong one for a given file) from
        preventing the
            correct one from being found. It's kinda a hard problem.

            Some of our discussion is captured here. . .

        http://www.visitusers.org/index.php?title=Database_Format_Detection

            Mark


            From: Hdf-forum <hdf-forum-bounces@xxxxxxxxxxxxxxxxxx> on behalf of
            John Shalf <jshalf@xxxxxxx>
            Reply-To: HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
            Date: Wednesday, March 2, 2016 1:02 PM
            To: HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
            Cc: "netcdfgroup@xxxxxxxxxxxxxxxx" <netcdfgroup@xxxxxxxxxxxxxxxx>,
            Ward Fisher <wfisher@xxxxxxxx>
            Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5

                Perhaps NetCDF (and other higher-level APIs that are built on
                top of HDF5) should include an attribute attached to the root
                group that identifies the name and version of the API that
                created the file?  (adopt this as a convention)

                -john

                    On Mar 2, 2016, at 12:55 PM, Pedro Vicente
                    <pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:
                    Hi Ward
                    As you know, Data Explorer is going to be a general
                    purpose data reader for many formats, including HDF5
                    and netCDF. Here: http://www.space-research.org/
                    Regarding the handling of both HDF5 and netCDF, there
                    seems to be a potential issue: how to tell whether an
                    HDF5 file was saved by the HDF5 API or by the netCDF
                    API? It seems to me that this is not possible. Is this
                    correct?
                    netCDF uses an internal function NC_check_file_type to
                    examine the first few bytes of a file; for example, for
                    any HDF5 file the test is
                       /* Look at the magic number */
                       /* Ignore the first byte for HDF */
                       if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
                         *filetype = FT_HDF;
                         *version = 5;
                       }
                    The problem is that this test succeeds for any HDF5 file
                    and for any netCDF-4 file, which makes it impossible to
                    tell which is which, and therefore impossible for any
                    general purpose data reader to decide whether to use the
                    netCDF API or the HDF5 API.
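A self-contained C magic-number test (not netCDF's actual `NC_check_file_type`) shows the problem. netCDF-4 files and HDF5-only files begin with the same 8-byte HDF5 signature, `\211HDF\r\n\032\n`, so the bytes alone can only say "HDF5", while netCDF classic-family files start with `CDF` plus a version byte. The function and enum names are hypothetical, and for simplicity it checks offset 0 only, although HDF5 also allows the signature at later offsets (512, 1024, ...) after a user block.

```c
#include <stddef.h>
#include <string.h>

typedef enum { MAGIC_UNKNOWN, MAGIC_NETCDF_CLASSIC, MAGIC_HDF5 } magic_kind;

/* Classify a file from its first bytes (offset 0 only). */
static magic_kind classify_magic(const unsigned char *buf, size_t len)
{
    /* HDF5 signature: \211 'H' 'D' 'F' \r \n \032 \n */
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

    if (len >= sizeof hdf5_sig && memcmp(buf, hdf5_sig, sizeof hdf5_sig) == 0)
        return MAGIC_HDF5;   /* pure HDF5 OR netCDF-4: indistinguishable here */

    /* netCDF classic family: "CDF" followed by version byte 1, 2 or 5 */
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0
        && (buf[3] == 1 || buf[3] == 2 || buf[3] == 5))
        return MAGIC_NETCDF_CLASSIC;

    return MAGIC_UNKNOWN;
}
```

Anything returning `MAGIC_HDF5` still needs a second look inside the file (root-group attributes, for instance) before choosing between the netCDF and HDF5 APIs.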
                    I have a possible solution for this, but before going any
                    further, I would just like to
                    1)      confirm that this is indeed not possible
                    2)      see if you have a solid workaround for this,
                    excluding the dumb ones, for example deciding on an
                    extension .nc or .h5, or traversing the HDF5 file to see
                    if it's a non-netCDF-conforming one. Yes, to further
                    complicate things, it is possible that the above test
                    says OK for an HDF5 file, but then the read by the
                    netCDF API fails because the file is an HDF5 file that
                    is not netCDF-conformant.
                    Thanks
                    ----------------------
                    Pedro Vicente
                    pedro.vicente@xxxxxxxxxxxxxxxxxx
                    http://www.space-research.org/
                    _______________________________________________
                    Hdf-forum is for HDF software users discussion.
                    Hdf-forum@xxxxxxxxxxxxxxxxxx
                    http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
                    Twitter: https://twitter.com/hdf5






        _______________________________________________
        netcdfgroup mailing list
        netcdfgroup@xxxxxxxxxxxxxxxx <mailto:netcdfgroup@xxxxxxxxxxxxxxxx>
        For list information or to unsubscribe,  visit:
        http://www.unidata.ucar.edu/mailing_lists/





