Re: [netcdfgroup] How to dump netCDF to JSON?

I think keeping hdf and netcdf mostly separate is the correct
solution. A couple of points.
1. A group can be a dictionary with the following keys:
     dimensions, types, attributes (group level), variables, and data.
2. Ordering matters in netcdf, so each of the group pieces
   (dimensions, etc) needs to be a list.
3. Variables have a number of unordered parts that are best
   represented as a dictionary containing:
   name, type, attributes.
4. A set of attributes could be represented as a dictionary
   with the attribute names serving as keys. But remember
   that each attribute has a number of parts: type, name, and
   a list of values.
5. In netcdf, there are several kinds of user-defined types:
   1. enumerations: an enumeration consists of a name, a basetype
      (an integer type) and a set
      of enumeration constants. Each such constant consists
      of a name and a value.
   2. compound type (a structure in C terms): consisting of a name
      and an ORDERED list of fields. Each field is a variable
      (see above).
   3. vlen type: A variable length set of instances of some
      arbitrary base type.
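
A minimal Python sketch of the layout described in the list above, offered only as an illustration and not as an agreed netCDF-JSON format. Every name and value in it (lat, lon, temp, cloud_t, the "kind" and "size" keys, and so on) is made up for the example. Giving each variable a "dimensions" entry is an extra assumption that the list does not mention. Attributes are kept as ordered lists here, although a dictionary keyed by attribute name would also fit the description.

```python
import json

# Hypothetical group: each top-level piece is a list, so order is preserved.
group = {
    "dimensions": [
        {"name": "lat", "size": 3},
        {"name": "lon", "size": 4},
    ],
    "types": [
        # enumeration: name, integer basetype, and named constants
        {"kind": "enum", "name": "cloud_t", "basetype": "ubyte",
         "constants": [{"name": "clear", "value": 0},
                       {"name": "cloudy", "value": 1}]},
        # compound: name and an ORDERED list of fields (each shaped like a variable)
        {"kind": "compound", "name": "obs_t",
         "fields": [{"name": "time", "type": "int", "attributes": []},
                    {"name": "value", "type": "float", "attributes": []}]},
        # vlen: variable-length sequence of some base type
        {"kind": "vlen", "name": "ragged_t", "basetype": "int"},
    ],
    "attributes": [
        {"name": "title", "type": "string", "values": ["demo file"]},
    ],
    "variables": [
        {"name": "temp", "type": "float",
         "dimensions": ["lat", "lon"],  # assumption: not part of the list above
         "attributes": [
             {"name": "units", "type": "string", "values": ["K"]},
         ]},
    ],
    "data": [
        {"name": "temp", "values": [[280.0, 281.0, 282.0, 283.0],
                                    [284.0, 285.0, 286.0, 287.0],
                                    [288.0, 289.0, 290.0, 291.0]]},
    ],
}

# Serialize and read back; list order survives the round trip.
text = json.dumps(group, indent=2)
restored = json.loads(text)
```

Because the ordered pieces are JSON arrays rather than JSON objects, a reader does not have to rely on a parser preserving key order.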

=Dennis Heimbigner
 Unidata

On 10/20/2016 5:50 PM, Pedro Vicente wrote:
my thought was to make a netcdfJSON, then add features to make an
hdfJSON. (and netcdfJSON would look a lot like CDL)
So a netcdfJSON file would be a valid hdfJSON file, but not the other
way around.

on second thought, this design has the problem that netCDF has things
that HDF5 does not (named dimensions),
and HDF5 has things that netCDF does not, so it's a bit of a catch-22;
so maybe just keep them separate

my design method is usually a bit of specification, then a bit of code,
then when something new comes up that was not planned, go back to step 1
and rewrite the spec, and sometimes rewrite the code

-Pedro



    ----- Original Message -----
    *From:* Pedro Vicente <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>
    *To:* Chris Barker <mailto:chris.barker@xxxxxxxx>
    *Cc:* HDF Users Discussion List
    <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx> ; netCDF Mail List
    <mailto:netcdfgroup@xxxxxxxxxxxxxxxx>
    *Sent:* Thursday, October 20, 2016 7:33 PM
    *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?

    >>my thought was to make a netcdfJSON, then add features to make an
    hdfJSON. (and netcdfJSON would look a lot like CDL)
    >>So a netcdfJSON file would be a valid hdfJSON file, but not the
    other way around.

    yes, sounds like a good plan
    I'll send you an email when I have things ready, thanks
    -Pedro

        ----- Original Message -----
        *From:* Chris Barker <mailto:chris.barker@xxxxxxxx>
        *To:* Pedro Vicente <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>
        *Cc:* John Readey <mailto:jreadey@xxxxxxxxxxxx> ; netCDF Mail
        List <mailto:netcdfgroup@xxxxxxxxxxxxxxxx> ; HDF Users
        Discussion List <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx>
        *Sent:* Thursday, October 20, 2016 6:17 PM
        *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?



        On Thu, Oct 20, 2016 at 3:00 PM, Pedro Vicente
        <pedro.vicente@xxxxxxxxxxxxxxxxxx
        <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>> wrote:

            >>> This is making me think that we may want a spec for
            netcdf-json that would be a subset of the hdf-json spec.

            that is one option;
            other option is to make a JSON form of netCDF CDL ,
            completely unaware of HDF5 (just like the netCDF API is)

            
            http://www.unidata.ucar.edu/software/netcdf/workshops/2011/utilities/CDL.html


        yup.

        Are they mutually exclusive approaches? my thought was to make a
        netcdfJSON, then add features to make an hdfJSON. (and
        netcdfJSON would look a lot like CDL)

        So a netcdfJSON file would be a valid hdfJSON file, but not the
        other way around.

        Like a netcdf4 file is a valid hdf5 file now.

        -CHB



            with the "data" part being optional, which was one of the
            goals of my design, to transmit just metadata over the web,
            for a quick remote inspection
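
A small Python sketch of the "metadata only" idea, under the assumption that dataset values always live under a key literally named "data" (as in the object-style examples elsewhere in this thread). The function name strip_data is hypothetical. One limitation of this naive approach: a dataset or attribute that is itself named "data" would also be dropped.

```python
def strip_data(node):
    """Return a copy of a decoded JSON tree with every "data" key removed."""
    if isinstance(node, dict):
        return {key: strip_data(value)
                for key, value in node.items() if key != "data"}
    if isinstance(node, list):
        return [strip_data(item) for item in node]
    return node  # numbers, strings, booleans, None


# Illustrative document in the key-value style discussed in this thread.
full = {
    "dset1": {
        "dtype": "INT32",
        "dimensions": [3, 4],
        "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    }
}
meta = strip_data(full)
```

The original document is left untouched, so the same in-memory tree can be served either with or without values.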

            -Pedro

                ----- Original Message -----
                *From:* Chris Barker <mailto:chris.barker@xxxxxxxx>
                *To:* John Readey <mailto:jreadey@xxxxxxxxxxxx>
                *Cc:* Pedro Vicente
                <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx> ; netCDF Mail
                List <mailto:netcdfgroup@xxxxxxxxxxxxxxxx> ; HDF Users
                Discussion List <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx>
                *Sent:* Thursday, October 20, 2016 4:48 PM
                *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?

                On Thu, Oct 20, 2016 at 12:02 PM, John Readey
                <jreadey@xxxxxxxxxxxx <mailto:jreadey@xxxxxxxxxxxx>> wrote:

                    So we came up with a scheme of Group, Dataset, and
                    Datatype collections with a UUID to identify each
                    object.  That way if you have a reference to a specific
                    UUID, you can always access the object regardless of
                    what shenanigans may be happening with the links in
                    the file.

                    It’s true that this makes path lookups a bit more
                    cumbersome, but it’s a more general way of specifying a
                    directed graph (the HDF5 link structure) on a tree
                    (the JSON hierarchy).
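
To make the trade-off concrete, here is a simplified Python sketch of a flat, UUID-keyed collection with links between entries. The key names ("root", "groups", "links") and the resolve helper are assumptions chosen for illustration; this is not claimed to match the actual hdf5-json schema. It shows both sides of the point above: an object is reachable directly by UUID, while a path lookup has to walk the links one name at a time.

```python
import uuid

# One UUID per object; the values are random, only their identity matters.
root_id, a_id, b_id, c_id = (str(uuid.uuid4()) for _ in range(4))

# Flat collection: each group appears exactly once, keyed by its UUID.
doc = {
    "root": root_id,
    "groups": {
        root_id: {"links": {"A": a_id, "B": b_id}},
        a_id: {"links": {"C": c_id}},
        b_id: {"links": {"C": c_id}},   # second link to the same object
        c_id: {"links": {}},
    },
}


def resolve(doc, path):
    """Follow link names from the root group and return the target UUID."""
    current = doc["root"]
    for name in path.strip("/").split("/"):
        if name:  # skip the empty component produced by "/"
            current = doc["groups"][current]["links"][name]
    return current


same_object = resolve(doc, "/A/C") == resolve(doc, "/B/C")
```

Both paths end at the same UUID, so the shared group is stored once no matter how many links point at it.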


                Hmm -- interesting. I hadn't realized that HDF was this
                flexible. For my part, I've only really used netcdf.

                This is making me think that we may want a spec for
                netcdf-json that would be a subset of the hdf-json spec.

                That way they can be as compatible as possible without
                "cluttering up" the netcdf spec too much.

                -CHB







                    John

                    *From: *Pedro Vicente
                    <pedro.vicente@xxxxxxxxxxxxxxxxxx>
                    *Date: *Tuesday, October 18, 2016 at 9:37 PM
                    *To: *John Readey <jreadey@xxxxxxxxxxxx>, Chris Barker
                    <chris.barker@xxxxxxxx>
                    *Cc: *netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>,
                    HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
                    *Subject: *Re: [netcdfgroup] How to dump netCDF to
                    JSON?

                    @John

                    >> 1.       Complete fidelity to all HDF5 features

                    >> 2.       Support graphs that are not acyclic.

                    ok, understood.

                    In my case I needed a simple schema for a particular
                    set of files.

                    But why didn't you start with the official HDF5 DDL

                    https://support.hdfgroup.org/HDF5/doc/ddl.html

                    and try to adapt to JSON?

                    Same thing for netCDF, there is already an official
                    CDL, so any JSON spec should be "identical".

                    @Chris

                    {
                    "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1,
                    2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
                    }

                    >> * Do you need "rank"?

                    sometimes a bit of redundancy is useful, to make it
                    visually clear

                    >> BTW, is a "dataset" in HDF the same thing as a
                    "variable" in netcdf?)

                    yes

                    >> It would be really great to have this become an
                    "official" spec -- if you want to get it there,
                    you're probably going to need to develop it more
                    out in the open with a wider community. These
                    lists are the way to get that started, but I suggest

                    >> 1) put it up somewhere that people can collaborate
                    on it, make suggestions, capture the discussion, etc.
                    gitHub is one really nice way to do that. See, for
                    example the UGRID spec project:

                    ok, anyone interested send me an off-list email

                    -Pedro

                    ----- Original Message -----
                        *From:* John Readey <mailto:jreadey@xxxxxxxxxxxx>
                        *To:* Chris Barker <mailto:chris.barker@xxxxxxxx>
                        ; Pedro Vicente
                        <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>
                        *Cc:* netCDF Mail List
                        <mailto:netcdfgroup@xxxxxxxxxxxxxxxx> ; Charlie
                        Zender <mailto:zender@xxxxxxx> ; HDF Users
                        Discussion List
                        <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx> ; David
                        Pearah <mailto:David.Pearah@xxxxxxxxxxxx>
                        *Sent:* Tuesday, October 18, 2016 11:15 PM
                        *Subject:* Re: [netcdfgroup] How to dump netCDF
                        to JSON?

                        Hey,

                        The hdf5-json code is here:
                        https://github.com/HDFGroup/hdf5-json and docs
                        are here:
                        http://hdf5-json.readthedocs.io/en/latest/.

                        The package is both a library of HDF5 <-> JSON
                        conversion functions and some simple scripts for
                        converting HDF5 to JSON and vice-versa.  E.g.

                        $ python h5tojson.py -D <hdf5-file>

                        outputs JSON minus the dataset data values.

                        While it may not be the most elegant JSON
                        schema, it’s designed with the following goals
                        in mind:

                        1.       Complete fidelity to all HDF5 features
                        (i.e. the goal is that you should be able to
                        take any HDF5 file, convert it to JSON, convert
                        back to HDF5 and wind up with a file that is
                        semantically equivalent to what you started
                        with).

                        2.       Support graphs that are not acyclic.
                        I.e. a group structure where <root> links to A
                        and B, and A and B both link to C.  The output
                        should only produce one representation of C.
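
One way the second goal could be met is sketched below in Python: walk the link graph while remembering which objects have already been assigned an identifier, so a shared (or cyclic) target is written once. The Group class, the dump_groups function and the output key names are hypothetical stand-ins for illustration, not the hdf5-json implementation.

```python
import uuid


class Group:
    """Hypothetical in-memory stand-in for a group with named links."""

    def __init__(self):
        self.links = {}  # link name -> Group


def dump_groups(root):
    """Emit every reachable group exactly once, keyed by a fresh UUID."""
    ids = {id(root): str(uuid.uuid4())}  # Python object identity -> UUID
    groups = {}
    pending = [root]
    while pending:
        group = pending.pop()
        entry = {"links": {}}
        groups[ids[id(group)]] = entry
        for name, child in group.links.items():
            if id(child) not in ids:      # first time this object is seen
                ids[id(child)] = str(uuid.uuid4())
                pending.append(child)
            entry["links"][name] = ids[id(child)]
    return {"root": ids[id(root)], "groups": groups}


# <root> links to A and B; A and B both link to C; C links back to <root>.
root, a, b, c = Group(), Group(), Group(), Group()
root.links = {"A": a, "B": b}
a.links["C"] = c
b.links["C"] = c
c.links["up"] = root  # a cycle, to show the walk still terminates

doc = dump_groups(root)
```

A group is queued only at the moment it first receives a UUID, which is what keeps C from being emitted twice and stops the cycle from looping forever.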

                        Since NetCDF doesn’t use all these features,
                        it’s certainly possible to come up with
                        something simpler for just netCDF files.

                        Suggestions, feedback, and pull requests are
                        welcome!

                        Cheers,

                        John

                        *From: *Chris Barker <chris.barker@xxxxxxxx>
                        *Date: *Friday, October 14, 2016 at 12:32 PM
                        *To: *Pedro Vicente
                        <pedro.vicente@xxxxxxxxxxxxxxxxxx>
                        *Cc: *netCDF Mail List
                        <netcdfgroup@xxxxxxxxxxxxxxxx>, Charlie
                        Zender <zender@xxxxxxx>,
                        John Readey <jreadey@xxxxxxxxxxxx>, HDF Users
                        Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>, David
                        Pearah <David.Pearah@xxxxxxxxxxxx>
                        *Subject: *Re: [netcdfgroup] How to dump netCDF
                        to JSON?

                        Pedro,

                        When I first started reading this thread, I
                        thought "there should be a spec for how to
                        represent netcdf in JSON"

                        and then I read:

                            1) The specification to convert netCDF/HDF5
                            to "a" JSON format (note the "a" here)

                        Awesome -- that's exactly what we need -- as you
                        say there is not one way to represent netcdf
                        data in JSON, and probably far more than one
                        "obvious" way.

                        Without looking at your spec yet, I do think it
                        should probably look as much like CDL as
                        possible -- we are all familiar with that.

                            (why Python? HDF5 developer tools should be
                            all about writing in C/C++)

                        Because Python is an excellent language with
                        which to "drive" C/C++ libraries like HDF5 and
                        netcdf4. If I were to do this, I'd sure use
                        Python. Even if you want to get to a C++
                        implementation eventually, you'd probably
                        benefit from prototyping and working out the
                        kinks with a Python version first.

                        But whoever is writing the code....

                            The specification is here

                            http://www.space-research.org/

                        Just took a quick look -- nice start.

                        I've only used HDF through the netcdf4 spec, so
                        there may be richness needed that I'm missing,
                        but my first thought is to make greater use of
                        "objects" in JSON (key-value structures, hash
                        tables, dicts in python), rather than array
                        position for heterogeneous structures. For
                        instance, you have:

                         a dataset

                            {
                            "dset1" : ["dataset", "STAR_INT32", 2, [3,
                            4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
                            }

                        I would perhaps do that as something like:

                        {
                        ...
                        "dset1":{"object_type": "dataset",
                                 "dtype": "INT32",
                                 "rank": 2,
                                 "dimensions": [3,4],
                                 "data": [[1,2,3,4],
                                          [5,6,7,8],
                                          [9,10,11,12]]
                                 }
                        ...
                        }
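
For anyone who wants to try the two layouts side by side, here is a rough Python sketch that converts the positional form into the key-value form with nested data. The function names nest and to_object_form are hypothetical, the flat data is assumed to be in row-major order (which is what the example above implies), and the type string is passed through unchanged. It needs Python 3.8 or later for math.prod.

```python
from math import prod


def nest(flat, shape):
    """Reshape a flat row-major list into nested lists of the given shape."""
    if len(shape) <= 1:
        return list(flat)
    step = prod(shape[1:])  # number of values in each outermost slice
    return [nest(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def to_object_form(positional):
    """Convert {name: [object_type, dtype, rank, dims, flat]} to key-value form."""
    result = {}
    for name, (object_type, dtype, rank, dims, flat) in positional.items():
        # Both checks are redundant in a well-formed file, but cheap to run.
        if rank != len(dims) or prod(dims) != len(flat):
            raise ValueError(f"inconsistent shape information for {name!r}")
        result[name] = {
            "object_type": object_type,
            "dtype": dtype,
            "rank": rank,
            "dimensions": list(dims),
            "data": nest(flat, dims),
        }
    return result


positional = {
    "dset1": ["dataset", "STAR_INT32", 2, [3, 4],
              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}
converted = to_object_form(positional)
```

The consistency check also shows that "rank" carries no extra information: it has to equal the length of the dimensions list.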

                        NOTES:

                        * I used nested arrays, rather than flattening
                        the 2-d array -- this maps nicely to things like
                        numpy arrays, for example -- not sure about the
                        C++ world. (you can flatten and un-flatten numpy
                        arrays easily, too, but this seems like a better
                        mapping to the structure) And HDF is storing
                        this all in chunks and who knows what -- so it's
                        not a direct mapping to the memory layout
                        anyway.

                        * Do you need "rank"? -- can't you check the
                        length of the dimensions array?

                        * Do you need "object_type" -- will it always
                        be a dataset? Or you could have something like:

                        {
                        ...
                        "datasets": {"dset1": {the actual dataset object},
                                     "dset2": {another dataset object},
                                     ....
                                    }
                        }

                        Then you don't need object_type or a name

                        (BTW, is a "dataset" in HDF the same thing as a
                        "variable" in netcdf?)

                            I would like to make this some kind of
                            "official" netCDF/HDF5 JSON format for the
                            community, so I encourage anyone to read the
                            specification

                            If you see any flaw in the design or
                            anything in the design that you would like
                            to have changed please let me know now

                        done :-)

                        It would be really great to have this become an
                        "official" spec -- if you want to get it there,
                        you're probably going to need to develop it more
                        out in the open with a wider community. These
                        lists are the way to get that started, but I
                        suggest:

                        1) put it up somewhere that people can
                        collaborate on it, make suggestions, capture the
                        discussion, etc. gitHub is one really nice way
                        to do that. See, for example the UGRID spec
                        project:

                        https://github.com/ugrid-conventions/ugrid-conventions

                        (NOTE that that one got put on gitHub after
                        there was a pretty complete draft spec, so there
                        isn't THAT much discussion captured. But also
                        note that that is too bad -- there is no good
                        record of the decision process that led to the
                        spec)

                            At the moment it only (intentionally) uses
                            common generic features of both netCDF and
                            HDF5, which are the numeric atomic types and
                            strings.

                        Good plan.

                        -Chris

                        --

                        Christopher Barker, Ph.D.
                        Oceanographer

                        Emergency Response Division
                        NOAA/NOS/OR&R            (206) 526-6959   voice
                        7600 Sand Point Way NE   (206) 526-6329   fax
                        Seattle, WA  98115       (206) 526-6317   main reception

                        Chris.Barker@xxxxxxxx









    ------------------------------------------------------------------------

    _______________________________________________
    NOTE: All exchanges posted to Unidata maintained email lists are
    recorded in the Unidata inquiry tracking system and made publicly
    available through the web.  Users who post to any of the lists we
    maintain are reminded to remove any personal information that they
    do not want to be made public.


    netcdfgroup mailing list
    netcdfgroup@xxxxxxxxxxxxxxxx
    For list information or to unsubscribe,  visit:
    http://www.unidata.ucar.edu/mailing_lists/






