Re: HDF5 bitfields...

Hi Quincey:

A question has come up about how a "time" data type is stored.

The "Object Modification Date and Time" message uses a 4-byte int, meaning "#secs since 1970". Is that the same as the "time" data type?
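
For reference, here is a minimal sketch of what a "4-byte int, #secs since 1970" value looks like from C. It assumes the value is unsigned and counts UTC seconds; both are assumptions, and epoch_secs_to_utc is a made-up helper name, not an HDF5 call. It says nothing about how the HDF5 "time" data type itself is stored.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a 32-bit "seconds since 1970-01-01 00:00:00 UTC" count into
 * broken-down UTC time. Returns 0 on success, -1 if the C library
 * cannot convert the value. (Hypothetical helper, not an HDF5 call.) */
int epoch_secs_to_utc(uint32_t secs, struct tm *out)
{
    assert(out != NULL);

    time_t t = (time_t)secs;        /* widen to the platform's time_t */
    struct tm *utc = gmtime(&t);    /* points at static storage */
    if (utc == NULL)
        return -1;

    *out = *utc;                    /* copy out of the static buffer */
    return 0;
}
```

A 4-byte count has a limited range: read as signed it runs out in January 2038, read as unsigned in February 2106.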
From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 09:38:27
Message-ID: <wrxfz7te0wc.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 09:38:27 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <200407150418.i6F4I8sD072078@xxxxxxxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: HDF5 bitfields...
Received: (from majordo@localhost)
        by unidata.ucar.edu (UCAR/Unidata) id i6FFcTST001660
        for netcdf-hdf-out; Thu, 15 Jul 2004 09:38:29 -0600 (MDT)
Received: from rodney.unidata.ucar.edu (rodney.unidata.ucar.edu 
[128.117.140.88])
        by unidata.ucar.edu (UCAR/Unidata) with ESMTP id i6FFcSaW001656
        for <netcdf-hdf@xxxxxxxxxxxxxxxx>; Thu, 15 Jul 2004 09:38:28 -0600 (MDT)
Organization: UCAR/Unidata
Keywords: 200407151538.i6FFcSaW001656
References: <200407150418.i6F4I8sD072078@xxxxxxxxxxxxxxxxxxxxxx>
Lines: 74
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-netcdf-hdf@xxxxxxxxxxxxxxxx
Precedence: bulk
Reply-To: netcdf-hdf@xxxxxxxxxxxxxxxx

Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:

> Hi Ed,
>     Bitfields are a black sheep in the datatype family and aren't terribly
> well documented (which we're trying to work on).  Say something if you think
> we've got a terrible gap about them somewhere.

Well, I know there's a terrible gap about them in my brain...


> > Is there an example somewhere about using bitfields in HDF5?
>     Hmm, you can look in the test/dtypes.c for some examples of using them.
> Search for "H5T_STD_B"...


OK, here's what I'm seeing about creating a bitfield...

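   hid_t st = -1;

   st = H5Tcopy(H5T_STD_B16LE);   /* copy the predefined 16-bit bitfield type */
   H5Tset_precision(st, 12);      /* 12 significant bits */
   H5Tset_offset(st, 2);          /* significant bits start at bit 2 */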

Does this pretty much sum it up? I H5Tcopy a predefined bitfield type
big enough to hold it, and then set precision and offset?
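
To make the precision/offset idea concrete, here is a small sketch in plain C with no HDF5 calls. It assumes the offset counts padding bits from the least-significant end of the container, so 12 bits at offset 2 occupy bits 2..13 of a 16-bit word. The field_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `precision` significant bits that start at bit `offset`
 * (bit 0 = least significant) of a 16-bit container. */
uint16_t field_mask16(unsigned precision, unsigned offset)
{
    assert(precision >= 1 && precision + offset <= 16);
    uint32_t ones = (1u << precision) - 1u;     /* `precision` one-bits */
    return (uint16_t)(ones << offset);
}

/* Extract the significant bits, shifted down to bit 0. */
uint16_t field_get16(uint16_t container, unsigned precision, unsigned offset)
{
    return (uint16_t)((container & field_mask16(precision, offset)) >> offset);
}

/* Store `value` into the significant bits; padding bits are left alone. */
uint16_t field_set16(uint16_t container, uint16_t value,
                     unsigned precision, unsigned offset)
{
    uint16_t mask = field_mask16(precision, offset);
    uint32_t kept = container & (uint16_t)~mask;          /* padding bits */
    uint32_t put  = ((uint32_t)value << offset) & mask;   /* field bits */
    return (uint16_t)(kept | put);
}
```

With precision 12 and offset 2 the mask is 0x3FFC, so the two low bits and the two high bits of the container are padding.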

> > Or can you just tell me what functions would be used to create a
> > bitfield?
>     The H5Tset_precision() routine determines the number of bits in a datatype
> that are significant within it.
>
> > Limits on number of bits?
>     Up to the size of the datatype that contains it (which is defined for up
> to 64-bit datatypes currently).
>
> > How are these stored then? Any sort of padding or what?
>     We currently don't pack them, so a 13-bit field in a 32-bit datatype still
> takes up 4 bytes of space.  Frankly, I think this is a bit of a bug, but
> it's a fairly complicated problem to pack the bits on disk (in light of using
> bitfields in compound, array and variable-length datatypes mostly) and no one
> has whined strongly about it, so it's been the status quo for a while now. :-/
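
To put numbers on the unpacked-versus-packed difference, here is a sketch of what packing would involve, in plain C. This is not what HDF5 does on disk (per the above, it does not pack at all); the function names and the LSB-first bit order are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold `count` fields of `bits` bits each, tightly packed. */
size_t packed_size_bytes(size_t count, unsigned bits)
{
    return (count * bits + 7) / 8;
}

/* Pack the low `bits` bits of each value into `out`, least-significant
 * bit first. `out` must hold packed_size_bytes(count, bits) bytes. */
void pack_bits(const uint32_t *values, size_t count, unsigned bits, uint8_t *out)
{
    assert(bits >= 1 && bits <= 32);

    size_t nbytes = packed_size_bytes(count, bits);
    for (size_t i = 0; i < nbytes; i++)
        out[i] = 0;

    size_t bitpos = 0;
    for (size_t i = 0; i < count; i++) {
        for (unsigned b = 0; b < bits; b++, bitpos++) {
            if ((values[i] >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
        }
    }
}

/* Read field number `index` back out of a packed buffer. */
uint32_t unpack_field(const uint8_t *packed, size_t index, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    uint32_t v = 0;
    size_t bitpos = index * bits;
    for (unsigned b = 0; b < bits; b++, bitpos++) {
        if ((packed[bitpos / 8] >> (bitpos % 8)) & 1u)
            v |= (uint32_t)1 << b;
    }
    return v;
}
```

For 1000 13-bit fields, packed_size_bytes(1000, 13) is 1625 bytes, against 4000 bytes when each field sits in its own 32-bit container.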

Ah ha! That sounds important.

I think storage (and transmission) efficiency is what this whole
feature is about for Russ...

Russ, is that correct? The goal here is to store and move large
amounts of bitfield data efficiently?

Otherwise, what is the point of a bitfield in C/C++ or Fortran 77? I
don't know about F90 - does it have a good way to deal with bitfields?

Perhaps we should ask whether compression is a better thing to use
to achieve storage efficiency?

Ed

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 09:45:19
Message-ID: <wrxbrihe0kw.fsf_-_@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 09:45:19 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <20040715090628.A3034@xxxxxxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Strings (was: Re: HDF5 bitfields...)

"Robert E. McGrath" <mcgrath@xxxxxxxxxxxxx> writes:

> You may also wish to refer to the H5 user's guide,
>
>    http://hdf.ncsa.uiuc.edu/HDF5/doc/UG/
>
> and there is a tutorial program at:
>
>    http://hdf.ncsa.uiuc.edu/training/other-ex5/sample-programs/StringOpaque.html


Excellent! Looks to me like we can use the H5T_NATIVE_CHAR type for
strings and HDF5 will provide everything we need.

One question: can you guys handle n-dimensional arrays of strings,
for example a 3-dimensional array of strings? And can we use the
dataspace functions to define the dimensions, all in the usual HDF5
way?

Thanks,

Ed



From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 09:53:24
Message-ID: <wrx3c3te07f.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 09:53:24 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <200407151400.i6FE0NaW021474@xxxxxxxxxxxxxxxx>
To: Russ Rew <russ@xxxxxxxxxxxxxxxx>, netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: fortran 77 and netCDF-4 (was: Re: new data types...)
Cc: caron@xxxxxxxx


Howdy all!

We've already said something about how netcdf-4 "wouldn't work" in
F77, but what exactly did we mean by that?
I know that we meant that the compound type wouldn't work.

In Philly, many years ago, I removed the license plates from my '68
Ford wagon when a homeless guy started living in it.
Is that what we're going to do to the F77 interface?

Or, will we maintain as many features as we can in F77?

Ed

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 09:57:21
Message-ID: <wrxzn61clge.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 09:57:21 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <200407151400.i6FE0NaW021474@xxxxxxxxxxxxxxxx>
To: Russ Rew <russ@xxxxxxxxxxxxxxxx>, caron@xxxxxxxxxxxxxxxx,
       netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: the API complexity cost of adding new types (was: Re: new data types...)


Fellow Programmers:

It hadn't occurred to me before yesterday that for every type we add,
we are going to have to add:

nc_get/put_att_<TYPE>
nc_get/put_var1_<TYPE>
nc_get/put_vara_<TYPE>
nc_get/put_vars_<TYPE>
nc_get/put_varm_<TYPE>
nc_get/put_var_<TYPE>

Six new get/put pairs, or twelve functions. (Whereas the v2 interface
can deal with any new types we add with zero new functions.)

I'm not complaining about my fingers getting tired from typing all
this in. Just pointing out that if we add time, long longs, string,
vlen, bitfield, and compound (a fairly minimal list), that's 36 new
get/put pairs, or 72 functions.
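
One way to see the trade-off is a sketch like the following. It is not netCDF code: demo_var and every demo_* name are made up, and the "variable" is just a small in-memory buffer. The untyped pair plays the role of a v2-style void * interface, and each macro expansion stamps out one typed get/put pair on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "variable": raw bytes plus element size/count. */
typedef struct {
    unsigned char data[64];
    size_t elem_size;
    size_t count;
} demo_var;

/* Untyped accessors: one pair handles every element type. */
int demo_put_var(demo_var *v, const void *src, size_t elem_size, size_t count)
{
    assert(v != NULL && src != NULL);
    if (elem_size * count > sizeof v->data)
        return -1;                      /* does not fit in the buffer */
    memcpy(v->data, src, elem_size * count);
    v->elem_size = elem_size;
    v->count = count;
    return 0;
}

int demo_get_var(const demo_var *v, void *dst, size_t elem_size)
{
    assert(v != NULL && dst != NULL);
    if (elem_size != v->elem_size)
        return -1;                      /* element size mismatch */
    memcpy(dst, v->data, v->elem_size * v->count);
    return 0;
}

/* Typed wrappers: one get/put pair per expansion, so the compiler
 * checks the pointer type at every call site. */
#define DEMO_DEFINE_TYPED(NAME, CTYPE)                                    \
    int demo_put_var_##NAME(demo_var *v, const CTYPE *src, size_t count)  \
    { return demo_put_var(v, src, sizeof(CTYPE), count); }                \
    int demo_get_var_##NAME(const demo_var *v, CTYPE *dst)                \
    { return demo_get_var(v, dst, sizeof(CTYPE)); }

DEMO_DEFINE_TYPED(longlong, long long)
DEMO_DEFINE_TYPED(float, float)
```

Generating the wrappers saves typing but not API surface: every name still has to be documented, tested, and mirrored in the other language interfaces.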

Also a default fill value, but that's only one #define per type.

Do we want to consider dropping some of these (varm, I'm looking at
you!)?

Am I going down the wrong track here entirely? After all, what's the
point of having type-safe functions?

Are we going to provide a new struct for each new type, for example, an
nc_time struct, in our C header file? This seems to be implied, so
that the function prototype can be something like:

EXTERNL int
nc_get_var_nctime(int ncid, int varid, nc_time *tp);

Ed

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 09:59:31
Message-ID: <wrxvfgpclcs.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 09:59:31 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: how does HDF5 really store time? As float, int, string?


HDF5 folk:

How do you really store the time type?

As a floating point number, number of seconds since Jan 1, 1970, or
something like that?

Ed

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 15 2004 Jul -0600 13:18:53
Message-ID: <wrxllhlcc4i.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 15 Jul 2004 13:18:53 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: parallel I/O and netCDF-4

Howdy all!

Here's what we have in terms of requirements for Parallel I/O:

Parallel I/O

   * Parallel I/O reading and writing to a netCDF file is supported.
   * The parallel I/O features require that the MPI library be
     installed.

I think we can all agree that this is a model of terseness!

What does it mean to support parallel I/O to a file for reads and
writes? Feel free to lecture on this topic if anyone is feeling
loquacious.

For reading, what does this mean to the API, if anything?
Everyone gets to open the file read-only, and read from it to their
heart's content, confident that they are getting the most recent data
at that moment. That requires no API changes.

Is that it for readers? Or do they get some special additional
features, like notification of data arrival, etc?

Ed