Re: HDF5 bitfields...

Quincey Koziol wrote:

Hi Ed,

   Bitfields are a black sheep in the datatype family and aren't terribly
well documented (which we're trying to work on).  Say something if you think
we've got a terrible gap about them somewhere.
Well, I know of a terrible gap about them: in my brain...
   :-)

Is there an example somewhere about using bitfields in HDF5?
   Hmm, you can look in test/dtypes.c for some examples of using them.
Search for "H5T_STD_B"...

OK, here's what I'm seeing about creating a bitfield...

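   hid_t st = -1;                  /* id of the new bitfield datatype */
   st = H5Tcopy(H5T_STD_B16LE);    /* start from a 16-bit little-endian bitfield */
   H5Tset_precision(st, 12);       /* 12 significant bits... */
   H5Tset_offset(st, 2);           /* ...beginning at bit 2 */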

Does this pretty much sum it up? I H5Tcopy a bitfield type big enough
to hold it, and then set the precision and offset?
   Yes, that's pretty much all.

Or can you just tell me what functions would be used to create a
bitfield?
   The H5Tset_precision() routine sets how many bits of a datatype are
significant, and H5Tset_offset() sets the bit position where those
significant bits start.

Limits on number of bits?
   Up to the size of the datatype that contains it (which is defined for up
to 64-bit datatypes currently).

How are these stored then? Any sort of padding or what?
   We currently don't pack them, so a 13-bit field in a 32-bit datatype still
takes up 4 bytes of space.  Frankly, I think this is a bit of a bug, but
it's a fairly complicated problem to pack the bits on disk (mostly because
bitfields can be used in compound, array and variable-length datatypes) and no
one has whined strongly about it, so it's been the status quo for a while now. :-/
Ah ha! That sounds important.

I think storage (and transmission) efficiency is what this whole
feature is about for Russ...

Russ, is that correct? The goal here is to store and move large
amounts of bitfield data efficiently?

Otherwise, what is the point of a bitfield in C/C++ or Fortran 77? I
don't know about F90: does it have a good way to deal with bitfields?

Perhaps we should ask whether compression is a better way to
achieve storage efficiency?
   It would be fairly straightforward to implement a pipeline filter that
"compressed" data by packing out the unused bits for bitfield datatypes.  (At
least for non-compound/array/variable-length combinations :-).

   Quincey

The motivation for me would be to use bit-packing as a storage format, not a data type. We would add an option to pack wider types (usually float/double) using a scale/offset. This can get you a factor of 2-4 or so, whereas compression may not get you anything.

However, this would only work if it remains a valid HDF5 file. It would be most useful if we could do arbitrary bit widths, but still useful if we are limited to multiples of 8.
From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 16 2004 Jul -0600 10:22:24
Message-ID: <wrxllhjrkfz.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 16 Jul 2004 10:22:24 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <20040715150435.G3034@xxxxxxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: questions about compression...
Organization: UCAR/Unidata
References: <wrxd62xcb1b.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
        <20040715150435.G3034@xxxxxxxxxxxxxxxxxxxxx>

"Robert E. McGrath" <mcgrath@xxxxxxxxxxxxx> writes:

Please check the User's Guide (the chapter on datasets).

http://hdf.ncsa.uiuc.edu/HDF5/doc/UG/


Basically, there is a set/get pair for all the filters. The standard
filters are: Deflate (GZIP), SZIP compression, Shuffle, and the
Fletcher32 error detection code.

To enable one, you call the corresponding H5Pset_... function on the
dataset creation property list, then create the dataset with H5Dcreate.

OK, then let me pose the following requirements question:

Is the requirement that we support one type of compression, both types
of compression that currently exist in the library (gzip and szip), or
that we support all compression filters that may be introduced in the
future?

Or is the requirement that we support file filters, including all the
ones listed above?

If yes to the last question, is it also a requirement that we allow
the user to register callbacks, etc., and so add his own filters to
netCDF-4, just as HDF5 does?

Ed

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 16 2004 Jul -0600 10:33:12
Message-ID: <wrx8ydjrjxz.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 16 Jul 2004 10:33:12 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <40F7FFE1.80302@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: HDF5 bitfields...
Organization: UCAR/Unidata
References: <200407160402.i6G42gaU005048@xxxxxxxxxxxxxxxxxxxxxx>
        <40F7FFE1.80302@xxxxxxxxxxxxxxxx>

John Caron <caron@xxxxxxxxxxxxxxxx> writes:

The motivation for me would be to use bit-packing as a storage format,
not a data type. We would add an option to pack wider types (usually
float/double) using a scale/offset. This can get you a factor of 2-4
or so, whereas compression may not get you anything.

However, this would only work if it remains a valid HDF5 file. It
would be most useful if we could do arbitrary bit widths, but still
useful if we are limited to multiples of 8.

Well, this could easily be done by netCDF-4, using attributes to store
the information needed.

It would still be a valid HDF5 file, but readers would be mighty
confused about how to read it unless they understood the conventions
we'd use to store the scale/offset numbers for a dataset...

However, I don't think we would use the HDF5 bitfield for this.

Ed