Re: HDF5 bitfields...

Hi Russ,

> John Caron wrote:
> > the scale/offset can be calculated easily from the data itself. often, 
> > people want to apply different scale/offset to different parts of the 
> > same array, eg vertical levels.
> 
> and you replied:
> >     Hmm, how would you parameterize this?  Would a user select various parts
> > of the dataset's dataspace and specify scale/offset information for them?
> 
> When Harvey Davies was here from Australia for a visit about 8 years
> ago, we worked out two kinds of scaling for varying packing parameters
> along one or more dimensions of a variable: predefined scaling and
> adaptive scaling.  
> 
> With predefined scaling, the scale and offset values associated with a
> packed variable were stored in auxiliary arrays, varying along just
> the subset of dimensions used by these arrays.  For example, to store
> a packed array of temperatures, one might use
> 
>   dimensions:
>     time = ...
>     lat = ...
>     lon = ...
>     level = ...
>   variables:
>     byte temperature(time, level, lon, lat);
>     double temperature_scale_factor(level);
>     double temperature_add_offset(level);
> 
> which would use a possibly different (scale_factor, add_offset) pair
> for packing temperatures on each atmospheric level.  This would allow
> for greater precision using the same number of bits (or fewer bits for
> the same precision) than using one packing parameter pair for all the
> data, because this variable tends to have values that depend on level.
> It wouldn't work so well with other variables that don't have a
> level-dependence.
> 
> With adaptive scaling, the optimum scale and offset values were to be
> computed by the library for each slab of the variable as it was
> written, and stored in automatically-generated associated variables
> (or multidimensional attributes).
> 
> Although we defined interfaces for these types of scaling, they were
> never implemented.  Implementing adaptive scaling seemed pretty
> ambitious, and even the predefined scaling would have required
> adoption of new conventions for naming associated variables, etc.  And
> the proposals actually foundered on inability to agree on all the gory
> details, such as determining whether to permit the types of the
> scaling parameters to be user-specifiable in adaptive scaling, etc.
    Ok, I see.  Hmm...  I think that the adaptive scaling would actually be
somewhat easier than the predefined scaling you describe in HDF5.  With the
adaptive scaling, each chunk in the dataset could be scanned to compute the
optimum scale and offset values which would be stored with the chunk.  Handling
predefined scaling that varied according to a position within the dataspace
seems like it would require accessing some information that was stored outside
each chunk and that might be a little unusual in the current implementation.
Predefined scaling that didn't vary across the dataspace would be easier than
either of those methods, of course.  Although it gets a little weird to define
any sort of scaling on non-numeric datatypes, we've got a mechanism for
disallowing that now.
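
As a rough illustration of the per-chunk adaptive scaling described above, the
following Python sketch scans each chunk for its minimum and maximum, derives a
scale and offset from them, and keeps those two numbers alongside the packed
integers.  It is not HDF5 code: the function names, the flat list standing in
for a dataset, and the 8-bit default are all assumptions made for the example.

```python
def pack_chunk(values, nbits=8):
    """Scan one chunk and return (scale, offset, packed_ints).

    Hypothetical helper: the offset is the chunk minimum and the scale
    spreads the chunk's range over the 2**nbits - 1 available steps.
    """
    lo, hi = min(values), max(values)
    steps = (1 << nbits) - 1
    # A constant chunk has no range; any non-zero scale works.
    scale = (hi - lo) / steps if hi > lo else 1.0
    packed = [round((v - lo) / scale) for v in values]
    return scale, lo, packed


def unpack_chunk(scale, offset, packed):
    """Reverse pack_chunk using the parameters stored with the chunk."""
    return [p * scale + offset for p in packed]


def pack_dataset(values, chunk_len, nbits=8):
    """Pack a flat sequence chunk by chunk, each with its own parameters."""
    return [
        pack_chunk(values[i:i + chunk_len], nbits)
        for i in range(0, len(values), chunk_len)
    ]
```

Because the scale and offset travel with each chunk, unpacking needs nothing
from outside the chunk, which is the property that makes this variant look
simpler than position-dependent predefined scaling.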

    Quincey
>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 20 2004 Jul -0600 07:14:05 
Message-ID: <wrxvfgi95ya.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 20 Jul 2004 07:14:05 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <200407200241.i6K2fgLj039469@xxxxxxxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Bit Packing (was: Re: HDF5 bitfields...)

Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:

> Hi Russ,
> 
> > John Caron wrote:
> > > the scale/offset can be calculated easily from the data itself. often, 
> > > people want to apply different scale/offset to different parts of the 
> > > same array, eg vertical levels.
> > 
> > and you replied:
> > >     Hmm, how would you parameterize this?  Would a user select various parts
> > > of the dataset's dataspace and specify scale/offset information for them?
> > 
> > When Harvey Davies was here from Australia for a visit about 8 years
> > ago, we worked out two kinds of scaling for varying packing parameters
> > along one or more dimensions of a variable: predefined scaling and
> > adaptive scaling.  
> > 
> > With predefined scaling, the scale and offset values associated with a
> > packed variable were stored in auxiliary arrays, varying along just
> > the subset of dimensions used by these arrays.  For example, to store
> > a packed array of temperatures, one might use
> > 
> >   dimensions:
> >     time = ...
> >     lat = ...
> >     lon = ...
> >     level = ...
> >   variables:
> >     byte temperature(time, level, lon, lat);
> >     double temperature_scale_factor(level);
> >     double temperature_add_offset(level);
> > 
> > which would use a possibly different (scale_factor, add_offset) pair
> > for packing temperatures on each atmospheric level.  This would allow
> > for greater precision using the same number of bits (or fewer bits for
> > the same precision) than using one packing parameter pair for all the
> > data, because this variable tends to have values that depend on level.
> > It wouldn't work so well with other variables that don't have a
> > level-dependence.
> > 
> > With adaptive scaling, the optimum scale and offset values were to be
> > computed by the library for each slab of the variable as it was
> > written, and stored in automatically-generated associated variables
> > (or multidimensional attributes).
> > 
> > Although we defined interfaces for these types of scaling, they were
> > never implemented.  Implementing adaptive scaling seemed pretty
> > ambitious, and even the predefined scaling would have required
> > adoption of new conventions for naming associated variables, etc.  And
> > the proposals actually foundered on inability to agree on all the gory
> > details, such as determining whether to permit the types of the
> > scaling parameters to be user-specifiable in adaptive scaling, etc.
>     Ok, I see.  Hmm...  I think that the adaptive scaling would actually be
> somewhat easier than the predefined scaling you describe in HDF5.  With the
> adaptive scaling, each chunk in the dataset could be scanned to compute the
> optimum scale and offset values which would be stored with the chunk.  Handling
> predefined scaling that varied according to a position within the dataspace
> seems like it would require accessing some information that was stored outside
> each chunk and that might be a little unusual in the current implementation.
> Predefined scaling that didn't vary across the dataspace would be easier than
> either of those methods, of course.  Although it gets a little weird to define
> any sort of scaling on non-numeric datatypes, we've got a mechanism for
> disallowing that now.
> 
>     Quincey

Let me make sure I'm using the same terminology as the rest of you...

When we say "bit packing", do we mean applying a scale and offset to a
bunch of, for example, 64-bit floats and ending up with 32-bit floats?
Or with 16-bit ints as well?
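
To make the question concrete, here is a minimal Python sketch of one reading
of "bit packing": 64-bit floats are reduced to 16-bit signed integers with a
single scale and offset, and restored as packed * scale_factor + add_offset.
The function names are hypothetical, and the parameter names simply borrow the
scale_factor / add_offset naming used earlier in this thread.

```python
from array import array


def pack16(values, scale_factor, add_offset):
    """Quantize floats to 16-bit signed ints (array typecode 'h')."""
    packed = array("h")
    for v in values:
        q = round((v - add_offset) / scale_factor)
        if not -32768 <= q <= 32767:
            raise OverflowError(f"{v!r} does not fit in 16 bits with this scaling")
        packed.append(q)
    return packed


def unpack16(packed, scale_factor, add_offset):
    """Restore approximate 64-bit floats from the packed integers."""
    return array("d", (q * scale_factor + add_offset for q in packed))
```

With a scale_factor of 0.01, each restored value is within about 0.005 of the
original, while each stored element shrinks from the size of a double to the
size of a short.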

When we say bitfields we mean storing data that represent actual
bitfields, like the output of an 11-bit A/D converter in some
scientific instrument. In that case we would not wish to apply
scale/offset.

In the bitfield case we *would* like to store 11-bit values without
having to round up to 16 bits, but that is not what HDF5 bitfields
provide.
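
For illustration only, storing 11-bit values back to back with no padding to
16 bits could look like the Python sketch below.  It says nothing about how
HDF5 or netCDF would lay the bits out on disk; the function names and the
most-significant-bit-first ordering are assumptions made for the example.

```python
def pack_bits(samples, width=11):
    """Concatenate width-bit unsigned samples into bytes, MSB first."""
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits currently held in acc
    out = bytearray()
    for s in samples:
        if not 0 <= s < (1 << width):
            raise ValueError(f"{s} does not fit in {width} bits")
        acc = (acc << width) | s
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet written
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_bits(data, count, width=11):
    """Recover count samples of width bits each from packed bytes."""
    acc = 0
    nbits = 0
    samples = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width and len(samples) < count:
            nbits -= width
            samples.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1
    return samples
```

Eight 11-bit samples occupy 88 bits, so they fit in 11 bytes here instead of
the 16 bytes needed when each sample is rounded up to 16 bits.  The sample
count has to be stored separately, since trailing pad bits are otherwise
indistinguishable from data.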

Have I got the terms correct? Because we have both a bitfield and a
bit packing requirement:

 Bit packing

* Data may be bit packed.


Ed