Re: packing plans



Dick,

> I just received Fulker's e-mail FWD to dataws, regarding your plans
> on packing.
> 
> I am interested because I maintain two CRAY-specific pack/unpack 
> routine sets, and users are beginning to ask more frequently for a 
> portable version.  I think the interest is growing in proportion to
> the CCM2's usage in non-NCAR and non-CRAY environments.  And if CSL 
> adopts netCDF as its format there will be even more requests.
> 
> In this context, you should know that I receive frequent user requests
> for packing with "zero preservation".  I.e.  if an array contains zero 
> at a given location and you pack the array, then when you unpack the 
> array it must have zero in the same location.
> 
> Does your packing scheme have this property?  Mine don't.

No, mine doesn't either.  I've appended the spec for how the packing is to be
done (except that if practical we will permit the _Offset and _Scale packing
parameters to be vectors that vary along one of the variable dimensions).
Also we won't use netCDF attributes to store the _Nbits, _Offset, and _Scale
for a parameter, but will instead store these with the variable information
in the header.

We do preserve a special missing value in our packing scheme, so if "0" is
used for this purpose, then it is preserved.

--Russ

                   NetCDF Packing Interface Specification
                   (unresolved issues are in brackets [])

All netCDF types permit packed representations.  A netCDF variable will be a
packed variable if the `_Nbits' variable attribute is defined when the
variable is first defined.  In this case, the variable attributes `_Offset'
and `_Scale' may also be defined, but if not there are default values for
these attributes.

_Nbits 

        The _Nbits attribute must be defined for a variable before any
        values have been written for that variable (including _FillValues)
        and must not be redefined with a different value after any values
        (including _FillValues) have been written.

        0 <= _Nbits <= 32.  If _Nbits is 0, no data needs to be stored, and
        this variable is only a handle for attributes.  In this case, the
        variable's value on a read is the _FillValue.  It is not possible
        to store more than 32 bits of precision, even for double values,
        [because of the restrictions of XDR?].  Providing a value of _Nbits
        greater than 16 for a NC_SHORT variable or greater than 8 for an
        NC_CHAR or NC_BYTE variable is not useful.

        One value of the packed range will be used for the representation of
        the packed _FillValue, so the packed values will represent 2^_Nbits
        - 1 distinct data values.  

        Once defined for a variable, the value of the _Nbits attribute
        cannot be changed.
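
        The limits above can be sketched in a few lines of Python.  This
        is only an illustration: the function names are hypothetical and
        not part of any netCDF interface, and only the type names that
        appear in the text are given a narrower width.

```python
# Hypothetical helpers illustrating the _Nbits rules; not a netCDF API.

# Widths (in bits) beyond which _Nbits is "not useful", per the text.
# Types not listed here are limited only by the overall 32-bit maximum.
USEFUL_WIDTH = {"NC_BYTE": 8, "NC_CHAR": 8, "NC_SHORT": 16}


def data_value_count(nbits):
    """Number of distinct data values: one code is reserved for _FillValue."""
    if not 0 <= nbits <= 32:
        raise ValueError("_Nbits must satisfy 0 <= _Nbits <= 32")
    return 2 ** nbits - 1  # gives 0 when nbits == 0: nothing is stored


def is_useful_nbits(nc_type, nbits):
    """True if nbits does not exceed the natural width of the type."""
    return 0 <= nbits <= USEFUL_WIDTH.get(nc_type, 32)
```

        For example, data_value_count(8) is 255, and
        is_useful_nbits("NC_SHORT", 17) is False.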

_Offset

        The _Offset attribute should be of the same type as the variable.  A
        useful value of _Offset is the minimum valid data value, so that all
        packed data will be non-negative.  

        If _Nbits is specified and _Offset is not specified, _Offset
        defaults to 0 of the same type as the variable.

        [Once defined for a variable, the value of the _Offset attribute
        cannot be changed?]

_Scale

        The _Scale attribute should be of type double [or float?].  A
        useful value of _Scale in the case that data values map to the
        integers 0, 1, ..., 2^_Nbits-2 and the missing value maps to
        2^_Nbits-1 is:

                (Max - Min)/(2^_Nbits - 2)

        assuming the packing formulas are:

                packed = truncate_to_Nbits( (value - _Offset) / _Scale )

                value = packed * _Scale + _Offset

        If instead the missing value maps to 0 and the data maps to the
        integer range 1, 2, ..., 2^_Nbits-1, then Max must map to
        2^_Nbits-1, so _Scale should again be

                (Max - Min)/(2^_Nbits - 2)

        and the packing formulas should be

                packed = truncate_to_Nbits( (value - _Offset) / _Scale + 1 )

                value = (packed - 1) * _Scale + _Offset

        [We have to choose and document which packing formula we will use].
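
        The two candidate formulas can be compared with a small Python
        sketch.  Everything here is an assumption made for illustration:
        the function names and the fill_code_is_zero flag are
        hypothetical, "truncate_to_Nbits" is read as truncation toward
        zero followed by a range check, and the missing value is
        detected by a simple equality test on the unpacked value.

```python
# Illustrative sketch of both candidate packing formulas; not a netCDF API.

def pack(value, offset, scale, nbits, fill_value, fill_code_is_zero=False):
    """Pack one value into an nbits-wide unsigned integer code."""
    top = (1 << nbits) - 1                 # largest nbits-wide code
    if value == fill_value:
        return 0 if fill_code_is_zero else top
    code = int((value - offset) / scale)   # assumed meaning of "truncate"
    if fill_code_is_zero:
        code += 1                          # data codes are 1 .. top
        lo, hi = 1, top
    else:
        lo, hi = 0, top - 1                # data codes are 0 .. top - 1
    if not lo <= code <= hi:
        raise ValueError("value does not fit in the packed range")
    return code


def unpack(code, offset, scale, nbits, fill_value, fill_code_is_zero=False):
    """Invert pack(), returning fill_value for the reserved code."""
    top = (1 << nbits) - 1
    if code == (0 if fill_code_is_zero else top):
        return fill_value
    if fill_code_is_zero:
        code -= 1
    return code * scale + offset
```

        With _Offset 0.0, _Scale 1.0, and _Nbits 8, the value 254.0
        packs to 254 under the first formula and to 255 under the
        second, and unpacks to 254.0 in both cases.  Neither formula
        preserves zero in general: with _Offset -1.0 and _Scale 0.3, the
        value 0.0 packs to 3 and unpacks to roughly -0.1.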

        If _Nbits is specified and _Scale is unspecified, _Scale defaults to
        NC_FLOAT 1.0. [does this work OK with the packing formulas above?]

        It may be advantageous for users to pick _Scale to be a positive or
        negative power of two to ensure that the multiplications and
        divisions that occur during packing and unpacking are exact.  This
        can be done by properly adjusting Max, Min, and _Nbits.  [Once
        defined for a variable, the value of the _Scale attribute cannot be
        changed?]
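
        One way to arrive at a power-of-two _Scale is to round the
        value (Max - Min)/(2^_Nbits - 2) up to the next power of two,
        which in effect enlarges Max slightly.  The Python sketch below
        shows this; the function name is hypothetical, and rounding up
        is only one possible choice, since the text leaves open how Max,
        Min, and _Nbits are adjusted.

```python
import math


def power_of_two_scale(vmin, vmax, nbits):
    """Smallest power of two >= (vmax - vmin) / (2**nbits - 2).

    Assumes vmax > vmin and nbits >= 2.  Rounding up keeps every value
    in [vmin, vmax] inside the packed data range.
    """
    exact = (vmax - vmin) / (2 ** nbits - 2)
    # frexp returns (m, e) with exact == m * 2**e and 0.5 <= m < 1.
    mantissa, exponent = math.frexp(exact)
    if mantissa == 0.5:
        exponent -= 1          # exact is already a power of two
    return math.ldexp(1.0, exponent)
```

        For Min 0.0, Max 100.0, and _Nbits 8 this gives a _Scale of 0.5
        in place of 100/254.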

_FillValue

        The _FillValue will be mapped to the packed value 2^_Nbits-1
        [or 0, if the other packing formula is used].  The _FillValue (and
        valid_range, valid_min, or valid_max) attributes should always be
        specified in terms of the unpacked values of a variable.