Compression by Bit-Shaving

Another way of limiting the precision of floating point numbers is to set the lower-order bits of the mantissa to zero, called bit-shaving for some reason. It's quite simple to do: we reinterpret the float's bits as an integer, then AND it with a mask that sets the lower bits to 0:

static private int allOnes = 0xffffffff;

static public int getBitMask(int bitN) {
    if (bitN >= 23) return allOnes;
    return allOnes << (23 - bitN); // rightmost (23 - bitN) bits get set to zero
}

static private boolean debug = false; // set to true to print before/after bit patterns

/**
 * Zero the low-order mantissa bits of a float.
 * @param value original floating point
 * @param bitMask bitMask from getBitMask()
 * @return modified float
 */
static public float bitShave(float value, int bitMask) {
    if (Float.isNaN(value)) return value; // masking a NaN's mantissa could turn it into Infinity
    int bits = Float.floatToRawIntBits(value);
    int shave = bits & bitMask;
    if (debug) System.out.printf("0x%s -> 0x%s : ", Integer.toHexString(bits),
                                  Integer.toHexString(shave));
    float result = Float.intBitsToFloat(shave);
    if (debug) System.out.printf("%f -> %f %n", value, result);
    return result;
}
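
As a quick illustration of the masking step, here is a small self-contained sketch that shaves the float closest to pi down to 8 mantissa bits. The class name BitShaveDemo is made up for this example; the two methods repeat the logic above without the debug output.

```java
final class BitShaveDemo {
    // Same mask construction as above: keep the top bitN mantissa bits.
    public static int getBitMask(int bitN) {
        if (bitN >= 23) return 0xffffffff;
        return 0xffffffff << (23 - bitN);
    }

    // Same shaving step as above, without the debug printing.
    public static float bitShave(float value, int bitMask) {
        if (Float.isNaN(value)) return value;
        return Float.intBitsToFloat(Float.floatToRawIntBits(value) & bitMask);
    }

    public static void main(String[] args) {
        float pi = (float) Math.PI;
        int mask = getBitMask(8);
        float shaved = bitShave(pi, mask);
        System.out.printf("mask   = 0x%08x%n", mask);
        System.out.printf("before = 0x%08x (%f)%n", Float.floatToRawIntBits(pi), pi);
        System.out.printf("after  = 0x%08x (%f)%n", Float.floatToRawIntBits(shaved), shaved);
    }
}
```

With 8 bits kept, the mask is 0xffff8000, so the low 15 mantissa bits are cleared: 0x40490fdb becomes 0x40490000, and the value drops from about 3.1415927 to exactly 3.140625.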

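If you want N bits of precision, the mask has the lower (23 - N) bits set to zero. The relative precision of the shaved data will be 1 / 2^N, because the discarded bits are worth less than 2^-N while the mantissa itself is at least 1 (for normalized floats). Try it out: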

public static void checkShavedPrecision(float[] org, int bits) {
    double expectedPrecision = Math.pow(2.0, -bits);
    float max_pdiff = -Float.MAX_VALUE;
    int bitMask = Grib2NetcdfWriter.getBitMask(bits);

    for (int i = 0; i < org.length; i++) {
        float d = org[i];
        float shaved = Grib2NetcdfWriter.bitShave(org[i], bitMask);
        float diff = Math.abs(d - shaved);
        float pdiff = (d != 0.0) ? diff / Math.abs(d) : 0.0f; // relative difference, never negative
        assert pdiff < expectedPrecision;
        max_pdiff = Math.max(max_pdiff, pdiff);
    }

    if (max_pdiff != 0.0) { // usually the max relative difference lies between 1/2^(N+1) and 1/2^N
        if (max_pdiff < expectedPrecision / 2 || max_pdiff > expectedPrecision)
            System.out.printf("nbits=%d max_pdiff=%g expect=%g%n", bits, max_pdiff,
                              expectedPrecision);
    }
}

Compression with scale/offsets allows you to set the absolute precision of floating point data to an arbitrary value, while bit-shaving allows you to set the relative precision to a power of 2:

Math.abs((UF - F)/F) < 1/2^N

where:
  F = original number
  UF = shaved value
  N = number of bits
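
To see that this bound is relative rather than absolute, here is a rough sketch that shaves values of very different magnitudes to N = 10 bits and reports the worst relative error. RelativePrecisionDemo and maxRelativeError are hypothetical names for this illustration, the sample values are arbitrary, and the mask logic is repeated so the snippet stands alone.

```java
final class RelativePrecisionDemo {
    // Keep the top bitN mantissa bits, zero the rest.
    public static int getBitMask(int bitN) {
        if (bitN >= 23) return 0xffffffff;
        return 0xffffffff << (23 - bitN);
    }

    public static float bitShave(float value, int bitMask) {
        if (Float.isNaN(value)) return value;
        return Float.intBitsToFloat(Float.floatToRawIntBits(value) & bitMask);
    }

    // Largest |(UF - F) / F| over the samples, computed in double.
    // Zero, NaN and infinite inputs are skipped (an assumption of this sketch).
    public static double maxRelativeError(float[] values, int nbits) {
        int mask = getBitMask(nbits);
        double worst = 0.0;
        for (float f : values) {
            if (f == 0.0f || Float.isNaN(f) || Float.isInfinite(f)) continue;
            float uf = bitShave(f, mask);
            worst = Math.max(worst, Math.abs(((double) uf - f) / f));
        }
        return worst;
    }

    public static void main(String[] args) {
        float[] samples = {1.0e-6f, 0.123456f, 3.14159f, -273.15f, 6.02e23f};
        int nbits = 10;
        System.out.printf("nbits=%d worst=%g bound=%g%n",
                nbits, maxRelativeError(samples, nbits), Math.pow(2.0, -nbits));
    }
}
```

Even though the samples span roughly 29 orders of magnitude, the worst relative error should stay below 1/2^10, which is what distinguishes bit-shaving from a fixed absolute precision.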
Posted by: caron
Sep 5, 2014
