Compression by Bit-Shaving
Another way of limiting the precision of floating point numbers is to set the low-order bits of the mantissa to zero, called bit-shaving for some reason. It's quite simple to do: we reinterpret the float's bits as an integer, then AND it with a mask that sets the lower bits to 0:
    static private boolean debug = false;
    static private int allOnes = 0xffffffff;

    static public int getBitMask(int bitN) {
      if (bitN >= 23) return allOnes;
      return allOnes << (23 - bitN); // rightmost bits get set to zero
    }

    /**
     * Shave the low-order mantissa bits off the float
     * @param value    original floating point
     * @param bitMask  bitMask from getBitMask()
     * @return modified float
     */
    static public float bitShave(float value, int bitMask) {
      if (Float.isNaN(value)) return value; // leave NaNs untouched

      int bits = Float.floatToRawIntBits(value);
      int shave = bits & bitMask;
      if (debug) System.out.printf("0x%s -> 0x%s : ", Integer.toHexString(bits), Integer.toHexString(shave));
      float result = Float.intBitsToFloat(shave);
      if (debug) System.out.printf("%f -> %f %n", value, result);
      return result;
    }
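To make the mask concrete, here is a minimal worked example that keeps 10 mantissa bits of pi. The class name BitShaveDemo is hypothetical, and its two helper methods are trimmed copies of the ones shown above so that the sketch runs on its own.

```java
// Hypothetical demo class; getBitMask and bitShave mirror the methods shown above.
class BitShaveDemo {
    static int getBitMask(int bitN) {
        if (bitN >= 23) return 0xffffffff;
        return 0xffffffff << (23 - bitN); // low (23 - bitN) mantissa bits become zero
    }

    static float bitShave(float value, int bitMask) {
        if (Float.isNaN(value)) return value;
        return Float.intBitsToFloat(Float.floatToRawIntBits(value) & bitMask);
    }

    public static void main(String[] args) {
        int mask = getBitMask(10);         // keep 10 mantissa bits
        float pi = (float) Math.PI;        // bit pattern 0x40490fdb
        float shaved = bitShave(pi, mask); // bit pattern 0x40490000

        System.out.printf("mask   = 0x%08x%n", mask);
        System.out.printf("pi     = 0x%08x (%s)%n", Float.floatToRawIntBits(pi), pi);
        System.out.printf("shaved = 0x%08x (%s)%n", Float.floatToRawIntBits(shaved), shaved);
        System.out.printf("relative error = %g, bound 2^-10 = %g%n",
                (pi - shaved) / pi, Math.pow(2.0, -10));
    }
}
```

With 10 bits kept, the mask is 0xffffe000 and pi shaves down to 3.140625. The sign and exponent bits are never touched, only the low 13 bits of the mantissa.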
If you want N bits of precision, the mask has the lower (23 - N) bits set to zero, since a float has a 23-bit mantissa. The relative precision of the shaved data will be 1 / 2^N. Try it out:
    public static void checkShavedPrecision(float[] org, int bits) {
      double expectedPrecision = Math.pow(2.0, -bits);
      float max_pdiff = -Float.MAX_VALUE;
      int bitMask = Grib2NetcdfWriter.getBitMask(bits);

      for (int i = 0; i < org.length; i++) {
        float d = org[i];
        float shaved = Grib2NetcdfWriter.bitShave(d, bitMask);
        float diff = Math.abs(d - shaved);
        // relative difference; Math.abs(d) keeps it non-negative when d is negative
        float pdiff = (d != 0.0) ? diff / Math.abs(d) : 0.0f;
        assert pdiff < expectedPrecision;
        max_pdiff = Math.max(max_pdiff, pdiff);
      }

      if (max_pdiff != 0.0) { // usually max precision lies between 1/2^N and 1/2^(N+1)
        if (max_pdiff < expectedPrecision / 2 || max_pdiff > expectedPrecision)
          System.out.printf("nbits=%d max_pdiff=%g expect=%g%n", bits, max_pdiff, expectedPrecision);
      }
    }
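The reason to shave at all is that the zeroed low-order bits are much easier for a general-purpose compressor to squeeze. A rough way to see this is to deflate the same array before and after shaving. The sketch below uses java.util.zip.Deflater; the class name ShaveCompressDemo, the synthetic data, and the random seed are assumptions made for illustration, and the actual sizes will depend on your data.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo; data, seed, and class name are illustrative only.
class ShaveCompressDemo {
    static int getBitMask(int bitN) {
        if (bitN >= 23) return 0xffffffff;
        return 0xffffffff << (23 - bitN);
    }

    static float bitShave(float value, int bitMask) {
        if (Float.isNaN(value)) return value;
        return Float.intBitsToFloat(Float.floatToRawIntBits(value) & bitMask);
    }

    // Synthetic values in [250, 300), e.g. temperatures in Kelvin.
    static float[] sample(int n) {
        Random rnd = new Random(42);
        float[] data = new float[n];
        for (int i = 0; i < n; i++) data[i] = 250f + 50f * rnd.nextFloat();
        return data;
    }

    static float[] shaveAll(float[] data, int nbits) {
        int mask = getBitMask(nbits);
        float[] out = new float[data.length];
        for (int i = 0; i < data.length; i++) out[i] = bitShave(data[i], mask);
        return out;
    }

    // Number of bytes produced by deflating the big-endian encoding of the floats.
    static int deflatedSize(float[] data) {
        ByteBuffer bb = ByteBuffer.allocate(4 * data.length);
        for (float f : data) bb.putFloat(f);
        Deflater deflater = new Deflater();
        deflater.setInput(bb.array());
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) total += deflater.deflate(chunk);
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        float[] raw = sample(10000);
        System.out.printf("raw bytes      = %d%n", 4 * raw.length);
        System.out.printf("deflated       = %d%n", deflatedSize(raw));
        for (int nbits : new int[] {16, 10, 6}) {
            System.out.printf("deflated (N=%d) = %d%n", nbits, deflatedSize(shaveAll(raw, nbits)));
        }
    }
}
```

The fewer mantissa bits you keep, the more of each 4-byte float is a predictable run of zeros, so the deflated size should drop as N decreases.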
Compression with scale/offsets allows you to set the absolute precision of floating point data to an arbitrary value, while bit-shaving allows you to set the relative precision to a power of 2:
    Math.abs((UF - F)/F) < 1/2^N

    where:
      F  = original number
      UF = shaved value
      N  = number of bits
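The difference between absolute and relative precision shows up when the values span several orders of magnitude. The sketch below compares the two on a few numbers. The scale/offset round trip here is an assumed, simplified form (pack to the nearest integer step, then unpack), and the class and method names are hypothetical.

```java
// Hypothetical demo contrasting absolute (scale/offset) and relative (bit-shave) precision.
class PrecisionContrastDemo {
    static int getBitMask(int bitN) {
        if (bitN >= 23) return 0xffffffff;
        return 0xffffffff << (23 - bitN);
    }

    static float bitShave(float value, int bitMask) {
        if (Float.isNaN(value)) return value;
        return Float.intBitsToFloat(Float.floatToRawIntBits(value) & bitMask);
    }

    // Assumed simplified scale/offset packing: round to the nearest step, then unpack.
    static float scaleOffsetRoundTrip(float value, float scale, float offset) {
        long packed = Math.round((value - offset) / (double) scale);
        return (float) (packed * (double) scale + offset);
    }

    static double relativeError(float original, float modified) {
        return Math.abs((double) modified - original) / Math.abs((double) original);
    }

    public static void main(String[] args) {
        float[] values = {0.0123f, 1.23f, 123.0f, 12300.0f};
        float scale = 0.01f;        // absolute step of 0.01
        float offset = 0.0f;
        int mask = getBitMask(10);  // relative precision of 1/2^10

        System.out.printf("%12s %14s %14s%n", "value", "rel(scale/off)", "rel(shaved)");
        for (float v : values) {
            float packed = scaleOffsetRoundTrip(v, scale, offset);
            float shaved = bitShave(v, mask);
            System.out.printf("%12s %14.3g %14.3g%n",
                    v, relativeError(v, packed), relativeError(v, shaved));
        }
    }
}
```

With a fixed step of 0.01, the smallest value loses a large fraction of itself while the largest is barely touched. The shaved values all stay within 1/2^10 of the original in relative terms, whatever their magnitude.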
Posted by: caron
Sep 5, 2014