Another way of limiting the precision of floating point numbers is to set the lower order bits of the mantissa to zero, called **bit-shaving** for some reason. Its quite simple to do, we just convert the float to the identical bits in an integer, then AND it with a mask that sets the lower bits to 0:

static private int allOnes = 0xffffffff; static public int getBitMask(int bitN) { if (bitN >= 23) return allOnes; return allOnes << 23-bitN; // rightmost bits get set to zero } /** * Shave n bits off the float * @param value original floating point * @param bitMask bitMask from getBitMask() * @return modified float */ static public float bitShave(float value, int bitMask) { if (Float.isNaN(value)) return value; int bits = Float.floatToRawIntBits(value); int shave = bits & bitMask; if (debug) System.out.printf("0x%s -> 0x%s : ", Integer.toBinaryString(bits),

Integer.toBinaryString(shave)); float result = Float.intBitsToFloat(shave); if (debug) System.out.printf("%f -> %f %n", value, result); return result; }

If you want N bits of precision, the mask has the lower (23 - N) bits set to zero. The reletive precision of the shaved data will be 1 / 2^N. Try it out:

public static void checkShavedPrecision(float[] org, int bits) { double expectedPrecision = Math.pow(2.0, -bits); float max_pdiff = - Float.MAX_VALUE; int bitMask = Grib2NetcdfWriter.getBitMask(bits); for (int i=0; i< org.length; i++) { float d = org[i]; float shaved = Grib2NetcdfWriter.bitShave(org[i], bitMask); float diff = Math.abs(d - shaved); float pdiff = (d != 0.0) ? diff / d : 0.0f; assert pdiff < expectedPrecision; max_pdiff = Math.max(max_pdiff, pdiff); } if (max_pdiff != 0.0) { // usually max precision lies between 1/2^N and 1/2^(N+1)

if (max_pdiff < expectedPrecision / 2 || max_pdiff > expectedPrecision)

System.out.printf("nbits=%d max_pdiff=%g expect=%g%n", bits, max_pdiff,

expectedPrecision);

} }

Compression with scale/offsets allows you to set the absolute precision of floating point data to an arbitrary value, and bit-shaving allows you to set relative precision to a factor of 2:

Math.abs((UF - F)/F) < 1/2^N where: F = original number UF = shaved value N = number of bits

Hi John,

Nice post. A few years ago, as part of a short study looking into a variety of netcdf data compression options, I wrote a handful of equivalent bit-shaving functions in python.

I've posted a couple of variants for bit-shaving 32-bit floats on my gist site - see https://gist.github.com/rockdoc/ecac8b1c5f3f99079aac. The 64-bit versions are pretty similar (I can provide them if anyone's interested.)

One question: When you speak of the bit-shaving method giving a relative precision of 1 / 2^N, what is this relative to?

Regards,

Phil

Posted by

Phil Bentleyon September 08, 2014 at 03:23 AM MDT #Hi Phil:

Sorry I didnt make that clear. Its relative to the original floating point number:

Math.abs((UF - F)/F) < 1/2^N

where:

F = original number

UF = shaved value

Ill add that to the text. Thanks for your comment.

John

Posted by

John Caronon September 08, 2014 at 12:17 PM MDT #I downloaded significant wave height data from 'ftp://polar.ncep.noaa.gov/pub/history/waves/",which I need to convert GRIB2 files in Matlab environment,to a format such as Excel, but I do not know how to go about it. Since I am not good at using Matlab. Can someone please help out with what to do? Bearing in mind that I am not good at using Matlab. You can help out by composing what I need to write to Matlab to retrieve the data. Thanks!

Posted by

Philon September 11, 2014 at 06:51 PM MDT #