Based on tests in Java, we decided that bzip2 is a good candidate as a compression alternative to deflate, which is currently the only standard compression option in netCDF-4. My colleague Dennis Heimbigner created a branch of the netCDF C library that uses the 1.0.6 bzip2 library from bzip2.org, and my colleague Ward Fisher built it for Windows for me to test. Ward had to build it from source, since no prebuilt Windows version exists.
With that substantial help, I was able to copy the NCEP sample model GRIB files to netCDF-4 files with bzip2 compression. To make sure I was getting maximum compression, I used level=9, which uses a 900K block size. However, tests show that a lower level can be used without increasing the file size. A few of the files had to be excluded because the netCDF-Java reader was not reading them completely, which made their file ratios misleading.
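To make the level/block-size relationship concrete, here is a minimal sketch using Python's standard `bz2` module, which wraps the same libbz2 C library. It is not the netCDF branch described above, and the `synthetic_field` helper and its smooth made-up values are assumptions standing in for real model output, so the numbers it prints say nothing about the NCEP files.

```python
import bz2
import math
import struct


def synthetic_field(n=200_000):
    """Smooth float32 values standing in for a model grid (not real GRIB data)."""
    values = [280.0 + 15.0 * math.sin(i / 500.0) + 0.01 * (i % 7) for i in range(n)]
    return struct.pack(f"<{n}f", *values)


def sizes_by_level(raw):
    """Compressed size in bytes for each bzip2 level (block size = level * 100K)."""
    return {level: len(bz2.compress(raw, compresslevel=level)) for level in range(1, 10)}


raw = synthetic_field()
sizes = sizes_by_level(raw)
for level, size in sizes.items():
    print(f"level={level} block={level * 100}K compressed/raw={size / len(raw):.3f}")
```

Running something like this on your own arrays is a cheap way to see where the size stops improving as the level goes up.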
Ok, here are the results:
As you see, the compression ratios range from ~0.4 to 1.8. The average is 1.12; for GRIB-1 it's .92, and for GRIB-2 it's 1.20. These are on the plain ole float arrays as read from the GRIB files. The four lowest values among the GRIB-2 files are simple bit-packed, not JPEG-2000 compressed.
There's a chance that the bzip2.org C library may be slightly (2-5% ?) less efficient than the 7zip and tadaki Java bzip2 libraries, so we need to investigate whether another bzip2 C implementation might do better.
As previously blogged, Java prototyping indicates that there may be another 7-10% to be gained by doing floating point bit shaving or conversion to integer arrays using scale/offset. We are considering adding these to the netCDF-4 library as "lossy compression" options.
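For readers who haven't seen these two techniques, here is a rough sketch of what they mean. This is not the Java prototype code (which isn't shown in this post); the function names, the number of kept mantissa bits, and the 16-bit packing width are all illustrative assumptions. The scale/offset part follows the usual netCDF `scale_factor`/`add_offset` idea of unpacked = packed * scale + offset.

```python
import struct


def shave_bits(value, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of a float32 (lossy)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (shaved,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return shaved


def pack_scale_offset(values, nbits=16):
    """Map floats onto unsigned integers in [0, 2**nbits - 1] (lossy)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**nbits - 1) or 1.0  # avoid a zero scale for constant data
    packed = [round((v - lo) / scale) for v in values]
    return packed, scale, lo


def unpack_scale_offset(packed, scale, offset):
    """Inverse of pack_scale_offset, up to quantization error."""
    return [p * scale + offset for p in packed]


temps = [270.0, 280.5, 287.456, 300.25]  # made-up sample values
shaved = [shave_bits(t, keep_bits=10) for t in temps]
packed, scale, offset = pack_scale_offset(temps)
restored = unpack_scale_offset(packed, scale, offset)
```

Both are lossy in a controlled way: shaving bounds the relative error by the number of mantissa bits kept, while scale/offset bounds the absolute error by about half the scale. The zeroed trailing bits (or the narrower integers) are what give the compressor more to work with.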
Meanwhile, we can say with some confidence that bzip2 compression can get us to within 20% of GRIB compression on average, for NCEP model GRIB output. Your mileage may vary; offer void where taxed or prohibited. A good enough result for now. Thanks again to Dennis and Ward, who jumped up to help in time to present these results at the ECMWF Workshop on Closing the GRIB/NetCDF gap next week.
Well done mates, that was massive indeed. Now stay tuned to the BBC for more cricket scores.