
25 August 2014

In the last post, we saw that simple deflate compression works reasonably well on GRIB-1 files that have limited precision (expressed as the number of bits in the bit-packing algorithm). After expanding to single-precision floats, deflate compresses this data to 0.40 times the floating-point size (2.5 times smaller), which is 1.28 times larger than the original GRIB record, on average for the NCEP datasets we are using. If you run deflate directly on the bit-packed data, it makes the files 0.91 times as large as the GRIB records (i.e., 9% smaller). In comparing with GRIB sizes, there doesn't seem to be any dependence on the number of bits used, but deflate seems to do better on large records (40K points or more).
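For readers who want to try this kind of measurement themselves, here is a minimal Python sketch of how the two ratios can be computed for a single record. It is not the code used for the results above: the field is synthetic, `quantize` and `deflate_ratios` are hypothetical helpers, the quantization only approximates GRIB bit-packing, and the bit-packed size ignores GRIB headers. Level 3 is the deflate level quoted later in this post.

```python
import math
import struct
import zlib


def quantize(values, nbits):
    """Round values onto 2**nbits evenly spaced levels spanning their range."""
    lo, hi = min(values), max(values)
    levels = (1 << nbits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def deflate_ratios(values, nbits, level=3):
    """Return (deflate size / float size, deflate size / bit-packed size)."""
    limited = quantize(values, nbits)
    # Expand to little-endian single-precision floats, then deflate.
    as_floats = struct.pack(f"<{len(limited)}f", *limited)
    deflated = zlib.compress(as_floats, level)
    # Payload size of nbits per point, ignoring any record headers.
    bitpacked_size = math.ceil(len(values) * nbits / 8)
    return len(deflated) / len(as_floats), len(deflated) / bitpacked_size


# Synthetic smooth field standing in for a real record (assumed data).
field = [280.0 + 15.0 * math.sin(i / 50.0) for i in range(40_000)]
vs_float, vs_packed = deflate_ratios(field, nbits=10)
print(f"deflate/float = {vs_float:.2f}, deflate/bit-packed = {vs_packed:.2f}")
```

The numbers this prints depend entirely on the synthetic field, so they should not be compared with the NCEP averages quoted above.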

Now we turn our attention to GRIB-2.

We have 35 NCEP model runs in GRIB-2 in our sample. GRIB-2 has several packing schemes, indicated by the "DRS template":

D = has duplicates

DRS template
      0: count = 30
      2: count = 283
      3: count = 15017
     40: count = 384980

      DGEX_Alaska_12km_20100524_0000.grib2
      DGEX_CONUS_12km_20100514_1800.grib2
      GEFS_Global_1p0deg_Ensemble_20120215_0000.grib2
      GEFS_Global_1p0deg_Ensemble_derived_20120214_0000.grib2
      GFS_Global_0p5deg_20100913_0000.grib2
      GFS_Global_0p5deg_20140804_0000.grib2
  3   GFS_Global_2p5deg_20100602_1200.grib2
      GFS_Global_onedeg_20100913_0000.grib2
      GFS_Puerto_Rico_0p5deg_20140106_1800.grib2
  3   HRRR_CONUS_3km_wrfprs_201408120000.grib2
      NAM_Alaska_11km_20100519_0000.grib2
      NAM_Alaska_45km_conduit_20100913_0000.grib2
      NAM_CONUS_12km_20100915_1200.grib2
      NAM_CONUS_12km_20140804_0000.grib2
      NAM_CONUS_12km_conduit_20140804_0000.grib2
      NAM_CONUS_20km_selectsurface_20100913_0000.grib2
      NAM_CONUS_20km_surface_20100913_0000.grib2
      NAM_CONUS_40km_conduit_20100913_0000.grib2
      NAM_Firewxnest_20140804_0000.grib2
      NAM_Polar_90km_20100913_0000.grib2
  2   NDFD_CONUS_5km_20140805_1200.grib2
  2/3 NDFD_CONUS_5km_conduit_20140804_0000.grib2
  2   NDFD_Fireweather_CONUS_20140804_1800.grib2
      RR_CONUS_13km_20121028_0000.grib2
      RR_CONUS_20km_20140804_1900.grib2
      RR_CONUS_40km_20140805_1600.grib2
  0/2 RTMA_CONUS_2p5km_20111221_0800.grib2
  0   RTMA_GUAM_2p5km_20140803_0600.grib2
      RUC2_CONUS_20km_hybrid_20100913_0000.grib2
      RUC2_CONUS_20km_pressure_20100509_1300.grib2
      RUC2_CONUS_20km_surface_20100516_1600.grib2
      SREF_Alaska_45km_ensprod_20120213_1500.grib2
      SREF_CONUS_40km_ensprod_20120214_1500.grib2
      SREF_CONUS_40km_ensprod_20140804_1500.grib2
      SREF_CONUS_40km_ensprod_biasc_20120213_2100.grib2
  D   SREF_CONUS_40km_ensprod_biasc_20140805_1500.grib2
      SREF_CONUS_40km_pgrb_biasc_nmm_n2_20100602_1500.grib2
      SREF_CONUS_40km_pgrb_biasc_rsm_p2_20120213_1500.grib2
      SREF_PacificNE_0p4_ensprod_20120213_2100.grib2
  D   WW3_Coastal_Alaska_20140804_0000.grib2
  D   WW3_Coastal_US_East_Coast_20140804_0000.grib2
  D   WW3_Coastal_US_West_Coast_20140804_1800.grib2
  D   WW3_Global_20140804_0000.grib2
  D   WW3_Regional_Alaska_20140804_0000.grib2
  D   WW3_Regional_Eastern_Pacific_20140804_0000.grib2
  D   WW3_Regional_US_East_Coast_20140804_0000.grib2
  D   WW3_Regional_US_West_Coast_20140803_0600.grib2

In our sample, the vast majority of records use template 40, which is JPEG-2000 wavelet compression. Template 0 is the same bit-packing scheme that GRIB-1 uses; templates 2 and 3 are "complex packing" schemes. JPEG-2000 is known to give the best compression, so we will restrict our attention in this post to the records that use it.
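To put a number on "most", the tally can be turned into shares with a few lines of Python. This is only a sketch: the counts are copied from the listing above, the short descriptions follow this post's wording rather than the official table text, and `template_shares` is a hypothetical helper.

```python
# Record counts per DRS template, copied from the tally in this post.
DRS_COUNTS = {0: 30, 2: 283, 3: 15017, 40: 384980}

# Short labels following the post's wording (not the official table text).
DRS_NAMES = {
    0: "simple bit-packing",
    2: "complex packing",
    3: "complex packing",
    40: "JPEG-2000",
}


def template_shares(counts):
    """Return each template's fraction of the total record count."""
    total = sum(counts.values())
    return {template: n / total for template, n in counts.items()}


shares = template_shares(DRS_COUNTS)
for template, share in sorted(shares.items()):
    print(f"template {template:>2} ({DRS_NAMES[template]}): {share:.2%}")
```

With these counts, template 40 accounts for roughly 96% of the 400,310 records, so restricting the analysis to JPEG-2000 records leaves out only a small fraction of the sample.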

Over the entire sample of 385K records, deflate compression produces files about 0.257 times the original single-precision floating-point size. However, compared to GRIB-2 JPEG-2000 compression, the files are 2.10 times the size of the GRIB files. Here is a breakdown of the ratio of deflate size to the original single-precision float size, for all files, as a function of the number of points in the compressed record:

And the ratio of deflate size to GRIB size, as a function of the number of points:

GRIB-2 records are substantially larger than GRIB-1 records. In Part 3, most of the GRIB-1 records had fewer than 20K points. Here, most have more than 20K, and there does not seem to be a dependence of compression ratio on the number of points in the record.

Now for a few representative datasets, the compression ratio as a function of the bits of precision:

As with GRIB-1, there does not seem to be any dependence of the deflate/GRIB size ratio on the number of bits of precision.

Running deflate on the original JPEG-2000 compressed message does not yield extra compression. In contrast, we saw that deflate could squeeze an extra 9% on average out of GRIB-1 records that used simple bit-packing.
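The same effect is easy to see in miniature. The Python sketch below does not touch JPEG-2000 at all; it assumes that the output of a first deflate pass is a fair stand-in for any already-compressed byte stream, and it uses a synthetic noisy field rather than real GRIB data.

```python
import math
import random
import struct
import zlib

# Synthetic noisy field, rounded to 0.1 to mimic limited precision (assumed data).
rng = random.Random(0)
values = [
    round(280.0 + 15.0 * math.sin(i / 50.0) + rng.gauss(0.0, 2.0), 1)
    for i in range(40_000)
]
raw = struct.pack(f"<{len(values)}f", *values)

once = zlib.compress(raw, 3)    # first pass removes most of the redundancy
twice = zlib.compress(once, 3)  # second pass works on already-compressed bytes

print(f"raw {len(raw)} -> once {len(once)} -> twice {len(twice)}")
```

The first pass shrinks the floats considerably, while the second pass has almost nothing left to remove, which is the behavior described above for compressed GRIB-2 messages.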

So, the current score is that deflate level 3 compression will create netCDF-4 files that are on average 2.10 times bigger than GRIB-2.
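The two averages quoted in this post can also be combined to estimate how JPEG-2000 itself compares with uncompressed floats. The Python below is just that arithmetic; dividing one average by another is only an approximation, since both figures are averages over many records.

```python
deflate_over_float = 0.257  # from this post: deflate size / float size
deflate_over_grib = 2.10    # from this post: deflate size / GRIB-2 size

# Dividing the two ratios cancels the deflate size.
grib_over_float = deflate_over_float / deflate_over_grib

print(f"GRIB-2 size / float size = {grib_over_float:.3f}")
print(f"float size / GRIB-2 size = {1 / grib_over_float:.1f}")
```

By this rough estimate, the JPEG-2000 records are about 0.12 times the size of the single-precision floats, or roughly 8 times smaller, compared with about 4 times smaller for deflate.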

Next time we will look at other possible compression algorithms on GRIB-2 data.


Comments:

I have updated the plot "file size ratio deflate / grib2". I was calculating GRIB size incorrectly; new results are that on average, the ratio of deflate with GRIB2 wavelet compression is 2.10, not 2.32.

Posted by John Caron on September 13, 2014 at 09:43 AM MDT #