GRIB bitmaps are not such a good idea


Both GRIB1 and GRIB2 have an optional bitmap section which is a bit array for marking missing data. There is a single bit for each data value, so a data array of N points requires N/8 bytes for the bitmap array. A bit value of 0 indicates that the data is missing at that point, so that data value doesn't have to be stored.
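
To make the bookkeeping concrete, here's a minimal Python sketch of how a bitmap like this can be packed, and how it is used to expand the stored values back onto the full grid. The function names are made up for illustration, and the most-significant-bit-first ordering within each byte is an assumption of the sketch, not something stated in this post.

```python
def bitmap_size_bytes(npoints):
    """Bytes needed for one bit per grid point, rounded up to a whole byte."""
    return (npoints + 7) // 8


def pack_bitmap(present):
    """Pack booleans into bytes: bit 1 = value present, bit 0 = value missing.

    Bits are written most-significant-bit first (an assumption of this sketch).
    """
    out = bytearray(bitmap_size_bytes(len(present)))
    for i, flag in enumerate(present):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def expand(bitmap, stored, npoints, missing=None):
    """Rebuild the full array from the bitmap plus the stored (non-missing) values."""
    values = iter(stored)
    full = []
    for i in range(npoints):
        is_present = bitmap[i // 8] & (0x80 >> (i % 8))
        full.append(next(values) if is_present else missing)
    return full


# Nine grid points, three of which carry data.
flags = [True, False, False, True, False, False, False, False, True]
bitmap = pack_bitmap(flags)
grid = expand(bitmap, [1.5, 2.5, 3.5], len(flags))
```

Only the three present values are stored in the data section; the bitmap costs two bytes no matter how many points are missing.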

Seems like that has to be a good idea, and the more missing values there are, the better. Right?

Turns out that combined with compression, it's not such a good idea, and the more missing values there are, the worse it is. The problem is that the bitmap section is not compressed; only the data section is.

Here's an extreme case: NOAA wave watch data off the west coast, including Hawaii. The grid size is 526 x 736, giving 387,136 points, most of which are missing.

Examining a random record in ToolsUI IOSP/GRIB2/GRIB2data, we see that there are only 11,127 non-missing data values. These are compressed nicely into 7702 bytes by GRIB wavelet compression. The problem is that we need 387136/8 = 48392 bytes to store the bitmap, which is not compressed. That makes the entire GRIB message 56270 bytes.
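
The arithmetic for that record can be checked in a few lines of Python. All the numbers come from the paragraph above; the variable names are just for illustration, and "other sections" is simply whatever is left of the message after subtracting the bitmap and the compressed data.

```python
nx, ny = 526, 736
npoints = nx * ny                  # 387136 grid points
nvalid = 11127                     # non-missing values in the example record

bitmap_bytes = npoints // 8        # one bit per point, never compressed
data_bytes = 7702                  # compressed data section
message_bytes = 56270              # entire GRIB message

other_bytes = message_bytes - bitmap_bytes - data_bytes
bitmap_fraction = bitmap_bytes / message_bytes
valid_fraction = nvalid / npoints

print(f"bitmap: {bitmap_bytes} bytes ({bitmap_fraction:.0%} of the message)")
print(f"data:   {data_bytes} bytes")
print(f"other sections: {other_bytes} bytes")
print(f"non-missing points: {valid_fraction:.1%} of the grid")
```

In other words, the uncompressed bitmap accounts for most of the message, while under 3% of the grid points carry data.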

I didn't really think about it until I noticed in my compression tests that other compression schemes were beating GRIB by factors greater than 20 on some records, which is surprising. Using bzip2 on that example record, the entire array, including missing values, is compressed to 9576 bytes. Deflate (zip) compresses it to 25786 bytes. LZMA (7zip) gets it down to 8994 bytes, which is about 6 times smaller than the GRIB message.

Overall, on that entire file, the estimated file sizes for various compression schemes are:

Scheme     Size (MB)
GRIB       45.72
deflate    28.29
bzip2      12.28
7zip       11.47

Standard compression algorithms are very, very good at compressing repeated bytes of data. The more missing data, the better they do. On that file, every record was at least a factor of 2 smaller with bzip2 and 7zip compression than the GRIB record size.
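
The effect is easy to reproduce with Python's standard library. The sketch below builds a synthetic field of the same shape as the example record (NaN for missing, with one contiguous block of random values standing in for the real data) and compresses the whole thing, missing values included. The data are made up, so the sizes will not match the numbers above; random values also compress worse than a real, smooth wave field would.

```python
import array
import bz2
import lzma
import random
import zlib

NPOINTS = 526 * 736   # grid size from the example record
NVALID = 11127        # number of non-missing values in the example record


def synthetic_field(seed=0):
    """Return the raw bytes of a float32 grid that is mostly NaN (missing)."""
    rng = random.Random(seed)
    field = array.array("f", [float("nan")]) * NPOINTS
    start = NPOINTS // 2
    for i in range(start, start + NVALID):
        # Made-up values in the range of the example record, at 0.01 resolution.
        field[i] = round(rng.uniform(1.39, 11.25), 2)
    return field.tobytes()


def compressed_sizes(raw):
    """Compress the full array with three general-purpose compressors."""
    return {
        "deflate": len(zlib.compress(raw, 9)),
        "bzip2": len(bz2.compress(raw, 9)),
        "lzma": len(lzma.compress(raw)),
    }


raw = synthetic_field()
sizes = compressed_sizes(raw)
for name, size in sizes.items():
    print(f"{name:8s} {size:8d} bytes  (ratio {len(raw) / size:.1f})")
```

The long runs of identical missing-value bytes cost almost nothing, so the compressed size is dominated by the values that are actually present.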

I didn't test this, but it's likely that JPEG-2000 wavelet compression would do much better at compressing the data with the missing values left in, compared to the current technique of removing the missing values and recording them in an uncompressed bitmap.

(PS: I just noticed that bitmaps can be shared between records, by using the 'repeating section' feature of GRIB2. This will mitigate the above conclusion by an unknown amount).

Here's a gratuitous shot of the ToolsUI IOSP/GRIB2/GRIB2data tab on that example file:

If you right-click on one of the records and choose "Compute Scale/offset of data" from the context menu, you can see some alternate compression sizes for that record. For instance, the example record shows:

 nbits = 10
 npoints = 387136
 width = 1022 (0x3fe) 
 scale = 0.0100000 
 resolution = 0.00500000 
 range = 10.230000 

           actual    computed
 dataMin = 1.390000 1.390000
 dataMax = 11.250000 11.620000
 actual range = 9.860000
 scale_factor = 0.00964775
 add_offset = 1.39000

 max_diff = 0.00481459
 avg_diff = 7.18643e-05
 std_diff = 0.000479742

Compression
 number of values = 387136
 uncompressed as floats = 1548544
 uncompressed packed bits = 483920
 grib data length = 7702
 grib msg length = 56270

deflate (float)
 compressedSize = 19374
 ratio floats / size = 79.928978
 ratio packed bits / size = 24.977806
 ratio size / grib = 0.344304

deflate (scaled ints)
 compressedSize = 16339
 ratio floats / size = 94.775932
 ratio packed bits / size = 29.617479
 ratio size / grib = 0.290368

bzip2 (floats)
 compressedSize = 9771
 ratio floats / size = 158.483673
 ratio packed bits / size = 49.526150
 ratio size / grib = 0.173645

bzip2 (scaled ints)
 compressedSize = 9455
 ratio floats / size = 163.780426
 ratio packed bits / size = 51.181385
 ratio size / grib = 0.168029
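
The scale_factor, add_offset and uncompressed sizes in that output can be reproduced with a short Python sketch. It assumes that width is 2^nbits - 2, which matches the output above (1022 for 10 bits) and leaves one packed value spare, presumably for missing data. The function names are hypothetical; this is not the ToolsUI code.

```python
def scale_offset(data_min, data_max, nbits):
    """Compute scale_factor and add_offset for packing into nbits-wide integers."""
    width = (1 << nbits) - 2           # 1022 for nbits = 10 (assumed convention)
    scale_factor = (data_max - data_min) / width
    add_offset = data_min
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Map a float onto its nearest packed integer."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Recover the (approximate) float from its packed integer."""
    return packed * scale_factor + add_offset


nbits, npoints = 10, 387136
scale_factor, add_offset = scale_offset(1.39, 11.25, nbits)

size_as_floats = npoints * 4                 # 1548544 bytes
size_as_packed_bits = npoints * nbits // 8   # 483920 bytes

# The round-trip error is at most half of one packing step.
value = 5.0
error = abs(unpack(pack(value, scale_factor, add_offset), scale_factor, add_offset) - value)
print(scale_factor, add_offset, size_as_floats, size_as_packed_bits, error)
```

The half-step bound, scale_factor / 2 (about 0.0048), is consistent with the max_diff reported above.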

 

Lots of other info is available from that context menu. Welcome to the inner workings of GRIB sausage making.

Comments:

Change
"A bit value of 1 indicates that the data is missing at that point"

to

"A bit value of 0 indicates that the data is missing at that point"

Posted by John Caron on September 19, 2014 at 01:01 PM MDT #

You can use an API for GRIB bitmaps, for example https://www.good-fundraising-ideas.com or https://software.ecmwf.int/wiki/display/GRIB/GRIB+API+examples

Bye

Posted by loy on September 01, 2016 at 09:55 AM MDT #

Unidata Developer's Blog
A weblog about software development by Unidata developers*