Showing entries tagged [cache]

Large-Enough Cache Very Important When Reading Compressed NetCDF-4/HDF5 Data

The HDF5 chunk cache must be large enough to hold an uncompressed chunk.

Here are some test runs showing that a large enough cache is very important when reading compressed data. If the chunk cache is not big enough, the data have to be inflated (decompressed) again and again.

The first run below uses the default 1 MB chunk cache. The second uses a 16 MB cache. Note that the times to read the first time step are comparable, but the run with the large cache has a much lower average time, because each chunk is uncompressed only once.

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us)   avg_read_hor(us)
64    128   256   1.0       1       0       387147             211280

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h \
> -c 16000000 pr_A1_z1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)   avg_read_hor(us)
64    128   256   15.3       1       0       320176             4558
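
The difference between the two cache sizes can be checked with a little arithmetic. This is only a sketch: it assumes the data variable holds 4-byte floats (the post does not state the type) and that the default cache is 1 MiB, and the helper name `chunk_bytes` is made up for illustration.

```python
# Assumption: 4-byte float values; the post does not state the data type.
BYTES_PER_VALUE = 4
MIB = 1024 * 1024


def chunk_bytes(chunk_shape, bytes_per_value=BYTES_PER_VALUE):
    """Size in bytes of one uncompressed chunk with the given shape."""
    n_values = 1
    for extent in chunk_shape:
        n_values *= extent
    return n_values * bytes_per_value


chunk = chunk_bytes((64, 128, 256))

# The two cache sizes used in the runs above: the default and "-c 16000000".
for cache in (1 * MIB, 16_000_000):
    verdict = "fits" if chunk <= cache else "does not fit"
    print(f"chunk {chunk / MIB:.1f} MiB, cache {cache / MIB:.1f} MiB: {verdict}")
```

Under that assumption one uncompressed chunk is 8 MiB, which cannot be held in the 1 MB cache but does fit in the 16,000,000-byte cache (reported as 15.3 MB in the output above).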

For comparison, here's the time for the netCDF-4/HDF5 file which is not compressed:

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h pr_A1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)  avg_read_hor(us)
64    128   256   1.0        0       0       459               1466

And here's the same run on the classic netCDF version of the file:

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h \
> pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)  avg_read_hor(us)
0     0     0     0.0        0       0       2172              1538

So the winner for performance is the uncompressed NetCDF-4/HDF5 file, with the best read time for the first time step and the best average read time. Next comes the netCDF classic file, then the compressed netCDF-4/HDF5 file, which takes two orders of magnitude longer than the classic file for the first time step, but then catches up so that the average read time (with the 16 MB cache) is only about 3 times slower than the classic file.
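
The ratios behind that comparison can be recomputed from the timings printed above. The snippet below is just that arithmetic; the dictionary keys and the `ratio` helper are labels invented for this sketch.

```python
# Timings in microseconds, copied from the benchmark output in this post.
timings = {
    "compressed, 1 MB cache": {"first": 387147, "avg": 211280},
    "compressed, 16 MB cache": {"first": 320176, "avg": 4558},
    "uncompressed netCDF-4": {"first": 459, "avg": 1466},
    "classic": {"first": 2172, "avg": 1538},
}


def ratio(slower, faster, key):
    """How many times longer `slower` takes than `faster` for the given timing."""
    return timings[slower][key] / timings[faster][key]


cache_gain = ratio("compressed, 1 MB cache", "compressed, 16 MB cache", "avg")
first_penalty = ratio("compressed, 16 MB cache", "classic", "first")
avg_penalty = ratio("compressed, 16 MB cache", "classic", "avg")

print(f"average read, 1 MB vs 16 MB cache: {cache_gain:.0f}x")
print(f"first read, compressed vs classic: {first_penalty:.0f}x")
print(f"average read, compressed vs classic: {avg_penalty:.1f}x")
```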

The file sizes show that this read penalty is probably not worth it:

pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc    204523236
pr_A1_z1_64_128_256.nc                            185543248
pr_A1_64_128_256.nc                               209926962

So the compressed NetCDF-4/HDF5 file saves only 20 MB out of about 200, about 10%.

The uncompressed NetCDF-4/HDF5 file is 5 MB larger than the classic file, or about 2.5% larger. 
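
For reference, here is the same size comparison worked out exactly from the byte counts listed above. The short names used as dictionary keys are labels for this sketch only.

```python
# File sizes in bytes, copied from the listing above.
sizes = {
    "classic": 204523236,
    "compressed": 185543248,
    "uncompressed": 209926962,
}


def versus_classic(name):
    """Return (difference in bytes, difference as a percentage of the classic file)."""
    diff = sizes[name] - sizes["classic"]
    return diff, 100.0 * diff / sizes["classic"]


saved_bytes, saved_pct = versus_classic("compressed")
extra_bytes, extra_pct = versus_classic("uncompressed")

print(f"compressed:   {saved_bytes:+d} bytes ({saved_pct:+.1f}%)")
print(f"uncompressed: {extra_bytes:+d} bytes ({extra_pct:+.1f}%)")
```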

Smaller Chunk Sizes For Unlimited Dimension

More tests...

r_A1_4_64_128.nc pr_A1_8_64_128.nc pr_A1_16_64_128.nc pr_A1_32_64_128.nc \
pr_A1_64_64_128.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       2155             1603
4     64    128   1.0       0       0       7021             1567
8     64    128   1.0       0       0       14084            1538
16    64    128   1.0       0       0       82906            1570
32    64    128   1.0       0       0       145295           2138
64    64    128   1.0       0       0       21960            2825
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
0     0     0     0.0       0       0       2399157          9181
4     64    128   1.0       0       0       2434194          15954
8     64    128   1.0       0       0       2317802          13627
16    64    128   1.0       0       0       1531121          12686
32    64    128   1.0       0       0       1299189          12265
64    64    128   1.0       0       0       863365           2356

File Size and Chunking in NetCDF-4 on AR-4 Data File

Trying to pick chunksizes can be hard!

chunk sizes     Size Difference (MB)
1_128_128       0.33
1_128_256       0.25
1_128_32        0.86
1_16_128        1.56
1_16_256        0.86
1_16_32         5.75
1_64_128        0.51
1_64_256        0.33
1_64_32         1.56
10_128_128      0.18
10_128_256      0.17
10_128_32       0.23
10_16_128       0.3
10_16_256       0.23
10_16_32        0.72
10_64_128       0.2
10_64_256       0.18
10_64_32        0.3
1024_128_128    64.12
1024_128_256    64.12
1024_128_32     64.12
1024_16_128     64.12
1024_16_256     64.12
1024_16_32      64.13
1024_64_128     64.12
1024_64_256     64.12
1024_64_32      64.12
1560_128_128    0.16
1560_128_256    0.16
1560_128_32     0.16
1560_16_128     0.16
1560_16_256     0.16
1560_16_32      0.16
1560_64_128     0.16
1560_64_256     0.16
1560_64_32      0.16
256_128_128     30.57
256_128_256     30.57
256_128_32      30.57
256_16_128      30.58
256_16_256      30.57
256_16_32       30.59
256_64_128      30.57
256_64_256      30.57
256_64_32       30.58
classic         0
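
One possible reading of the large values for the 256 and 1024 chunk sizes is padding: if chunks are stored whole, a first dimension that is not a multiple of the chunk size leaves a partly empty last chunk. The sketch below tests that idea under assumptions the post does not state: 1560 time steps on a 128 x 256 grid of 4-byte floats (guessed from the largest chunk sizes in the table), with the second and third chunk sizes dividing the grid evenly. The function name `padding_mb` is hypothetical.

```python
import math

# Assumptions (not stated in the post): 1560 time steps on a 128 x 256 grid
# of 4-byte floats, with every chunk stored whole even when partly empty.
TIME_STEPS, NLAT, NLON, BYTES_PER_VALUE = 1560, 128, 256, 4


def padding_mb(time_chunk):
    """Extra megabytes (10**6 bytes) allocated beyond the data itself."""
    chunks_along_time = math.ceil(TIME_STEPS / time_chunk)
    padded_steps = chunks_along_time * time_chunk
    extra_values = (padded_steps - TIME_STEPS) * NLAT * NLON
    return extra_values * BYTES_PER_VALUE / 1e6


for time_chunk in (1, 10, 256, 1024, 1560):
    print(f"{time_chunk:>5}  {padding_mb(time_chunk):6.2f}")
```

With these assumed dimensions the padding comes to roughly 30.4 and 64.0 for first-dimension chunk sizes of 256 and 1024, and zero for 1, 10 and 1560. That is within about 0.2 of the table if its values are read as megabytes, but it is an inference from the numbers, not something the post confirms.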

NetCDF-4 AR-4 Timeseries Reads and Cache Sizes

Faster time series for the people!

What HDF5 chunk cache sizes are good for reading timeseries data in netCDF-4? I'm sure you have wondered - I know I have. Now we know: 0.5 to 4 MB. Bigger caches just slow things down. That came as a surprise!

The first three numbers are the chunk sizes of the three dimensions of the main data variable. The fourth is the chunk cache size in MB. The next two columns show the deflate level (0 = none) and the shuffle filter (0 = none). The chunk sizes and filter settings are the same for every run, because the same input file is used for all of them; only the chunk cache size is changed when (re-)opening the file. The Unix file cache is cleared between runs.

The two times shown are the number of microseconds to read the first time series of the data, and the average time to read a time series after all time series are read.

*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
256   128   128   0.5       0       0       1279615          2589
256   128   128   1.0       0       0       1279613          2641
256   128   128   4.0       0       0       1298543          2789
256   128   128   16.0      0       0       1470297          34603
256   128   128   32.0      0       0       1470360          34541

Note that for cache sizes of 4 MB or less, the first time series read took about 1.3 s, and the average time was 0.0026 to 0.0028 s. But when I increased the chunk cache to 16 MB and 32 MB, the time for the first read went to about 1.5 s, and the average time for all reads went to 0.035 s, an order of magnitude jump!
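
It may help to compare the cache sizes with the size of one chunk of this file. The sketch below assumes 4-byte float values and that the cache(MB) column means units of 1024 x 1024 bytes; neither is stated in this post, and the variable names are made up for illustration.

```python
# Assumptions: 4-byte floats, and cache(MB) expressed in units of 1024*1024 bytes.
BYTES_PER_VALUE = 4
MIB = 1024 * 1024

chunk_shape = (256, 128, 128)  # cs[0], cs[1], cs[2] from the output above
chunk_size = BYTES_PER_VALUE
for extent in chunk_shape:
    chunk_size *= extent

cache_sizes_mib = (0.5, 1.0, 4.0, 16.0, 32.0)

# True where a whole uncompressed chunk is no larger than the cache.
holds_chunk = {c: chunk_size <= c * MIB for c in cache_sizes_mib}

print(f"one chunk: {chunk_size / MIB:.1f} MiB")
for cache, fits in holds_chunk.items():
    print(f"cache {cache:>4} MiB: {'can' if fits else 'cannot'} hold a whole chunk")
```

Under these assumptions a chunk is 16 MiB, so the slowdown shows up at the first cache size that is large enough to hold a whole chunk. That is only an observation about the sizes; it does not establish a cause for the timing jump.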

I have repeated these tests a number of times, always with this result for chunk cache buffers 16 MB and above.

I am planning on changing the netCDF-4.1 default to 1 MB, which is the HDF5 default. (I guess we should have listened to the HDF5 team in the first place.)

What Cache Size Should be Used to Read AR-4/AR-5 3D Data?

A question that has puzzled the greatest minds of history...

The not-yet-checked-in script nc_test4/run_bm_cache.sh tests reading a sample 3D data file with different sized caches.

Because of a weird increase in time for horizontal reads with a 16 MB cache, I re-ran the test twice more to make sure I got the same results. And I did. I have no explanation for why 16 MB works so poorly.

The current netCDF-4 default cache size is 4 MB (which does fine), but I note that the original HDF5 default of 1 MB does even better. Perhaps I should just leave this cache alone as a default choice, and give users the HDF5 settings...

bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1291104
256   128   128   1.0       0       0       1298621
256   128   128   4.0       0       0       1306983
256   128   128   16.0      0       0       1472738
256   128   128   32.0      0       0       1497533
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2308
256   128   128   1.0       0       0       2291
256   128   128   4.0       0       0       2453
256   128   128   16.0      0       0       11609
256   128   128   32.0      0       0       2603

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1290340
256   128   128   1.0       0       0       1281898
256   128   128   4.0       0       0       1306885
256   128   128   16.0      0       0       1470175
256   128   128   32.0      0       0       1497529
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2298
256   128   128   1.0       0       0       2292
256   128   128   4.0       0       0       2335
256   128   128   16.0      0       0       11572
256   128   128   32.0      0       0       1841

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1298650
256   128   128   1.0       0       0       1298636
256   128   128   4.0       0       0       1565326
256   128   128   16.0      0       0       1497482
256   128   128   32.0      0       0       1497529

cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2303
256   128   128   1.0       0       0       2287
256   128   128   4.0       0       0       2280
256   128   128   16.0      0       0       11584
256   128   128   32.0      0       0       1830

SUCCESS!!!
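
To make the three runs easier to compare, the horizontal read times can be averaged per cache size. This is plain arithmetic on the numbers printed above; the variable names are invented for the sketch.

```python
from statistics import mean

# read_hor(us) from the three runs of run_bm_cache.sh above, keyed by cache size (MB).
read_hor_us = {
    0.5: [2308, 2298, 2303],
    1.0: [2291, 2292, 2287],
    4.0: [2453, 2335, 2280],
    16.0: [11609, 11572, 11584],
    32.0: [2603, 1841, 1830],
}

mean_hor_us = {cache: mean(times) for cache, times in read_hor_us.items()}

# How much slower the 16 MB cache is than the 1 MB cache, on average.
slowdown_16_vs_1 = mean_hor_us[16.0] / mean_hor_us[1.0]

for cache, avg in mean_hor_us.items():
    print(f"cache {cache:>4} MB: mean read_hor {avg:.0f} us")
print(f"16 MB vs 1 MB: {slowdown_16_vs_1:.1f}x slower")
```

Across the three runs the 16 MB cache is consistently about five times slower than the 1 MB cache for horizontal reads, while the other cache sizes stay within a few hundred microseconds of each other.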

Unidata Developer's Blog
A weblog about software development by Unidata developers*