Demonstrating Caching and Its Effect on Timing
02 January 2010
The cache can really mess up benchmarking!
For example:
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 66 2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1859 2324282
In the first run of tst_ar4_3d, with the -c option, the sample data file
is first created and then read. The read time for the time series read
is really low, because the file (having just been created) is still
loaded in a disk cache somewhere in the OS or in the disk hardware.
When I clear the cache and rerun without the -c option, the sample data
file is not created; it is assumed to already exist. Since the cache has
been cleared, the time series read has to fetch the data from disk, and
it takes about 1000 times longer.
Well, that's why they invented disk caches.
This leads me to believe that my horizontal read times are fake too,
because first I am doing a time series read, thus loading some or all
of the file into cache. I need to break that out into a separate test, I
see, or perhaps make the order of the two tests controllable from the
command line.
Oy, this benchmarking stuff is tricky business! I thought I had found
some really good performance for netCDF-4, but now I am not sure. I need
to look again more carefully and make sure that I am not being faked
out by the caches.
Ed
Effects of Clearing the Cache on Benchmarks
02 January 2010
How to win friends and influence benchmarks...
I note that I have a shell script in my nc_test4 directory,
clear_cache.sh. I have to run it with sudo, but when I do, it has a
dramatic effect on the time that the time series read takes.
The following uses the new (not yet checked in) test program
tst_ar4_3d.c, which seeks to set up a simpler proxy data file for the
AR-4 tests. I want to show that a simpler file (but with the same-sized
data variable) has similar performance to the slightly more dressed up
pr_A1 file from AR-4 that I got from Gary. That's because my simpler
file is easier to create in a test program.
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1420 2281847
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 81 3159
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 76 2983
bash-3.2$ sudo bash clear_cache.sh
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1410 2504315
Wow, what a difference a cleared cache makes!
Here's the clear_cache.sh script:
#!/bin/bash -x
# Clear the disk caches.
sync
echo 3 > /proc/sys/vm/drop_caches
More Cache Size Benchmarks
01 January 2010
Why does increasing cache size slow down time series access so much?
bash-3.2$ ./tst_ar4 -h pr_A1_256_128_128.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
256 128 128 0.5 0 0 217 2773
256 128 128 1.0 0 0 214 1935
256 128 128 4.0 0 0 214 1929
256 128 128 32.0 0 0 160 84440
256 128 128 128.0 0 0 129 82407
NetCDF-4 AR-4 Performance Data With One Time Series
31 December 2009
Another concern that Russ, Dennis, and I had was that, by taking 5 time
series, we were involving the cache too much. Might the data not
already be pre-loaded after the first time series is retrieved?
So in this run I do only one time series read. This is all on the 3D precip flux data that Gary gave us.
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
0 0 0 0 0 0 247 5073
256 64 128 4 0 0 238 2162
256 64 128 32 0 0 172 47050
256 64 128 128 0 0 165 42516
256 64 256 4 0 0 89 2061
256 64 256 32 0 0 136 83352
256 64 256 128 0 0 112 80282
256 128 128 4 0 0 217 2119
256 128 128 32 0 0 153 83400
256 128 128 128 0 0 128 80055
256 128 256 4 0 0 78 2250
256 128 256 32 0 0 188 188474
256 128 256 128 0 0 133 177679
1024 64 128 4 0 0 230 1952
1024 64 128 32 0 0 108690 52045
1024 64 128 128 0 0 216 49018
1024 64 256 4 0 0 88 2100
1024 64 256 32 0 0 87 1964
1024 64 256 128 0 0 175 95524
1024 128 128 4 0 0 218 2064
1024 128 128 32 0 0 218 1991
1024 128 128 128 0 0 194 95595
1024 128 256 4 0 0 76 2128
1024 128 256 32 0 0 76 2041
1024 128 256 128 0 0 198 197824
1560 64 128 4 0 0 229 1973
1560 64 128 32 0 0 229 1915
1560 64 128 128 0 0 161078 37119
1560 64 256 4 0 0 87 2178
1560 64 256 32 0 0 87 2105
1560 64 256 128 0 0 160058 72695
1560 128 128 4 0 0 214 2048
1560 128 128 32 0 0 213 1980
1560 128 128 128 0 0 159984 73765
1560 128 256 4 0 0 78 2284
1560 128 256 32 0 0 76 1954
1560 128 256 128 0 0 76 1947
Sorry about the alignment of the columns, but they show up fine for me
in emacs. Obviously this fancy web technology is only a few years away
from doing what emacs could do in 1981...
NetCDF-4 AR-4 Performance Data With Horizontal and Time Series Reads Reversed
31 December 2009
Russ, Dennis and I discussed some of the chunking results yesterday. We
were concerned that the horizontal reads were causing all the data to be
preloaded into cache for the subsequent time series read. So I swapped
the order - now the time series read is done first.
Here are some results. The first line shows the results for reading a classic netCDF file.
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
0 0 0 0 0 0 247 5908
256 64 128 4 0 0 241 2039
256 64 128 32 0 0 168 31384
256 64 128 128 0 0 140 17096
256 64 256 4 0 0 93 2548
256 64 256 32 0 0 136 55722
256 64 256 128 0 0 106 26892
256 128 128 4 0 0 216 2035
256 128 128 32 0 0 152 55488
256 128 128 128 0 0 121 26698
256 128 256 4 0 0 79 2392
256 128 256 32 0 0 188 191120
256 128 256 128 0 0 136 186396
1024 64 128 4 0 0 236 1945
1024 64 128 32 0 0 108356 53812
1024 64 128 128 0 0 220 19551
1024 64 256 4 0 0 89 1930
1024 64 256 32 0 0 89 1864
1024 64 256 128 0 0 209 40942
1024 128 128 4 0 0 222 2065
1024 128 128 32 0 0 220 1833
1024 128 128 128 0 0 227 41183
1024 128 256 4 0 0 77 1894
1024 128 256 32 0 0 76 1839
1024 128 256 128 0 0 199 207533
1560 64 128 4 0 0 234 1885
1560 64 128 32 0 0 233 1850
1560 64 128 128 0 0 161596 14921
1560 64 256 4 0 0 88 1969
1560 64 256 32 0 0 87 1929
1560 64 256 128 0 0 160939 30848
1560 128 128 4 0 0 218 1924
1560 128 128 32 0 0 218 1875
1560 128 128 128 0 0 161316 30876
1560 128 256 4 0 0 77 1857
1560 128 256 32 0 0 76 1797
1560 128 256 128 0 0 76 1796
Again, there are many chunk size selections which beat the classic netCDF file performance.