The cache can really mess up benchmarking!
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs cs cs cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 66 2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs cs cs cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1859 2324282
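I don't show clear_cache.sh above; a minimal sketch of what such a script might contain (this is my guess, not the actual script) looks like this:

```shell
#!/bin/sh
# clear_cache.sh (hypothetical sketch) -- flush the OS disk cache
# so reads really hit the disk. Must be run as root.

# Flush dirty pages to disk first.
sync

# On Linux: drop the page cache, dentries, and inodes.
echo 3 > /proc/sys/vm/drop_caches

# On Mac OS X (the bash-3.2 prompt suggests a Mac), the
# equivalent would be the purge command instead:
# purge
```

Note this only clears the OS cache; any cache in the disk hardware itself is beyond the script's reach.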
In the first run of tst_ar4_3d, with the -c option, the sample data file
is first created and then read. The read time for the time series read
is really low, because the file (having just been created) is still
loaded in a disk cache somewhere in the OS or in the disk hardware.
When I clear the cache and rerun without the -c option, the sample data file is not created; it is assumed to already exist. Since the cache has been cleared, the time series read has to fetch the data from disk, and it takes 1000 times longer.
Well, that's why they invented disk caches.
This leads me to believe that my horizontal read times are fake too, because first I do a time series read, thus loading some or all of the file into cache. I need to break that out into a separate test, or perhaps make the order of the two tests controllable from the command line.
Oy, this benchmarking stuff is tricky business! I thought I had found some really good performance for netCDF-4, but now I am not sure. I need to look again more carefully and make sure that I am not being faked out by the caches.