Showing entries tagged [performance]

Demonstrating Caching and Its Effect on Timing

The cache can really mess up benchmarking!

For example:

bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       66           2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       1859         2324282

In the first run of tst_ar4_3d, with the -c option, the sample data file is first created and then read. The read time for the time series read is really low, because the file (having just been created) is still loaded in a disk cache somewhere in the OS or in the disk hardware.

When I clear the cache and rerun without the -c option, the sample data file is not created; it is assumed to already exist. Since the cache has been cleared, the time series read has to read the data from disk, and it takes about 1000 times longer (2324282 us versus 2102 us).
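To put a number on the gap, the times from the two runs above can be divided directly. This is a small sketch in bash integer arithmetic; the helper name `slowdown` is hypothetical and not part of the test programs.

```bash
#!/bin/bash
# Integer ratio of a cold-cache time to a warm-cache time (both in microseconds).
# 'slowdown' is a hypothetical helper name used only for this illustration.
slowdown() {
    local cold_us=$1 warm_us=$2
    echo $(( cold_us / warm_us ))
}

# Times copied from the two runs above.
echo "time series read: $(slowdown 2324282 2102)x slower with a cleared cache"
echo "horizontal read:  $(slowdown 1859 66)x slower with a cleared cache"
```

The time series read comes out at roughly 1105 times slower, and the horizontal read at roughly 28 times slower.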

Well, that's why they invented disk caches.

This leads me to believe that my horizontal read times are fake too, because I am doing a time series read first, thus loading some or all of the file into cache. I need to break that out into a separate test, I see, or perhaps make the order of the two tests controllable from the command line.
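Making the order controllable could look something like the following sketch, written in bash rather than in the C test program. The `-r` flag and the function name `read_order` are hypothetical; tst_ar4_3d has no such option in the runs shown here. The default mirrors the current behaviour (time series read first).

```bash
#!/bin/bash
# Decide the order of the two timed reads from a command-line flag.
# Both the -r flag and the function name are hypothetical.
read_order() {
    local first=time_series second=horizontal opt
    local OPTIND=1              # reset so the function can be called repeatedly
    while getopts "r" opt; do
        case $opt in
            r) first=horizontal; second=time_series ;;   # reverse the order
            *) return 1 ;;                                # unknown option
        esac
    done
    echo "$first $second"
}

read_order        # prints "time_series horizontal"
read_order -r     # prints "horizontal time_series"
```

Clearing the cache between the two reads, not only before the first, would keep either one from warming the cache for the other.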

Oy, this benchmarking stuff is tricky business! I thought I had found some really good performance for netCDF-4, but now I am not sure. I need to look again more carefully and make sure that I am not being faked out by the caches.

Ed

Effects of Clearing the Cache on Benchmarks

How to win friends and influence benchmarks...

I note that I have a shell script in my nc_test4 directory, clear_cache.sh. I have to sudo to run it, but when I do, it has a dramatic effect on the time that the time series read takes.

The following uses the new (not yet checked in) test program tst_ar4_3d.c, which seeks to set up a simpler proxy data file for the AR-4 tests. I want to show that a simpler file (but with the same-sized data variable) has similar performance to the slightly more dressed up pr_A1 file from AR-4 that I got from Gary, because the simpler file is easier to create in a test program.

bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       1420         2281847
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       81           3159
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       76           2983
bash-3.2$ sudo bash clear_cache.sh
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64    256   128   4.0       0       0       1410         2504315

Wow, what a difference a cleared cache makes!

Here's the clear_cache.sh script:

#!/bin/bash -x 
# Clear the disk caches.
sync
echo 3 > /proc/sys/vm/drop_caches
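Because the write to /proc/sys/vm/drop_caches fails without root, a guarded variant can stop early with a clear message. This is a minimal sketch, not the script in nc_test4; the function name and the optional path argument (there only so the function can be tried on an ordinary file) are hypothetical additions.

```bash
#!/bin/bash
# Guarded variant of clear_cache.sh (a sketch, not the script in nc_test4).
# Writing 3 asks Linux to drop the page cache plus dentries and inodes.
drop_caches() {
    local target=${1:-/proc/sys/vm/drop_caches}   # path argument is hypothetical
    if [ ! -w "$target" ]; then
        echo "cannot write $target; run with sudo" >&2
        return 1
    fi
    sync                    # write dirty pages out first so they can be dropped
    echo 3 > "$target"
}
```

Called with no argument from a root shell, this does the same work as the script above; without root it returns a non-zero status instead of printing a bare permission error.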

More Cache Size Benchmarks

Why does increasing cache size slow down time series access so much?

bash-3.2$ ./tst_ar4 -h pr_A1_256_128_128.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
256   128   128   0.5       0       0       217          2773
256   128   128   1.0       0       0       214          1935
256   128   128   4.0       0       0       214          1929
256   128   128   32.0      0       0       160          84440
256   128   128   128.0     0       0       129          82407
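One thing worth checking against this table is how large a single chunk is relative to the cache. The sketch below assumes 4-byte data values and that cache(MB) is counted in units of 1048576 bytes; both are assumptions, and the helper names are made up for the example.

```bash
#!/bin/bash
# Size of one chunk in bytes; the element size defaults to 4 (an assumption).
chunk_bytes() {
    echo $(( $1 * $2 * $3 * ${4:-4} ))
}

# Print "yes" if a chunk of the given size fits in a cache of cache_mb
# megabytes (1 MB taken as 1048576 bytes, also an assumption), else "no".
fits_in_cache() {
    awk -v c="$1" -v mb="$2" \
        'BEGIN { if (c <= mb * 1048576) print "yes"; else print "no" }'
}

bytes=$(chunk_bytes 256 128 128)
for mb in 0.5 1.0 4.0 32.0 128.0; do
    echo "cache $mb MB: chunk of $bytes bytes fits? $(fits_in_cache "$bytes" "$mb")"
done
```

Under those assumptions a 256 x 128 x 128 chunk is 16777216 bytes (16 MB), so it fits only in the 32 and 128 MB caches, which are the two rows where the time series read slows down. That is a correlation in this table rather than an explanation, but it suggests where to look.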

NetCDF-4 AR-4 Performance Data With One Time Series

Another concern that Russ and Dennis and I had was that by taking 5 time series, we were involving the cache too much. Might the data not be already pre-loaded after the first time series is retrieved?

So in this run I do one time series read. This is all on the 3D precip flux data that Gary gave us.

cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
0     0     0     0         0       0       247          5073
256   64    128   4         0       0       238          2162
256   64    128   32        0       0       172          47050
256   64    128   128       0       0       165          42516
256   64    256   4         0       0       89           2061
256   64    256   32        0       0       136          83352
256   64    256   128       0       0       112          80282
256   128   128   4         0       0       217          2119
256   128   128   32        0       0       153          83400
256   128   128   128       0       0       128          80055
256   128   256   4         0       0       78           2250
256   128   256   32        0       0       188          188474
256   128   256   128       0       0       133          177679
1024  64    128   4         0       0       230          1952
1024  64    128   32        0       0       108690       52045
1024  64    128   128       0       0       216          49018
1024  64    256   4         0       0       88           2100
1024  64    256   32        0       0       87           1964
1024  64    256   128       0       0       175          95524
1024  128   128   4         0       0       218          2064
1024  128   128   32        0       0       218          1991
1024  128   128   128       0       0       194          95595
1024  128   256   4         0       0       76           2128
1024  128   256   32        0       0       76           2041
1024  128   256   128       0       0       198          197824
1560  64    128   4         0       0       229          1973
1560  64    128   32        0       0       229          1915
1560  64    128   128       0       0       161078       37119
1560  64    256   4         0       0       87           2178
1560  64    256   32        0       0       87           2105
1560  64    256   128       0       0       160058       72695
1560  128   128   4         0       0       214          2048
1560  128   128   32        0       0       213          1980
1560  128   128   128       0       0       159984       73765
1560  128   256   4         0       0       78           2284
1560  128   256   32        0       0       76           1954
1560  128   256   128       0       0       76           1947

Sorry about the alignment of the columns, but they show up fine for me in emacs. Obviously this fancy web technology is only a few years away from doing what emacs could do in 1981...

NetCDF-4 AR-4 Performance Data With Horizontal and Time Series Reversed

Russ, Dennis and I discussed some of the chunking results yesterday. We were concerned that the horizontal reads were causing all the data to be preloaded into cache for the subsequent time series read. So I swapped the order - now the time series read is done first.

Here are some results. The first line shows the results for reading a classic netCDF file.

cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)  read_time_ser(us)
0     0     0     0         0      0        247           5908
256   64    128   4         0      0        241           2039
256   64    128   32        0      0        168           31384
256   64    128   128       0      0        140           17096
256   64    256   4         0      0        93            2548
256   64    256   32        0      0        136           55722
256   64    256   128       0      0        106           26892
256   128   128   4         0      0        216           2035
256   128   128   32        0      0        152           55488
256   128   128   128       0      0        121           26698
256   128   256   4         0      0        79            2392
256   128   256   32        0      0        188           191120
256   128   256   128       0      0        136           186396
1024  64    128   4         0      0        236           1945
1024  64    128   32        0      0        108356        53812
1024  64    128   128       0      0        220           19551
1024  64    256   4         0      0        89            1930
1024  64    256   32        0      0        89            1864
1024  64    256   128       0      0        209           40942
1024  128   128   4         0      0        222           2065
1024  128   128   32        0      0        220           1833
1024  128   128   128       0      0        227           41183
1024  128   256   4         0      0        77            1894
1024  128   256   32        0      0        76            1839
1024  128   256   128       0      0        199           207533
1560  64    128   4         0      0        234           1885
1560  64    128   32        0      0        233           1850
1560  64    128   128       0      0        161596        14921
1560  64    256   4         0      0        88            1969
1560  64    256   32        0      0        87            1929
1560  64    256   128       0      0        160939        30848
1560  128   128   4         0      0        218           1924
1560  128   128   32        0      0        218           1875
1560  128   128   128       0      0        161316        30876
1560  128   256   4         0      0        77            1857
1560  128   256   32        0      0        76            1797
1560  128   256   128       0      0        76            1796

Again, there are many chunk size selections which beat the classic netCDF file performance.
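To list those selections rather than pick them out by eye, the table can be filtered against the classic-file baseline. This is a sketch with awk, assuming the table arrives on standard input with the classic file as the first data row, as in the listing above. Only a few rows are copied here, and the function name `beats_classic` is hypothetical.

```bash
#!/bin/bash
# Print rows whose time series read (column 8) beats the classic-file row.
# Assumes the first data row after the header is the classic netCDF file.
beats_classic() {
    awk 'NR == 1 { next }                # skip the header line
         NR == 2 { base = $8; next }     # classic netCDF baseline
         $8 < base { print }'
}

# A few rows copied from the table above.
beats_classic <<'EOF'
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)  read_time_ser(us)
0     0     0     0         0      0        247           5908
256   64    128   4         0      0        241           2039
256   64    128   32        0      0        168           31384
1560  128   256   128       0      0        76            1796
EOF
```

For these sample rows, the 256 x 64 x 128 chunks with a 4 MB cache and the 1560 x 128 x 256 chunks with a 128 MB cache are printed; the 32 MB row is not, since 31384 us is slower than the classic 5908 us.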

Unidata Developer's Blog
A weblog about software development by Unidata developers*