Netcdf-4 Chunking Performance Results on AR-4 3D Data File

Some results from AR-5 performance evaluation

As part of analyzing netcdf-4 performance for the upcoming AR-5 climate data archive, I have been running benchmarks on some AR-4 (3D precip flux) data that I got from Gary Strand (thanks Gary!): pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc.

Here's what's in the file:

 netcdf pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12
 {                                                                                                          
  dimensions:                                                                                                                                                      
          lon = 256 ;                                                                                                                                              
          lat = 128 ;                                                                                                                                              
          bnds = 2 ;                                                                                                                                               
          time = UNLIMITED ; // (1560 currently)                                                                                                                   
  variables:                                                                                                                                                       
          double lon_bnds(lon, bnds) ;                                                                                                                             
          double lat_bnds(lat, bnds) ;                                                                                                                             
          double time_bnds(time, bnds) ;                                                                                                                           
          double time(time) ;                                                                                                                                      
                  time:calendar = "noleap" ;                                                                                                                       
                  time:standard_name = "time" ;                                                                                                                    
                  time:axis = "T" ;                                                                                                                                
                  time:units = "days since 0000-1-1" ;                                                                                                             
                  time:bounds = "time_bnds" ;                                                                                                                      
                  time:long_name = "time" ;                                                                                                                        
          double lat(lat) ;                                                                                                                                        
                  lat:axis = "Y" ;                                                                                                                                 
                  lat:standard_name = "latitude" ;                                                                                                                 
                  lat:bounds = "lat_bnds" ;                                                                                                                        
                  lat:long_name = "latitude" ;                                                                                                                     
                  lat:units = "degrees_north" ;                                                                                                                    
          double lon(lon) ;                                                                                                                                        
                  lon:axis = "X" ;                                                                                                                                 
                  lon:standard_name = "longitude" ;                                                                                                                
                  lon:bounds = "lon_bnds" ;                                                                                                                        
                  lon:long_name = "longitude" ;                                                                                                                    
                  lon:units = "degrees_east" ;                                                                                                                     
          float pr(time, lat, lon) ;                                                                                                                               
                  pr:comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n",                                                                                    
                          " machine mineral" ;                                                                                                                     
                  pr:missing_value = 1.e+20f ;                                                                                                                     
                  pr:_FillValue = 1.e+20f ;                                                                                                                        
                  pr:cell_methods = "time: mean (interval: 1 month)" ;                                                                                             
                  pr:history = "(PRECC+PRECL)*r[h2o]" ;                                                                                                            
                  pr:original_units = "m-1 s-1" ;                                                                                                                  
                  pr:original_name = "PRECC, PRECL" ;                                                                                                              
                  pr:standard_name = "precipitation_flux" ;                                                                                                        
                  pr:units = "kg m-2 s-1" ;                                                                                                                        
                  pr:long_name = "precipitation_flux" ;                                                                                                            
                  pr:cell_method = "time: mean" ;          

And here are the first results of storing this data with different sets of chunk sizes, with no compression. First I read all horizontal slabs in the file, then 5 time series. The last two columns show the time to read each slab and the time to read each time series, in microseconds. The columns cs[0], cs[1], and cs[2] are the chunk sizes along the time, lat, and lon dimensions of pr.

cs[0]   cs[1]   cs[2]   cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
0       0       0       0         0       0       240          3822
1       16      32      1         0       0       667          57087
1       16      128     1         0       0       245          23929
1       16      256     1         0       0       160          26913
1       64      32      1         0       0       277          22840
1       64      128     1         0       0       147          41359
1       64      256     1         0       0       110          47856
1       128     32      1         0       0       205          25052
1       128     128     1         0       0       123          47417
1       128     256     1         0       0       97           68877
10      16      32      1         0       0       552          3284
10      16      128     1         0       0       204          5834
10      16      256     1         0       0       138          8465
10      64      32      1         0       0       233          5268
10      64      128     1         0       0       132          16690
10      64      256     1         0       0       99           28037
10      128     32      1         0       0       180          8414
10      128     128     1         0       0       113          28064
10      128     256     1         0       0       90           54715
256     16      32      1         0       0       8853         1167
256     16      128     1         0       0       8012         3677
256     16      256     1         0       0       118          1581
256     64      32      1         0       0       8170         3737
256     64      128     1         0       0       227          1640
256     64      256     1         0       0       80           1627
256     128     32      1         0       0       645          1624
256     128     128     1         0       0       211          1650
256     128     256     1         0       0       68           1667
1024    16      32      1         0       0       32337        1192
1024    16      128     1         0       0       296          1489
1024    16      256     1         0       0       114          1564
1024    64      32      1         0       0       679          1415
1024    64      128     1         0       0       221          1503
1024    64      256     1         0       0       79           1669
1024    128     32      1         0       0       646          1558
1024    128     128     1         0       0       208          1568
1024    128     256     1         0       0       68           1646
1560    16      32      1         0       0       55064        1055
1560    16      128     1         0       0       298          1438
1560    16      256     1         0       0       115          1477
1560    64      32      1         0       0       685          1425
1560    64      128     1         0       0       225          1545
1560    64      256     1         0       0       79           1589
1560    128     32      1         0       0       658          1535
1560    128     128     1         0       0       208          1567
1560    128     256     1         0       0       68           1544

The first row of the table (all chunk sizes 0) shows the read times for the classic netcdf file.
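One rough way to reason about the trade-off in the table is to count how many chunks each read pattern has to touch. The sketch below is only an illustration, not the benchmark code: it assumes a "horizontal slab" is one time step over all lat and lon, and a "time series" is one grid point over all time steps. The function names are made up for this example, and the model ignores the chunk cache and the size of each chunk, so it will not predict the measured times.

```python
# Dimension lengths of pr(time, lat, lon), taken from the file header above.
SHAPE = (1560, 128, 256)


def chunks_touched(chunk, start, count, shape=SHAPE):
    """Count the chunks that overlap the hyperslab given by start and count."""
    total = 1
    for c, s, n, dim in zip(chunk, start, count, shape):
        if c < 1 or s < 0 or n < 1 or s + n > dim:
            raise ValueError("chunk or hyperslab does not fit the variable")
        first = s // c            # index of the first chunk along this dimension
        last = (s + n - 1) // c   # index of the last chunk along this dimension
        total *= last - first + 1
    return total


def horizontal_slab(chunk):
    """Chunks touched by one time step over all lat and lon (assumed pattern)."""
    return chunks_touched(chunk, (0, 0, 0), (1, SHAPE[1], SHAPE[2]))


def time_series(chunk):
    """Chunks touched by one grid point over all time steps (assumed pattern)."""
    return chunks_touched(chunk, (0, 0, 0), (SHAPE[0], 1, 1))


if __name__ == "__main__":
    for chunk in [(1, 16, 32), (256, 128, 256), (1560, 16, 32)]:
        print(chunk, horizontal_slab(chunk), time_series(chunk))
```

Under these assumptions, a chunking of 1 x 16 x 32 touches 64 chunks per horizontal slab and 1560 chunks per time series, while 256 x 128 x 256 touches 1 and 7. That is at least consistent with the general direction of the numbers in the table.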

I am happy to see that a number of cases clearly outperform classic netcdf. The trick is to come up with an algorithm that picks good chunk sizes without the user being involved.
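To make "outperform classic netcdf" concrete, here is a small sketch that filters a subset of the rows above for chunkings that are faster than the classic file on both read patterns. It is not a chunk-size selection algorithm, only a way to read the table. The rows are copied from the table, and the name beats_classic is made up for this example.

```python
# Classic netcdf timings from the first row of the table:
# (read_hor_us, read_time_ser_us).
CLASSIC = (240, 3822)

# A subset of the table rows: (cs0, cs1, cs2, read_hor_us, read_time_ser_us).
ROWS = [
    (1, 16, 32, 667, 57087),
    (1, 128, 256, 97, 68877),
    (10, 16, 32, 552, 3284),
    (10, 128, 256, 90, 54715),
    (256, 16, 32, 8853, 1167),
    (256, 64, 256, 80, 1627),
    (256, 128, 256, 68, 1667),
    (1024, 128, 256, 68, 1646),
    (1560, 16, 32, 55064, 1055),
    (1560, 128, 256, 68, 1544),
]


def beats_classic(rows, classic=CLASSIC):
    """Return the rows that are faster than classic on both read patterns."""
    return [r for r in rows if r[3] < classic[0] and r[4] < classic[1]]


winners = beats_classic(ROWS)

if __name__ == "__main__":
    for cs0, cs1, cs2, hor, ser in winners:
        print(f"{cs0} x {cs1} x {cs2}: read_hor={hor} us, read_time_ser={ser} us")
```

In this subset, every chunking that beats classic on both measures has a time chunk size of 256 or more together with a full-width lon chunk of 256.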

Unidata Developer's Blog
A weblog about software development by Unidata developers*