Hyperslab access timings for netCDFx, netCDF, and niche

Hi,

The timings I sent out last Friday were flawed because some of the libraries
had been compiled with -g and others with -O.  Appended to this note is a
complete set of timings with everything compiled with -O optimization.
These timings are the results of a benchmarking program which will probably
be included in the nctest directory of the next distribution, to give
developers on other platforms something to use for tuning.

Chris, I've put a copy of the benchmark program, which is standalone, in
~ftp/pub/netcdf/nctime.c, in case you want to use it to explore the HDF
prototype performance.  As is apparent from the table, there are still a few
instances where the HDF implementation is significantly faster than the
current netCDF implementation, but in most cases the netCDF implementation is
significantly faster, and in many cases the new UNIX-specific optimization
is much faster than the current netCDF implementation.

The performance comparisons below are for netcdfx (netCDF 2.02 with the
unreleased UNIX-specific optimization for netCDF), netcdf (the current
release, 2.02), and niche (netCDF interface covering HDF encoding).  Runs
were made on an unloaded SPARCstation 2 (buddy.unidata.ucar.edu).  All tests
were compiled with -O using Sun's unbundled compiler in /usr/lang/cc.
Timings are the sum of user and system times as returned by getrusage(2) of
enough repetitions of each test to exceed one second of elapsed time.
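
For anyone who wants to reproduce the measurement method without fetching
nctime.c, here is a minimal sketch of it.  This is not the code from
nctime.c.  The names time_per_call_msec and count_call are made up for
illustration, and I am assuming the reported figure is the accumulated user
plus system time divided by the number of repetitions.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User plus system CPU time used so far by this process, in seconds. */
static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_stime.tv_sec
         + 1e-6 * ((double)ru.ru_utime.tv_usec + (double)ru.ru_stime.tv_usec);
}

/* Wall-clock time in seconds, used only to decide when to stop repeating. */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}

/*
 * Hypothetical helper: call op(arg) until at least min_elapsed seconds of
 * wall-clock time have passed, then return the mean user+system time per
 * call in milliseconds.  The repetition count is stored in *reps_out when
 * reps_out is not NULL.
 */
double time_per_call_msec(void (*op)(void *), void *arg,
                          double min_elapsed, long *reps_out)
{
    double wall0 = wall_seconds();
    double cpu0 = cpu_seconds();
    long reps = 0;

    do {
        op(arg);
        reps++;
    } while (wall_seconds() - wall0 < min_elapsed);

    if (reps_out != NULL)
        *reps_out = reps;
    return 1000.0 * (cpu_seconds() - cpu0) / (double)reps;
}

/* Stand-in for the operation under test: it only counts its own calls. */
void count_call(void *arg)
{
    long *counter = (long *)arg;
    (*counter)++;
}
```

In a real run, op would wrap one ncvarget or ncvarput call and min_elapsed
would be 1.0, to match the one-second threshold described above.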

The first column describes the test, where "ncvarget 10x1x30x1" means
ncvarget was called to retrieve a 10 by 1 by 30 by 1 hyperslab of the 10 by
20 by 30 by 40 variable.  The first dimension (10) is the record dimension, and
varies most slowly.  The benchmark program permits any shape of variable to
be used, but 10x20x30x40 was deemed typical.
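
To make the shapes concrete, the sketch below computes how many values a
hyperslab holds and how many separate contiguous runs it covers in a plain
row-major array (last dimension varying fastest).  The function names are
hypothetical and nothing here comes from the netCDF library.  It also ignores
the way record variables are interleaved on disk, so treat the run count as a
rough guide to how scattered an access is, not as a model of the actual I/O.

```c
/* Number of values in a hyperslab with the given edge lengths. */
long hyperslab_nelems(const long *count, int ndims)
{
    long n = 1;
    int i;
    for (i = 0; i < ndims; i++)
        n *= count[i];
    return n;
}

/*
 * Number of separate contiguous runs a hyperslab occupies in a row-major
 * array of the given shape.  Trailing dimensions that are taken whole merge
 * with the dimension before them; every dimension ahead of that point
 * multiplies the number of runs.
 */
long hyperslab_nruns(const long *shape, const long *count, int ndims)
{
    int d = ndims - 1;
    long runs = 1;
    int i;

    /* Skip trailing dimensions that span the full extent of the variable. */
    while (d > 0 && count[d] == shape[d])
        d--;

    /* Each index combination over dimensions 0..d-1 starts a new run. */
    for (i = 0; i < d; i++)
        runs *= count[i];
    return runs;
}
```

For the 10x20x30x40 variable, a 10x1x30x1 hyperslab is 300 values in 300
runs of one value each, while a 1x1x30x40 hyperslab is 1200 values in a
single run.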

The value in the netcdfx column is the time in milliseconds (user plus system
time, as described above) for the described test using the netcdfx library.
The netcdfx/netcdf column contains the ratio of times for the netcdfx library
over the times for the released netcdf library.  The last column,
netcdfx/niche, is the ratio of times for the netcdfx library over the times
for the niche library.  Hence values less than 1.0
in the last two columns are expected where the new library performs better
than the previous version or the niche library.  Ratios significantly
greater than 1.0 in either of the last two columns indicate a possible
performance problem with the netcdfx library that may bear further study.
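
The arithmetic behind the last two columns can be sketched as follows.  The
function names are hypothetical, and the threshold passed to needs_study is
an assumption, since I have not fixed a number for "significantly greater
than 1.0".

```c
/* Ratio as reported in the last two columns: netcdfx time over the other. */
double time_ratio(double netcdfx_msec, double other_msec)
{
    return netcdfx_msec / other_msec;
}

/* Recover the other library's time from the netcdfx time and the ratio. */
double other_time_msec(double netcdfx_msec, double ratio)
{
    return netcdfx_msec / ratio;
}

/* Nonzero when a ratio exceeds the caller's threshold (an assumed cutoff). */
int needs_study(double ratio, double threshold)
{
    return ratio > threshold;
}
```

For example, a netcdfx time of .065 msec with a netcdfx/netcdf ratio of
.0715859 implies about .908 msec for the released netcdf library.  The
tabulated times are rounded, so times recovered this way are approximate.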

The same accesses were made for each of the six netCDF types: byte, char,
short, long, float, and double.  All the timings for the byte variable
appear first, followed by the other types.  The very first call of ncvarput
for the byte variable also includes the time needed to write fill values
for the variables of the five other types.  I don't know whether the
niche library wrote all these fill values in this case, since nctest doesn't
test that.

                                    netcdfx   netcdfx/netcdf  netcdfx/niche
----- byte_var(10,20,30,40)                                                    
time for ncvarput 10x20x30x40   2936.667 msec    .70593      5.68387
time for ncvarget 1x1x1x1           .065 msec    .0715859     .0374208
time for ncvarget 10x1x1x1        13.256 msec   1.14771       .769311
time for ncvarget 1x20x1x1         4.163 msec    .20345       .134686
time for ncvarget 1x1x30x1          .649 msec    .021417      .0144711
time for ncvarget 1x1x1x40          .104 msec    .0881356     .0579387
time for ncvarget 10x20x1x1       53.030 msec    .279105      .197873
time for ncvarget 10x1x30x1       18.615 msec    .0732874     .0470076
time for ncvarget 10x1x1x40       13.488 msec   1.16779       .775841
time for ncvarget 1x20x30x1       15.846 msec    .0389655     .02502
time for ncvarget 1x20x1x40        4.747 msec    .23375       .1551
time for ncvarget 1x1x30x40        1.268 msec   1.02341       .698623
time for ncvarget 10x20x30x1     148.889 msec    .0353936     .0225476
time for ncvarget 10x20x1x40      57.879 msec    .302855      .206711
time for ncvarget 10x1x30x40      24.615 msec   1.9846        .517383
time for ncvarget 1x20x30x40      27.385 msec   6.06668      3.18245
time for ncvarget 10x20x30x40    232.000 msec   5.07015       .54375
                                                                            
----- char_var(10,20,30,40)                                                 
time for ncvarput 10x20x30x40    270.000 msec   7.68093       .52258
time for ncvarget 1x1x1x1           .065 msec    .0715859     .0413749
time for ncvarget 10x1x1x1        13.333 msec   1.14663       .833313
time for ncvarget 1x20x1x1         5.525 msec    .270013      .170398
time for ncvarget 1x1x30x1         3.099 msec    .106018      .0635197
time for ncvarget 1x1x1x40          .101 msec    .0885188     .0550709
time for ncvarget 10x20x1x1       52.424 msec    .274312      .213106
time for ncvarget 10x1x30x1       20.769 msec    .0817677     .060375
time for ncvarget 10x1x1x40       13.566 msec   1.16667       .881768
time for ncvarget 1x20x30x1       16.923 msec    .0419579     .0270048
time for ncvarget 1x20x1x40        6.381 msec    .314211      .206445
time for ncvarget 1x1x30x40        3.723 msec   3.07686      1.97716
time for ncvarget 10x20x30x1     153.333 msec    .0363348     .0267753
time for ncvarget 10x20x1x40      57.576 msec    .299529      .237917
time for ncvarget 10x1x30x40      26.923 msec   2.19815       .710763
time for ncvarget 1x20x30x40      28.308 msec   6.49564      3.44505
time for ncvarget 10x20x30x40    232.000 msec   5.0368        .66923
                                                                            
----- short_var(10,20,30,40)                                                
time for ncvarput 10x20x30x40    580.000 msec    .865672      .604167
time for ncvarget 1x1x1x1           .067 msec    .0618652     .0361381
time for ncvarget 10x1x1x1        13.023 msec   1.12753       .829913
time for ncvarget 1x20x1x1         9.457 msec    .448688      .302992
time for ncvarget 1x1x30x1         3.099 msec    .100262      .0691001
time for ncvarget 1x1x1x40          .160 msec    .130187      .0970285
time for ncvarget 10x20x1x1       85.882 msec    .44168       .338118
time for ncvarget 10x1x30x1       21.385 msec    .0835352     .0600702
time for ncvarget 10x1x1x40       14.031 msec   1.13833       .88546
time for ncvarget 1x20x30x1       21.538 msec    .0525317     .0338293
time for ncvarget 1x20x1x40       11.163 msec    .486978      .354212
time for ncvarget 1x1x30x40        5.525 msec   1.37849      2.6247
time for ncvarget 10x20x30x1     192.222 msec    .0451579     .0316675
time for ncvarget 10x20x1x40     102.941 msec    .465563      .389928
time for ncvarget 10x1x30x40      44.848 msec   1.08028       .747467
time for ncvarget 1x20x30x40      65.294 msec   1.03738      4.16097
time for ncvarget 10x20x30x40    460.000 msec   1.05343       .663462
                                                                            
----- long_var(10,20,30,40)                                                 
time for ncvarput 10x20x30x40   1183.333 msec    .698819      .606837
time for ncvarget 1x1x1x1           .063 msec    .0492958     .0319959
time for ncvarget 10x1x1x1        12.403 msec   1.03221       .650155
time for ncvarget 1x20x1x1        15.385 msec    .662262      .588247
time for ncvarget 1x1x30x1         2.924 msec    .0918977     .0810848
time for ncvarget 1x1x1x40          .273 msec    .192933      .197112
time for ncvarget 10x20x1x1      141.111 msec    .671957      .443745
time for ncvarget 10x1x30x1       24.923 msec    .0929963     .0685955
time for ncvarget 10x1x1x40       14.186 msec   1.02812       .743618
time for ncvarget 1x20x30x1       27.077 msec    .0634617     .054154
time for ncvarget 1x20x1x40       19.231 msec    .698344      .722562
time for ncvarget 1x1x30x40        8.915 msec   1.15719      3.97636
time for ncvarget 10x20x30x1     218.000 msec    .0506584     .0285964
time for ncvarget 10x20x1x40     176.667 msec    .795797      .474911
time for ncvarget 10x1x30x40      80.588 msec   1.0458        .636219
time for ncvarget 1x20x30x40     131.111 msec   1.00855      4.46183
time for ncvarget 10x20x30x40    976.667 msec    .996599      .57115
                                                                            
----- float_var(10,20,30,40)                                                
time for ncvarput 10x20x30x40   1140.000 msec    .675889      .552504
time for ncvarget 1x1x1x1           .065 msec    .0555081     .0292529
time for ncvarget 10x1x1x1        12.558 msec   1.03845       .618377
time for ncvarget 1x20x1x1        15.194 msec    .658404      .460006
time for ncvarget 1x1x30x1         3.041 msec    .0974305     .0664583
time for ncvarget 1x1x1x40          .271 msec    .195668      .16247
time for ncvarget 10x20x1x1      138.889 msec    .657896      .439522
time for ncvarget 10x1x30x1       25.077 msec    .0921949     .0671706
time for ncvarget 10x1x1x40       14.651 msec   1.03278       .71601
time for ncvarget 1x20x30x1       26.615 msec    .0618953     .0405304
time for ncvarget 1x20x1x40       19.231 msec    .710234      .576936
time for ncvarget 1x1x30x40        8.837 msec   1.1588       3.51372
time for ncvarget 10x20x30x1     216.000 msec    .0497314     .0283217
time for ncvarget 10x20x1x40     176.667 msec    .788692      .48803
time for ncvarget 10x1x30x40      79.412 msec   1.04652       .655696
time for ncvarget 1x20x30x40     128.889 msec   1            4.16995
time for ncvarget 10x20x30x40    966.667 msec   1.00694       .587045
                                                                            
----- double_var(10,20,30,40)                                               
time for ncvarput 10x20x30x40   2256.667 msec    .674975      .485305
time for ncvarget 1x1x1x1           .070 msec    .0623886     .033557
time for ncvarget 10x1x1x1        13.333 msec   1.13909       .637241
time for ncvarget 1x20x1x1        26.769 msec   1.12258       .817948
time for ncvarget 1x1x30x1         3.528 msec    .114142      .0771013
time for ncvarget 1x1x1x40          .481 msec    .306174      .266482
time for ncvarget 10x20x1x1      216.000 msec   1.0125        .624277
time for ncvarget 10x1x30x1       35.152 msec    .136248      .0886184
time for ncvarget 10x1x1x40       18.462 msec   1.2           .827595
time for ncvarget 1x20x30x1       44.545 msec    .105224      .0707063
time for ncvarget 1x20x1x40       33.636 msec   1.0673        .880939
time for ncvarget 1x1x30x40       15.116 msec   1.08936      1.91172
time for ncvarget 10x20x30x1     370.000 msec    .0854503     .0448666
time for ncvarget 10x20x1x40     290.000 msec   1.05072       .763158
time for ncvarget 10x1x30x40     137.778 msec   1.0248        .656086
time for ncvarget 1x20x30x40     230.000 msec   1.03604      1.95283
time for ncvarget 10x20x30x40   1890.000 msec   1.03091       .579162