timings of netcdf vs. niche

Hi,

I've appended the output of a benchmark program that times hyperslab
accesses for netCDF data.  I'm only including the results for
floating-point, since the results for variables of other types were
analogous.  However, if you want to see the full benchmark results, you can
look in the files

        ~russ/hdf/netcdf-timings
        ~russ/hdf/niche-timings

The source for the program that produces the timings is in

        ~russ/sdmsrc/netcdf/nctest/timeit.c

These timings are for a four-dimensional variable of size 10x20x30x40, where
the last dimension varies fastest and the first dimension is the unlimited
dimension.  Each time is the sum of the user and system time, as returned by
the times(3) function.  The clock resolution is coarser than the precision
of these numbers suggests, but each test was repeated until at least one
second had elapsed.
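
For anyone who doesn't want to dig through timeit.c, here is a minimal
sketch of that kind of measurement.  It is not the code from timeit.c: the
names cpu_seconds, time_op and spin are made up for illustration, the
batch-doubling loop is my assumption about one reasonable way to reach the
minimum run time, and spin is only a stand-in workload where a real test
would call ncvarget.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: user + system CPU seconds used so far by this process. */
static double cpu_seconds(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);   /* clock ticks per second */

    assert(ticks_per_sec > 0);
    if (times(&t) == (clock_t) -1)
        return -1.0;
    return (double) (t.tms_utime + t.tms_stime) / (double) ticks_per_sec;
}

/*
 * Hypothetical helper: run op(arg) in batches, doubling the batch size
 * until one batch uses at least min_secs of CPU time.  Returns the mean
 * cost of one call in microseconds; the batch size goes in *reps_out.
 */
static double time_op(void (*op)(void *), void *arg, double min_secs,
                      long *reps_out)
{
    long reps = 1;

    for (;;) {
        double start = cpu_seconds();
        double elapsed;
        long i;

        for (i = 0; i < reps; i++)
            op(arg);
        elapsed = cpu_seconds() - start;
        if (elapsed >= min_secs) {
            if (reps_out != NULL)
                *reps_out = reps;
            return elapsed * 1.0e6 / (double) reps;
        }
        reps *= 2;   /* too short to measure reliably; try a larger batch */
    }
}

/* Stand-in workload (assumption): burns CPU in place of an ncvarget call. */
static void spin(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long n = *(unsigned long *) arg;
    unsigned long i;

    for (i = 0; i < n; i++)
        sink += i;
}
```

Calling time_op(spin, &n, 1.0, &reps) gives a usec-per-call figure of the
kind listed in the output; timing whole batches keeps the cost of the
times() calls themselves out of the per-call number.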

These timings seem to show that niche does better than netCDF in cases where
large contiguous blocks of data are accessed (e.g. accessing the 1x20x30x40
cube within one record is about 3.9 times as fast), but for other kinds of
hyperslab access that cross record boundaries its performance is
significantly degraded (e.g. accessing the 10x20x30x1 cube, whose values are
not contiguous anywhere, is about 35 times as slow).
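
To make "contiguous" concrete, here is a rough sketch that counts how many
separate contiguous runs a hyperslab breaks into.  It assumes the variable
is one dense array with the last dimension varying fastest; it says nothing
about how netCDF or niche actually lay records out in a file, and the
function name contiguous_run is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: for a hyperslab of shape count[] taken from an
 * array of shape dims[] (last dimension varying fastest), return the
 * length in values of each contiguous run and store the number of such
 * runs in *nruns.
 */
static long contiguous_run(int ndims, const long dims[], const long count[],
                           long *nruns)
{
    long total = 1;   /* number of values in the whole hyperslab */
    long run = 1;     /* number of values in one contiguous run */
    int d;

    assert(ndims > 0);
    for (d = 0; d < ndims; d++) {
        assert(count[d] >= 1 && count[d] <= dims[d]);
        total *= count[d];
    }

    /*
     * Walk in from the fastest-varying dimension.  A run keeps growing
     * while the slab spans a dimension completely; the first dimension
     * that is only partly covered still contributes its count and then
     * ends the run.
     */
    for (d = ndims - 1; d >= 0; d--) {
        run *= count[d];
        if (count[d] != dims[d])
            break;
    }

    if (nruns != NULL)
        *nruns = total / run;
    return run;
}
```

Under that assumption, with dims 10x20x30x40 the 1x20x30x40 cube is a single
run of 24000 values, while the 10x20x30x1 cube is 6000 runs of one value
each.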

--Russ

                                   netCDF
----- float_var(10,20,30,40)
time for ncvarput 240000 values     1111111.1 usec,        1/sec
time for ncvarget 1 point                83.4 usec,    11989/sec
time for ncvarget 10x1x1x1 vector     12532.3 usec,       80/sec
time for ncvarget 1x20x1x1 vector     15116.3 usec,       66/sec
time for ncvarget 1x1x30x1 vector      2924.0 usec,      342/sec
time for ncvarget 1x1x1x40 vector       284.8 usec,     3512/sec
time for ncvarget 10x20x1x1 plane    137037.0 usec,        7/sec
time for ncvarget 10x1x30x1 plane     25128.2 usec,       40/sec
time for ncvarget 10x1x1x40 plane     14857.9 usec,       67/sec
time for ncvarget 1x20x30x1 plane     26923.1 usec,       37/sec
time for ncvarget 1x20x1x40 plane     18974.4 usec,       53/sec
time for ncvarget 1x1x30x40 plane      8527.1 usec,      117/sec
time for ncvarget 10x20x30x1 cube    216666.7 usec,        5/sec
time for ncvarget 10x20x1x40 cube    181481.5 usec,        6/sec
time for ncvarget 10x1x30x40 cube     77451.0 usec,       13/sec
time for ncvarget 1x20x30x40 cube    124074.1 usec,        8/sec

                                   niche
----- float_var(10,20,30,40)
time for ncvarput 240000 values      827777.8 usec,        1/sec
time for ncvarget 1 point              2079.3 usec,      481/sec
time for ncvarget 10x1x1x1 vector     19487.2 usec,       51/sec
time for ncvarget 1x20x1x1 vector     32323.2 usec,       31/sec
time for ncvarget 1x1x30x1 vector     46969.7 usec,       21/sec
time for ncvarget 1x1x1x40 vector      1788.6 usec,      559/sec
time for ncvarget 10x20x1x1 plane    320000.0 usec,        3/sec
time for ncvarget 10x1x30x1 plane    377777.8 usec,        3/sec
time for ncvarget 10x1x1x40 plane     19743.6 usec,       51/sec
time for ncvarget 1x20x30x1 plane    666666.7 usec,        2/sec
time for ncvarget 1x20x1x40 plane     33333.3 usec,       30/sec
time for ncvarget 1x1x30x40 plane      2501.6 usec,      400/sec
time for ncvarget 10x20x30x1 cube   7688889.2 usec,        0/sec
time for ncvarget 10x20x1x40 cube    373333.3 usec,        3/sec
time for ncvarget 10x1x30x40 cube    124074.1 usec,        8/sec
time for ncvarget 1x20x30x40 cube     31818.2 usec,       31/sec