netcdf performance

> Russ writes:
> Date: Tue, 21 Apr 1992 09:18:50 -0600
> From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
> To: mfolk@xxxxxxxxxxxxx
> Cc: netcdf-hdf@xxxxxxxxxxxxxxxx
> Subject: Comments on netCDF/HDF Draft Design
>            The performance of netCDF in some common applications relates
> more to the stdio layer below XDR than to XDR: the buffering scheme of stdio
> is not optimal for styles of access used by netCDF.  We have evidence that
> this can be fixed without abandoning XDR or the advantages of a single
> external representation.

To elaborate on this point:

Prompted by some profiling that Steve Emmerson did of the netcdf operator
'ncbarne', I recently examined the i/o performance of netcdf.
It turns out that a significant percentage of the time in a simple "beginning
to end" read of a netcdf file is spent in the system call 'lseek', which is
used to determine the current file position. We had previously added a
performance tweak that eliminates 'fseek' calls to the current position.
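A minimal sketch of the kind of bookkeeping involved, assuming a read-only stream. The 'tracked_file' type and the 'tf_*' functions are hypothetical, not netcdf code. Because the wrapper caches the offset in user space, a seek to the current position is skipped entirely, and a position query never goes through 'ftell', which in many stdio implementations ends in an 'lseek' system call.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: a stdio FILE plus the offset we believe it is at.
 * Every call below keeps 'pos' up to date, so we never need to ask the
 * kernel where we are. Read-only use is assumed. */
typedef struct {
    FILE *fp;
    long  pos;   /* cached current offset */
} tracked_file;

static void tf_init(tracked_file *tf, FILE *fp)
{
    assert(fp != NULL);
    tf->fp = fp;
    tf->pos = 0;   /* assumes fp is positioned at offset 0 */
}

/* Seek only when the target differs from the cached offset. */
static int tf_seek(tracked_file *tf, long off)
{
    if (off == tf->pos)
        return 0;                      /* no fseek, hence no lseek */
    if (fseek(tf->fp, off, SEEK_SET) != 0)
        return -1;
    tf->pos = off;
    return 0;
}

static size_t tf_read(tracked_file *tf, void *buf, size_t n)
{
    size_t got = fread(buf, 1, n, tf->fp);
    tf->pos += (long)got;              /* advance by what was actually read */
    return got;
}

/* Answered from user space instead of calling ftell(). */
static long tf_tell(const tracked_file *tf)
{
    return tf->pos;
}
```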

The analysis brings home the point stated above: "the buffering scheme of stdio
is not optimal for styles of access used by netCDF". Moreover, the stdio
data structure (struct _iob == FILE) does not contain the information
we would need to fix this.

Recall that there are several "subclasses" of the "abstract base class" XDR,
each constructed by a different 'create' call:
        xdrmem_create   - in-memory buffer
        xdrrec_create   - buffered record stream, used by RPC over TCP
        xdrstdio_create - stdio stream, used by netcdf
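To make the "subclass" idea concrete, here is a cut-down model written for illustration only. The names 'mini_xdr', 'mini_xdr_ops' and 'minimem_create' are hypothetical; the real XDR handle also carries an ops table (x_ops) but has more operations. The encoding routines know only the external format (4-byte big-endian integers, as XDR specifies); where the bytes go is decided by the ops table installed by the 'create' call, so a stdio-based or system-call-based subclass would differ only in those functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "base class": a handle carrying a table of function
 * pointers plus private state for whichever subclass created it. */
struct mini_xdr;

typedef struct {
    int (*putbytes)(struct mini_xdr *x, const void *p, size_t n);
    int (*getbytes)(struct mini_xdr *x, void *p, size_t n);
} mini_xdr_ops;

typedef struct mini_xdr {
    const mini_xdr_ops *ops;   /* the "virtual methods" */
    unsigned char *base;       /* private state of the memory subclass */
    size_t size, pos;
} mini_xdr;

/* Memory "subclass": bytes go to / come from a caller-supplied buffer. */
static int mem_put(mini_xdr *x, const void *p, size_t n)
{
    if (n > x->size - x->pos) return 0;   /* would overflow the buffer */
    memcpy(x->base + x->pos, p, n);
    x->pos += n;
    return 1;
}

static int mem_get(mini_xdr *x, void *p, size_t n)
{
    if (n > x->size - x->pos) return 0;   /* not enough data left */
    memcpy(p, x->base + x->pos, n);
    x->pos += n;
    return 1;
}

static const mini_xdr_ops mem_ops = { mem_put, mem_get };

/* The "create" call picks the subclass by installing its ops table. */
static void minimem_create(mini_xdr *x, void *buf, size_t size)
{
    assert(buf != NULL);
    x->ops = &mem_ops;
    x->base = buf;
    x->size = size;
    x->pos = 0;
}

/* Base-class code: big-endian 4-byte integers, independent of storage. */
static int mini_put_int32(mini_xdr *x, int32_t v)
{
    uint32_t u = (uint32_t)v;
    unsigned char b[4] = { (unsigned char)(u >> 24), (unsigned char)(u >> 16),
                           (unsigned char)(u >> 8), (unsigned char)u };
    return x->ops->putbytes(x, b, 4);
}

static int mini_get_int32(mini_xdr *x, int32_t *v)
{
    unsigned char b[4];
    if (!x->ops->getbytes(x, b, 4)) return 0;
    *v = (int32_t)(((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                   ((uint32_t)b[2] << 8) | (uint32_t)b[3]);
    return 1;
}
```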

Fairly early in netcdf's history, Dave Lucas had severe performance problems
with xdr_stdio on the Mac. He implemented an XDR "subclass" that
made Mac toolbox calls instead of stdio calls and achieved a substantial
performance gain (available in Spyglass Dicer).

Growing sick of kludging around stdio, I recently implemented a
simple buffered i/o XDR subclass built directly on UNIX system calls.
This involves adding a new module to implement the "subclass" and
modifying netcdf/src/file.c. The performance improvement is
significant: 40% in 'ncbarne', which is a compute-intensive
operator.
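The new module itself is not reproduced here, but its central idea can be sketched under stated assumptions: a read-only reader on a single descriptor, with hypothetical names ('bio', 'bio_open', 'bio_seek', ...) and an arbitrary 8 KB buffer. Unlike a stdio FILE, the structure records the file offset of its buffer, so position queries and seeks that land inside the buffer need no system call; only buffer refills and out-of-buffer seeks reach the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BIO_BUFSIZE 8192   /* arbitrary choice for this sketch */

/* Hypothetical buffered reader. Invariant: the kernel's file offset
 * is always buf_off + len. */
typedef struct {
    int    fd;
    off_t  buf_off;        /* file offset of buf[0] */
    size_t len;            /* valid bytes in buf */
    size_t cur;            /* read cursor within buf */
    unsigned char buf[BIO_BUFSIZE];
} bio;

static void bio_init(bio *b, int fd)
{
    assert(fd >= 0);
    b->fd = fd;
    b->buf_off = 0;        /* assumes fd is positioned at offset 0 */
    b->len = b->cur = 0;
}

static int bio_open(bio *b, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    bio_init(b, fd);
    return 0;
}

/* Current position, computed without lseek. */
static off_t bio_tell(const bio *b)
{
    return b->buf_off + (off_t)b->cur;
}

static int bio_seek(bio *b, off_t off)
{
    if (off >= b->buf_off && off <= b->buf_off + (off_t)b->len) {
        b->cur = (size_t)(off - b->buf_off);   /* target already buffered */
        return 0;
    }
    if (lseek(b->fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    b->buf_off = off;      /* empty buffer at the new offset */
    b->len = b->cur = 0;
    return 0;
}

static ssize_t bio_read(bio *b, void *dst, size_t n)
{
    unsigned char *out = dst;
    size_t done = 0;
    while (done < n) {
        if (b->cur == b->len) {            /* buffer exhausted: refill */
            ssize_t r;
            b->buf_off += (off_t)b->len;
            b->len = b->cur = 0;
            r = read(b->fd, b->buf, sizeof b->buf);
            if (r < 0) return -1;
            if (r == 0) break;             /* end of file */
            b->len = (size_t)r;
        }
        size_t k = b->len - b->cur;
        if (k > n - done) k = n - done;
        memcpy(out + done, b->buf + b->cur, k);
        b->cur += k;
        done += k;
    }
    return (ssize_t)done;
}
```

A write path and the glue that plugs these functions into an XDR ops table are omitted; a real layer would also have to flush modified buffers before seeking or closing.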

We haven't decided how or whether to release this, since it is UNIX-only.
In any case, we are willing to make it available to you to demonstrate the
potential and the method.

Historically, we have focused on having a single source distribution for
all the platforms we support. In doing so, we have ignored and NOT released
even "obvious" performance-enhancing code variants for specific OSes
or platforms. The point is that there is a lot of room for performance
improvement without sacrificing the benefits of XDR and the current
file format.