netCDF performance on CRAY

For all of you who don't follow sci.data.formats, I'm passing this along
because I found it *extremely* interesting.  I would like to hear whether
anyone has patches to the netCDF code for the CRAY that would implement
the following suggestion for a 100-fold improvement in writing floats to disk.

In article <1vasu8$2ve@xxxxxxxxxxxxxxxxxxxx> salevin@xxxxxxxxxxxxx  writes:

   In article <C8G149.6yL@xxxxxxxxxxxxxx>, ben@xxxxxxxxxxxxxx (Benjamin Z. Goldsteen) writes:
   |> dovey@xxxxxxxxxxxxxxxx ( Don Dovey ) writes:
   |> 
   |> >Are these timings in the right range, and does netCDF (using Sun's
   |> >XDR) have a similar performance on the Cray?
   |> 
   |> >A factor of one hundred would impact the I/O performance of our
   |> >analysis codes.

>   At the Stanford Exploration Project, Dave Nichols has rewritten some of the
>   xdr ieee float conversion routines in order to get acceptable performance
>   on converting large volumes (100's of megabytes) of seismic data. The basic
>   xdr package handles data a byte at a time, calling several layers of
>   subroutines to retrieve, assemble, and convert each data item of a
>   vector. This is where the factor of 100 comes in, I believe.
>   I do not know how much Cray has optimized their implementation.

Actually, I didn't rewrite the xdr routines; I just changed what our own
I/O routines call.

If you do I/O to a "FILE*", the standard, portable xdr distribution
converts one float at a time and then calls putc() 4 times to write the
bytes. On many systems this cost is not significant compared to the
calculation and I/O time, but on the Cray you really notice it.
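
To make the per-element pattern concrete, here is a minimal sketch in plain
C. It is not the xdr library source; put_float_be() and
write_floats_per_element() are hypothetical names, and the code assumes the
host float is already 32-bit IEEE (which is not the case on a Cray YMP). It
only shows the shape of the cost: a function call plus four putc() calls
for every element.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one float as 4 big-endian bytes with putc().
 * Assumes the host float is 32-bit IEEE 754. Returns 0 on success. */
int put_float_be(FILE *fp, float value)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);   /* reinterpret the float's bits */

    /* Most significant byte first, one putc() per byte. */
    for (int shift = 24; shift >= 0; shift -= 8) {
        if (putc((int)((bits >> shift) & 0xFFu), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Write n floats one at a time: one call and 4 putc()s per element. */
int write_floats_per_element(FILE *fp, const float *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (put_float_be(fp, data[i]) != 0)
            return -1;
    }
    return 0;
}
```

The inner work per element is tiny, so the call overhead dominates, and
nothing in this loop is in a form a vectorizing compiler can help with.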

My first attempt to overcome this was to read large blocks of data myself
and then use the xdr routines to convert from memory to memory. This
improves the speed acceptably on some systems. On the Cray this is no good
because the xdr routines aren't vectorized, so I replaced them with the
Cray library routines (IEG2CRAY and CRAY2IEG). I dislike having special
cases in the code, but the effort was worth it this time.
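
The block-at-a-time approach can be sketched roughly as follows, again in
plain C with hypothetical names (floats_to_be(), write_floats_bulk()) and
the assumption that the host float is 32-bit IEEE. The simple conversion
loop is only a stand-in for whatever converter is fastest on the machine;
on the Cray that slot would be filled by CRAY2IEG, whose calling sequence
is not given here and so is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert n host floats into 4*n big-endian bytes in `out`.
 * One flat loop with no I/O and no per-element function calls. */
void floats_to_be(const float *in, unsigned char *out, size_t n)
{
    assert(sizeof(float) == sizeof(uint32_t));
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        out[4 * i + 0] = (unsigned char)(bits >> 24);
        out[4 * i + 1] = (unsigned char)(bits >> 16);
        out[4 * i + 2] = (unsigned char)(bits >> 8);
        out[4 * i + 3] = (unsigned char)(bits);
    }
}

/* Convert the whole vector in memory, then hand it to a single fwrite(). */
int write_floats_bulk(FILE *fp, const float *data, size_t n)
{
    if (n == 0)
        return 0;

    unsigned char *buf = malloc(4 * n);
    if (buf == NULL)
        return -1;

    floats_to_be(data, buf, n);
    size_t written = fwrite(buf, 4, n, fp);
    free(buf);
    return written == n ? 0 : -1;
}
```

Keeping the conversion separate from the write is what makes it possible
to swap in a machine-specific converter without touching the I/O path.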

Here are some approximate I/O rates for writing 10 MW of floats to disk on
a YMP. The writes go through the SSD, so the I/O rates are pretty good.

Raw I/O of Cray floats; 10 MW floats = 80 MB
    ~1.4 MW/s

I/O in ieee format using xdr_vector() to an XDR stream that uses FILE* I/O
10MW floats = 40MB in ieee format.
    ~0.14 MW/s !

I/O in ieee format using xdr_vector() to an XDR stream that writes to memory
then raw I/O to disk.
    ~0.16 MW/s 

I/O in ieee format using CRAY2IEG to convert data and then raw I/O
    ~1.3 MW/s

I am prepared to pay a 10% penalty for having it in ieee format,
especially since it takes half the space, but I am not prepared to pay
a 1,000% penalty. If you want fast I/O in portable format from a Cray
you seem to need to use their conversion routines.
I don't know if the netCDF guys have made any optimizations like this.
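
For anyone who wants to check the arithmetic behind those percentages, a
small C sketch follows. The rates are the approximate figures quoted above;
the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds needed to move `megawords` of data at `rate` (MW/s). */
double seconds_for(double megawords, double rate)
{
    assert(rate > 0.0);
    return megawords / rate;
}

/* How many times longer a transfer takes at `rate` than at `raw_rate`. */
double slowdown(double raw_rate, double rate)
{
    assert(rate > 0.0);
    return raw_rate / rate;
}

/* Print elapsed time and slowdown for the four approximate rates. */
void print_rate_summary(FILE *out)
{
    const double total_mw = 10.0;   /* 10 MW of floats */
    const double raw_rate = 1.4;    /* raw I/O of Cray floats, MW/s */
    const char *label[] = {
        "raw Cray floats",
        "xdr_vector to FILE*",
        "xdr_vector to memory",
        "CRAY2IEG + raw I/O",
    };
    const double rate[] = {1.4, 0.14, 0.16, 1.3};

    for (int i = 0; i < 4; i++) {
        fprintf(out, "%-22s %6.1f s  %5.2fx raw time\n",
                label[i],
                seconds_for(total_mw, rate[i]),
                slowdown(raw_rate, rate[i]));
    }
}
```

With these figures, xdr_vector() to a FILE* stream takes about 71 seconds
for the 10 MW, roughly 10 times the raw time, while the CRAY2IEG route
takes about 7.7 seconds, roughly 1.08 times the raw time.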

In an ideal world Cray would modify their library version of the xdr
routines to use the vectorized conversion routines. Then we would
have the nice uniform xdr interface and reasonable performance.

-- 
Dave Nichols, Dept. of Geophysics, Stanford University.
dave@xxxxxxxxxxxxxxxx
-- 
Rich Signell               |  rsignell@xxxxxxxxxxxxxxxxxx
U.S. Geological Survey     |  (508) 457-2229  |  FAX (508) 457-2310
Quissett Campus            |  " When marriage is outlawed, 
Woods Hole, MA  02543      |    only outlaws will have inlaws. "