
Re: netcdf on cray



> Keywords: 199401310358.AA16141

Hi Paul,

> i have been getting complaints that our netcdf interface for the australian
> global model has created quite a bit of CPU overhead.  this is probably due
> to the xdr interface in the C lib of the cray.. is there a vectorised version
> around somewhere???

Sorry to take so long to respond, but I'm afraid I don't know of a
vectorized version.  Here's an earlier reply that had some additional info
on this:

Date: Tue, 23 Nov 1993 08:31:32 -0700
From: Russ Rew <address@hidden>
Message-Id: <address@hidden>
To: address@hidden (William Weibel)
In-Reply-To: <address@hidden> (address@hidden)
Subject: Re: Efficiency on a CRAY

> Organization: Department of Atmospheric Sciences, UCLA
> Keywords: 199311230110.AA28394

Hi William,

> I am converting CRAY floating-point binaries to NetCDF's, but it is very
> expensive.  For example, a loop of 6 ncvarput's (FORTRAN binding) processing
> a total of 0.5 Megawords costs about 10 CPU seconds.  I suspect that function
> calls to XDR are inhibiting vectorization.  Has anyone dealt with this 
> problem before?

I've appended a note posted earlier to the netcdfgroup mailing list (from
sci.data.formats) concerning inefficiencies of XDR on Crays that may be of
some help.  We never obtained that code and folded it into the
distribution, because we heard that Cray was working on vectorizing the XDR
libraries, but apparently that hasn't happened.

----------------------------------------------------------------------------
Russell K. Rew                                          UCAR Unidata Program
address@hidden                                          P.O. Box 3000
                                                      Boulder, CO 80307-3000
----------------------------------------------------------------------------

Date: Fri, 18 Jun 93 09:12:31 -0400
From: address@hidden (Richard P. Signell)
To: address@hidden
Subject: netCDF performance on CRAY

For all of you who don't follow sci.data.formats, I'm passing this along
as I found it *extremely* interesting.  I would like to hear if anyone
has patches to the netCDF code for the CRAY that would implement the
following suggestion for a 100-fold improvement in writing floats to disk.

In article <1vasu8$address@hidden> address@hidden writes:

   In article <address@hidden.sc.edu>, address@hidden (Benjamin Z. Goldsteen) writes:
   |> address@hidden ( Don Dovey ) writes:
   |> 
   |> >Are these timings in the right range, and does netCDF (using Sun's
   |> >XDR) have a similar performance on the Cray?
   |> 
   |> >A factor of one hundred would impact the I/O performance of our
   |> >analysis codes.

>   At the Stanford Exploration Project, Dave Nichols has rewritten some of the
>   xdr ieee float conversion routines in order to get acceptable performance
>   on converting large volumes (100's of megabytes) of seismic data. The basic
>   xdr package handles data a byte at a time, calling several layers of
>   subroutines to retrieve, assemble, and convert each data item of a
>   vector. This is where the factor of 100 comes in, I believe.
>   I do not know how much Cray has optimized their implementation.

Actually, I didn't rewrite the xdr routines; I just changed what our own
I/O routines called.

If you do I/O to a "FILE*", the standard, portable xdr distribution
converts one float at a time and then uses putc() 4 times to write the
bytes. On many systems this cost is not significant compared to the
calculation and I/O time; on the Cray you really notice it.

My first attempt to overcome this was to read large blocks of data myself
and then use the xdr routines to convert from memory to memory. This
improves the speed acceptably on some systems. On the Cray this is no good
because the xdr routines aren't vectorized, so I replaced them with the
Cray library routines (IEG2CRAY and CRAY2IEG). I dislike having special
cases in the code, but the effort was worth it this time.

Here are some approximate I/O rates for writing 10M of floats to disk on a YMP.
It is writing through the SSD so I/O rates are pretty good.

Raw I/O of cray floats, 10MW floats = 80MB:
    ~1.4 MW/s

I/O in ieee format using xdr_vector() to an XDR stream that uses FILE* I/O,
10MW floats = 40MB in ieee format:
    ~0.14 MW/s !

I/O in ieee format using xdr_vector() to an XDR stream that writes to memory,
then raw I/O to disk:
    ~0.16 MW/s

I/O in ieee format using CRAY2IEG to convert the data, then raw I/O:
    ~1.3 MW/s

I am prepared to pay a 10% penalty for having it in ieee format,
especially since it takes half the space, but I am not prepared to pay
a 1,000% penalty. If you want fast I/O in portable format from a Cray
you seem to need to use their conversion routines.
I don't know if the netCDF guys have made any optimizations like this.

In an ideal world Cray would modify their library version of the xdr
routines to use the vectorised conversion routines. Then we would
have the nice uniform xdr interface and reasonable performance. 

-- 
Dave Nichols, Dept. of Geophysics, Stanford University.
address@hidden