
Re: netcdf and crays



> Organization: NCAR/CGD
> Keywords: 199408280439.AA01442 netCDF Cray
> From: address@hidden (Phil Rasch)

Hi Phil,

> Over the last year or so I have seen 5 or 10 messages within the
> discussion group lamenting the inefficiencies of using the XDR package
> on the crays to do the ieee conversions within the netcdf package, and
> a few proposed solutions.  Unfortunately, I have not yet seen anyone
> advertise a real IMPLEMENTED solution. I am now being bitten
> personally by this problem. Before I delve into it myself, or put my
> programmer on it, I thought I would ask your advice. So my questions
> are:
> 
> 1) Do you know somebody who has implemented a fast way of writing
> floats on the cray under netcdf?

On the Cray 3, yes; but we know of no complete implemented solution on other
Crays.  I have appended notes about the Cray 3 solution as part of a mail
exchange with Chris Anderson from NERSC.

> 2) If not, why not? Is it a really hard problem? Where are the
> bottlenecks? 

I've wondered the same thing.  It seems like it should be fairly easy to
replace the code that accesses an array of floats with one xdr_float() call
per floating-point value by a single call to some vectorized Cray library
function that converts a whole array of values at once.  I think part of the
problem is that you have to violate the natural layering of the XDR library,
since it provides a special-purpose call for an array of bytes but no such
call for an array of floats.  I believe that the function-call overhead of
one call per float is what's consuming most of the CPU time, rather than the
conversion to and from IEEE floating-point representation.
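
The contrast between one call per value and one call per array can be
sketched in plain C.  This is only an illustration under stated assumptions:
the function names are hypothetical (they are not part of XDR or netCDF), and
the host float is assumed to be 32-bit IEEE 754 already, so the "conversion"
is just a byte reordering rather than a real Cray-to-IEEE conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one float as 4 big-endian bytes (the XDR wire layout for a float).
 * Assumption: the host float is 32-bit IEEE 754; a Cray of that era would
 * need a genuine format conversion here instead of a byte reordering. */
void encode_one(float value, unsigned char *out)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}

/* Per-element style: one function call for every value in the array. */
void encode_per_element(const float *src, size_t n, unsigned char *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        encode_one(src[i], out + 4 * i);
}

/* Bulk style: one call converts the whole array in a single tight loop,
 * which is the shape a vectorizing compiler or library routine wants. */
void encode_bulk(const float *src, size_t n, unsigned char *out)
{
    size_t i;
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    for (i = 0; i < n; i++) {
        memcpy(&bits, &src[i], sizeof bits);
        out[4 * i + 0] = (unsigned char)(bits >> 24);
        out[4 * i + 1] = (unsigned char)(bits >> 16);
        out[4 * i + 2] = (unsigned char)(bits >> 8);
        out[4 * i + 3] = (unsigned char)bits;
    }
}
```

Both routines produce the same bytes; the only difference is how many
function calls stand between the caller and the conversion loop.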

> 3) If there aren't any, where do I start (i.e., can you point me to the
> low level routine I have to modify)? Hopefully, you could save me a
> day or so of poking around with a sentence or two.

The appended description of what was done on the Cray 3 may be of some help.

--
Russ Rew                                                   UCAR Unidata Program
address@hidden                                             P.O. Box 3000
http://www.unidata.ucar.edu/                             Boulder, CO 80307-3000


>From address@hidden Mon May  2 09:18:54 1994
Message-Id: <address@hidden>
Full-Name: Russ Rew
To: Chris Anderson <address@hidden>
Subject: Re: NetCDF & XDR on Cray 
In-Reply-To: Your message of "Mon, 02 May 1994 11:27:45 PDT."              
<address@hidden> 
Organization: UCAR Unidata Program
Date: Mon, 02 May 1994 15:18:54 -0600
From: Russ Rew <address@hidden>

> Organization: The National Energy Research Supercomputer Center (NERSC)
> Keywords: 199405021827.AA20955

Hi Chris,

> Hello Russ, I was at the Dept. of Energy Computer Graphics forum last
> week.  I talked with some people from Los Alamos and Sandia laboratories
> regarding XDR performance on the Crays, and it seems that there is more
> than enough interest in our providing some effort to boost the performance
> of NetCDF on the Cray.  I thought that I should try to gauge your interest
> before embarking on this journey.
> 
> From the minimal contacts I have had with the Cray people, they understand
> that the XDR routines aren't optimal, but don't seem too interested in
> doing anything about it.  On the other hand, with the Cray T3D coming along,
> the issue of going from Cray representation into IEEE may become more
> important to them (as the T3D uses Cray YMP & DEC Alpha Processors).  As both
> NERSC and NCAR will be getting a T3D perhaps we have a renewed opportunity
> for getting Cray to do something (though I won't hold my breath ;-)
> 
> In the meantime, Cray has other routines for doing IEEE format
> conversions.  We at NERSC are willing to try and make use of these (more
> optimized) routines in NetCDF, if you feel that the UCAR people will
> consider making their use part of the distribution (we won't need to
> distribute the routines, just make "ifdef'ed" calls to them for the
> CRAY). Looking forward to hearing from you.  --Chris

We would certainly consider making optimized versions of the XDR routines
for Cray platforms available as part of the distribution.  Several users
have asked us about such an optimization, and Cray Computer has corresponded
with us about their improvements to xdr_array() to get efficient conversion
for floating-point arrays on a Cray-3.  They claim that their changes for
the Cray-3 would not do any good on Cray Labs platforms from CRI, so we
might have a problem trying to support both sets of modifications.  I've
appended some email correspondence I had with people from Cray Computer on
this issue ...

__________________________________________________________________________
                      
Russ Rew                                              UCAR Unidata Program
address@hidden                                        P.O. Box 3000
(303)497-8645                                 Boulder, Colorado 80307-3000

To: address@hidden (Dave Resch)
    support-netcdf
Subject: Re: netcdf install on Cray3
Organization: UCAR Unidata Program
Date: Mon, 11 Apr 1994 15:13:35 -0600
From: Russ Rew <russ@buddy>

> Organization: Cray
> Keywords: 199404082139.AA08286

Hi Dave,

> The netcdf software has been installed on the Cray3 machine at NCAR
> (graywolf).  Following are a list of modifications that were needed to get
> the software to install and run correctly on the Cray3:
> 
> 1) Edit the CUSTOMIZE file in the netcdf-2.3.2 directory:
>        CC=/bin/cc
>        FC=/bin/f77
>        FFLAGS= 
>        prefix=/usr/local
>          #
>          # Specify the operating system 
>          OS=csos
>          #
>          # Ensure that the "native" xdr header files and libraries are used
>          #
>          CPP_XDR="/usr/include/rpc"
>          LD_XDR="-l /usr/lib/libnet.a -l /usr/lib/librpc.a"
> 
> 2) Create an operating system file in the netcdf-2.3.2/fortran directory
>    named:
>       csos.m4
>   
>    This is simply the unicos.m4 file with the following change:
>       < define(`M4__SYSTEM',CSOS)
>         ---
>         > define(`M4__SYSTEM', UNICOS)
>  
> 3) Edit the configure script to recognize the LD_XDR definition from above by
>    commenting out the line which redefines LD_XDR to be nothing:
>         < #  *)         LD_XDR=;;
>         ---
>         >   *)          LD_XDR=;;
> 
> 4) Edit the file  netcdf-2.3.2/libsrc/netcdf.h to correctly define FILL_BYTE 
>    for the CCC compilers:
>         < #define FILL_BYTE     ((signed char)-127) /* Largest Negative value */
>         ---
>         > #define FILL_BYTE     ((char)-127)        /* Largest Negative value */
> 
> 
> We (Cray Computer) would like to know if Unidata is willing to add the
> above support to the netcdf install kit so that netcdf correctly installs
> on a Cray3 machine?

Yes, the changes look small enough that I would have no problem adding them
to the source distribution.  I would like to be able to identify the
platform at compile time so I can integrate these changes into our
auto-configure package.  For that, I need to know what predefined constant
is available for specifying this platform.  For example, the UNICOS Cray C
compiler predefines the macro "_UNICOS", so I can test on it with statements
like

    #ifdef _UNICOS
        ...
    #endif

Is there a similar predefined macro for CSOS?

> Additionally, we have developed some very efficient, vectorized Cray <==>
> IEEE floating point conversion routines.  We will be modifying our
> xdr_array() implementation to use these new conversion routines and would
> also like to modify some of the netcdf-2.3.2/libsrc modules to use them.
>   
> Basically, a single call would be made to convert an arbitrary number of 
> values rather than converting a single value at a time from within a loop 
> as is currently done.  The changes should be very minor.  Again, we would 
> like to know if Unidata is willing to add this support (via conditional 
> compilation directives) to the netcdf sources?

We have had several requests for such an optimization, since currently this
is the main bottleneck in using the netcdf library on Cray platforms.  I
would be willing to add this support, via conditional compilation
directives.  This would be especially useful if it worked for all Cray
platforms, not just the Cray 3.
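
Support "via conditional compilation directives" might look roughly like the
following sketch.  Everything named here is an assumption made for
illustration: HAVE_BULK_CONVERT is a hypothetical configure-time macro, and
convert_one() and convert_many() are stand-ins that merely copy bytes; they
are not netCDF, XDR, or Cray library routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a per-value conversion such as xdr_float().  It only copies
 * the bytes, which is NOT a real Cray-to-IEEE conversion. */
int convert_one(const float *src, unsigned char *out)
{
    memcpy(out, src, sizeof *src);
    return 1;
}

/* Stand-in for a vendor-supplied vectorized routine (hypothetical name). */
int convert_many(const float *src, size_t n, unsigned char *out)
{
    memcpy(out, src, n * sizeof *src);
    return 1;
}

/* Returns 1 on success and 0 on failure, whichever path is compiled in. */
int put_floats(const float *src, size_t n, unsigned char *out)
{
    assert(src != NULL && out != NULL);
#ifdef HAVE_BULK_CONVERT            /* hypothetical configure-time macro */
    return convert_many(src, n, out);
#else
    {
        size_t i;
        for (i = 0; i < n; i++)
            if (!convert_one(&src[i], out + i * sizeof *src))
                return 0;
        return 1;
    }
#endif
}
```

Callers see one interface either way, so platforms without a fast bulk
routine keep the existing per-value loop unchanged.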

__________________________________________________________________________
                      
Russ Rew                                              UCAR Unidata Program
address@hidden                                        P.O. Box 3000
(303)497-8645                                 Boulder, Colorado 80307-3000

To: address@hidden (Steve Gombosi)
Subject: Re: netcdf install on Cray3 
Organization: UCAR Unidata Program
Date: Mon, 11 Apr 1994 17:01:25 -0600
From: Russ Rew <russ@buddy>

Steve,

> >We have had several requests for such an optimization, since currently this
> >is the main bottleneck in using the netcdf library on Cray platforms.  I
> >would be willing to add this support, via conditional compilation
> >directives.  This would be especially useful if it worked for all Cray
> >platforms, not just the Cray 3.
> 
> The optimization in question is for array.c to call xdr_vector once rather
> than issue multiple calls to xdr_float or whatever. This produces a
> significant speedup because we are in the process of modifying xdr_vector
> to use vectorized conversion routines which we (Cray Computer) have
> written for the Cray-3.
> 
> The version of xdr_vector supplied by Cray Research is not optimized
> in this way. There would be no benefit to making this modification to
> array.c on CRI machines - in fact, it would produce a slight loss of
> performance on those systems (the results should still be correct, however).
> 
> The speedup in data conversion is quite substantial. For an XDR stream
> opened to memory (via xdrmem_create()), the conversion is asymptotically
> about 100 times faster than using individual calls to xdr_float(). For
> an XDR stream opened to a file (via xdrstdio_create()), which would be the 
> most common case for netcdf, the asymptotic speedup factor appears to be in
> the neighborhood of 200 times. These timings are based on actual runs
> on the NCAR Cray-3 - the simulator probably would have produced more
> consistent measurements, but the old code takes so long to run under the
> simulator that it's not possible to do large amounts of data. The following
> is a measurement of the performance of the conversion loop from array.c 
> contrasted with a single, equivalent call to the new xdr_vector routine:
> 
>               XDR opened with xdrstdio_create()
> 
>            Size       Clocks(old)       Clocks(new)       Speedup(old/new)
>               1             12962              9568           1.354724    
>               2              7104              5328           1.333333    
 ...
>          131072         432382624           2265124           190.8870    
>          262144         864768712           4518792           191.3717    
>          524288        1729798122           9026452           191.6366    
> 
> Optimized Integer conversions are not yet in place but should 
> be available sometime this week.
> 
> How large are the arrays which are typically written by array.c?

I don't know.  I imagine they are all over the map, but might be expected to
be somewhat larger on Crays than on workstations.  In any case, this looks
like a worthwhile improvement, even for small vectors.
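
The per-element cost implied by the quoted timings can be checked with a
short C sketch.  The helper names are made up for illustration; the only
inputs are the clock counts and sizes from the quoted table.

```c
#include <assert.h>

/* Ratio of old to new clock counts for one row of the table. */
double speedup(double clocks_old, double clocks_new)
{
    assert(clocks_new > 0.0);
    return clocks_old / clocks_new;
}

/* Average clocks spent per array element for one row of the table. */
double clocks_per_element(double clocks, double size)
{
    assert(size > 0.0);
    return clocks / size;
}
```

For the 524288-element row, speedup(1729798122, 9026452) comes to roughly
191.6, matching the table, and the per-element cost drops from about 3299
clocks with the old loop to about 17 clocks with the single call.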

I don't know much about the Cray 3 or about how it compares with CRI Crays.
Is there an on-line document I could read to learn more about it or about
Cray Computers?  For example, I'm curious if you're actually using a
different representation for floating point numbers, since you evidently
wrote a new XDR library.

--Russ
To: davis
Subject: [address@hidden (Steve Gombosi): Re: netcdf install on Cray3]
Date: Tue, 12 Apr 1994 08:28:36 -0600
From: Russ Rew <russ@buddy>

Glenn,

Just for your information, this corrects a misstatement I made yesterday
about the floating point representations being different on Cray 3's and
other Crays.

--Russ
------- Forwarded Message

Date: Mon, 11 Apr 94 17:36:19 MDT
From: address@hidden (Steve Gombosi)
To: address@hidden, address@hidden
Subject: Re: netcdf install on Cray3

>I don't know much about the Cray 3 or about how it compares with CRI Crays.
>Is there an on-line document I could read to learn more about it or about
>Cray Computers?  

There are several man pages on the Cray-3. If you'd like, I could email 
copies of them to you. We could probably arrange to get you a hardware 
manual, as well.

>For example, I'm curious if you're actually using a
>different representation for floating point numbers, since you evidently
>wrote a new XDR library.

No, the floating point representation is identical to the CRI machines.
We are using essentially the same library we received from CRI when the
company split up. While helping Dave in his efforts to get the netcdf
port working, I noticed that there was significant room for improvement
in the code and decided to see what some small changes would yield. 
Unfortunately, the entire design of the XDR library is oriented toward
small, scalar machines. It seems that no one at CRI bothered to make
any attempt at an efficient implementation on a large, vector architecture.

Steve

------- End of Forwarded Message

