
951229: NETCDF Performance



Rob,

>Date: Fri, 29 Dec 95 14:27:33 EST 
>From: address@hidden (R.Thurling)
>Organization: Aust. Bureau of Meteorology
>Subject: NETCDF Performance 
>Keywords: 199512060413.AA25138

In the above message you wrote:

> Just after 11th December, I sent you some information regarding our
> NETCDF install in response to a performance issue brought to your     
> attention by A. Sulaiman.
>  
> Did you receive and do you wish additional info.

Thanks, I received the information.  I must apologize for taking so long
to respond.

The information you gave me indicates that the installation went OK.

I suspect that the I/O inefficiency is due to a mismatch between the
pattern of your data access and the default setup of the UNICOS FFIO
buffers.  I suspect that you can greatly improve the I/O efficiency by
appropriately setting the NETCDF_FFIOSPEC environment variable.  I'm
enclosing an explanation of this topic from the person who created the
performance-enhancing module for UNICOS.

Please let me know if this helps.

Regards,
Steve Emmerson   <address@hidden>

------- Enclosed Message

Date:    Wed, 13 Dec 1995 12:06:28 -0700
From:    address@hidden (Jeffery A. Kuehn)
To:      address@hidden (Russ Rew)
Subject: Re: Unidata Support: 951213: More netCDF-2.4-beta5 test results

> Jeff,
> 
> I'm forwarding the note below from John Sheldon at GFDL, which contains more
> results from testing Cray optimizations.  He's using "nctime.c", which is a
> little stand-alone netCDF benchmarking program I wrote a couple of years ago
> that is available from
> 
>     ftp://ftp.unidata.ucar.edu/pub/netcdf/nctime.c
> 
> I intend to see if I can reproduce Sheldon's results on shavano, but I
> probably won't get to it until tomorrow or Friday.
> 
> I have a few questions:
> 
>  1.  In your current position, do you have any time to look at this (and the
>      NCFILL/NCNOFILL results) and respond to Sheldon's questions?  If not,
>      do you know of anyone else with enough Cray expertise to investigate or
>      explain the results from Sheldon's tests?

I'm willing to help.  Have him contact me and CC you on the email with future
questions.

>  2.  It's possible that some of the results Sheldon is seeing are due to the
>      way we integrated your optimizations into our release.  Is there still
>      a copy of your Cray library around that just has your Cray
>      optimizations to the previous netCDF 2.3.2 release, without any other
>      changes we've made for 2.4?  If so, I'd like to link against that and
>      run the tests on shavano with that version too.

I still have a copy of the old source tree before and after my changes.  If
after reading this, you'd like a copy, call me at x1311.

>  3.  Are the benchmarks done by nctime too artificial or variable to be
>      useful?  It takes a four-dimensional slab of specified size and times
>      writing it all out with ncvarput as well as reading back in all 16
>      kinds of cross sections.  It does this for all six types of netCDF
>      data.  Previously I've noted that unless a local file system is used,
>      NFS caching may get in the way of consistent results.  Results also may
>      be very dependent on sizes used and may vary from run to run for other
>      reasons that are difficult to control.

They may be.  The default buffering scheme used in the Cray optimizations
depends on async I/O and a small number of buffers.  While this performs
pretty well in most cases, to get really good performance one should tune
the buffering scheme and the cache configuration to suit the application.

So long as the pre-fill option still does all its I/O using either the
putbytes() or putlong() routines for the XDR stream in question, all the I/O
should be passed through the buffers.  It would probably be a bad thing for
the prefill not to use the putbytes() or putlong() routines, given the
buffering that the library is currently performing.

The default buffering scheme favors sequential writes, but the cache is
very configurable.  He might try some of the following:

        setenv NETCDF_FFIOSPEC bufa:336:2

        (two 336-block asynchronous buffers, i.e., double buffering)
        (this is the default configuration and works well for 
        sequential accesses)


        setenv NETCDF_FFIOSPEC cache:256:8:2

        (8 256-block pages with a read-ahead factor of 2 blocks)
        (larger random accesses)


        setenv NETCDF_FFIOSPEC cachea:256:8:2

        (8 256-block pages with a read-ahead factor of 2 blocks, asynch)
        (larger random accesses)

        setenv NETCDF_FFIOSPEC cachea:8:256:0

        (256 8-block pages without read-ahead, asynch)
        (many smaller pages w/o read-ahead for more random accesses
        as typified by netcdf slicing arrays)


        setenv NETCDF_FFIOSPEC cache:8:256:0,cachea.sds:1024:4:1

        (hold onto your hat:  this is a two-layer cache.  The first
        (synchronous) layer is composed of 256 8-block pages in memory;
        the second (asynchronous) layer is composed of 4 1024-block pages
        on the SSD.  This scheme works well when accesses precess
        through the file in random waves roughly 2x1024 blocks wide.)
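
For quick comparisons, the candidate specs above can be tried in a loop
against a benchmark such as the nctime program mentioned earlier.  A
minimal Bourne-shell sketch, assuming nctime has been built locally (the
benchmark invocation itself is a placeholder and is commented out, since
its arguments and the FFIO layer are system-specific):

```shell
#!/bin/sh
# Try each FFIO buffer configuration from the examples above and time a
# netCDF benchmark under it.  The spec strings are taken verbatim from
# this message; "./nctime" stands in for however the benchmark is run.
for spec in bufa:336:2 cache:256:8:2 cachea:256:8:2 cachea:8:256:0
do
    NETCDF_FFIOSPEC="$spec"
    export NETCDF_FFIOSPEC
    echo "=== NETCDF_FFIOSPEC=$spec ==="
    # time ./nctime ...    # uncomment and fill in on a UNICOS system
done
```

Under csh, the assignment inside the loop would instead be
`setenv NETCDF_FFIOSPEC "$spec"`, matching the examples above.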


All of the options/configurations supported in CRI's FFIO library are
usable through netcdf.  I'd also recommend that he look at CRI's I/O
optimization guide for info on using FFIO to its fullest.  This is
also compatible with CRI's EIE I/O library.  When you're ready to do
documentation for the next netcdf, we should put together some pointers
to the Cray optimization guides to insert in your docs.

--jeff