
Unidata Support: 951213: More netCDF-2.4-beta5 test results



Jeff,

I'm forwarding the note below from John Sheldon at GFDL, which contains more
results from testing Cray optimizations.  He's using "nctime.c", which is a
little stand-alone netCDF benchmarking program I wrote a couple of years ago
that is available from

    ftp://ftp.unidata.ucar.edu/pub/netcdf/nctime.c

I intend to see if I can reproduce Sheldon's results on shavano, but I
probably won't get to it until tomorrow or Friday.

I have a few questions:

 1.  In your current position, do you have any time to look at this (and the
     NCFILL/NCNOFILL results) and respond to Sheldon's questions?  If not,
     do you know of anyone else with enough Cray expertise to investigate or
     explain the results from Sheldon's tests?

 2.  It's possible that some of the results Sheldon is seeing are due to the
     way we integrated your optimizations into our release.  Is there still
     a copy of your Cray library around that just has your Cray
     optimizations to the previous netCDF 2.3.2 release, without any other
     changes we've made for 2.4?  If so, I'd like to link against that and
     run the tests on shavano with that version too.

 3.  Are the benchmarks done by nctime too artificial or variable to be
     useful?  It takes a four-dimensional slab of specified size, times
     writing it all out with ncvarput, and then times reading it back in
     all 16 kinds of cross sections.  It does this for all six netCDF
     data types.  (A sketch of the timing pattern follows this list.)
     Previously I've noted that unless a local file system is used, NFS
     caching may get in the way of consistent results.  Results may also
     depend strongly on the slab sizes used and may vary from run to run
     for other reasons that are difficult to control.
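
In outline, the timing loop is the sort of thing sketched below.  This is
a paraphrase, not the actual nctime source: the file name, variable name,
and enumeration order are placeholders, and error checks are omitted; only
the netCDF-2 calls themselves are real.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include "netcdf.h"                 /* netCDF-2 interface */

    #define NDIMS 4

    static double now_msec(void)        /* wall-clock time in msec */
    {
        struct timeval tv;
        gettimeofday(&tv, (struct timezone *)0);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    int main(void)
    {
        long dims[NDIMS] = {12, 18, 37, 73};  /* sizes from the runs below */
        long start[NDIMS] = {0, 0, 0, 0};
        long count[NDIMS];
        int ncid, varid, i;
        unsigned mask;
        float *buf = (float *) malloc(12L*18*37*73 * sizeof(float));

        ncid = ncopen("test.nc", NC_NOWRITE);   /* placeholder file name */
        varid = ncvarid(ncid, "float_var");

        /* 16 cross sections: each bit of the mask decides whether a
         * dimension is read in full or held at a single index. */
        for (mask = 0; mask < (1 << NDIMS); mask++) {
            double t0, t1;
            for (i = 0; i < NDIMS; i++)
                count[i] = (mask & (1u << i)) ? dims[i] : 1;
            t0 = now_msec();
            ncvarget(ncid, varid, start, count, (void *) buf);
            t1 = now_msec();
            printf("time for ncvarget %ldx%ldx%ldx%ld %12.3f msec\n",
                   count[0], count[1], count[2], count[3], t1 - t0);
        }
        ncclose(ncid);
        free(buf);
        return 0;
    }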

Thanks for any light you can shed on this!

--Russ


------- Forwarded Message

From: address@hidden (John Sheldon)
Organization: GFDL
Keywords: 199512121934.AA29804 netCDF CRAY

Hi again-

In my continuing timing tests on our C90, I noticed a potentially
serious problem with version 2.4.  I used "nctime" with both the 2.3.2
and 2.4-beta5 libraries and got the following:

2.3.2 :
----- float_var(12,18,37,73)
time for ncvarput 12x18x37x73   2026.276 msec
time for ncvarget 1x1x1x1          0.048 msec
time for ncvarget 12x1x1x1         5.874 msec
time for ncvarget 1x18x1x1         1.303 msec
time for ncvarget 1x1x37x1         1.579 msec
time for ncvarget 1x1x1x73         0.400 msec
time for ncvarget 12x18x1x1       14.185 msec
time for ncvarget 12x1x37x1       12.739 msec
time for ncvarget 12x1x1x73        9.585 msec
time for ncvarget 1x18x37x1       11.279 msec* <----
time for ncvarget 1x18x1x73        7.430 msec
time for ncvarget 1x1x37x73       12.867 msec*
time for ncvarget 12x18x37x1     102.484 msec
time for ncvarget 12x18x1x73      62.653 msec
time for ncvarget 12x1x37x73     115.803 msec
time for ncvarget 1x18x37x73     162.005 msec*
time for ncvarget 12x18x37x73   1939.247 msec


2.4-beta5 :
----- float_var(12,18,37,73)
time for ncvarput 12x18x37x73     15.825 msec
time for ncvarget 1x1x1x1          2.729 msec
time for ncvarget 12x1x1x1        22.667 msec
time for ncvarget 1x18x1x1        32.672 msec
time for ncvarget 1x1x37x1        54.518 msec
time for ncvarget 1x1x1x73         2.177 msec
time for ncvarget 12x18x1x1      342.961 msec
time for ncvarget 12x1x37x1      701.648 msec
time for ncvarget 12x1x1x73       22.740 msec
time for ncvarget 1x18x37x1     1011.911 msec* <---- 92x slower! PROBLEM!!
time for ncvarget 1x18x1x73       32.716 msec
time for ncvarget 1x1x37x73        2.257 msec* <---- 6x faster
time for ncvarget 12x18x37x1   12542.287 msec
time for ncvarget 12x18x1x73     341.818 msec
time for ncvarget 12x1x37x73      22.605 msec
time for ncvarget 1x18x37x73       3.594 msec* <---- 50x faster
time for ncvarget 12x18x37x73     38.172 msec


While many of the accesses posted much better times, the X-Z slab
accesses were roughly 100 times slower!  If I wanted to step through
the X-Z slabs of the results from a 1-degree model run (180 y-points),
it would take 3 minutes where it used to take 2 seconds!
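
To put numbers on that: under 2.4-beta5, 180 slabs at ~1012 msec each is
about 182 seconds, i.e. 3 minutes, versus 180 x ~11.3 msec, about 2
seconds, under 2.3.2.  The slow case is a request of this form (a sketch
with placeholder names, not my actual code):

    #include "netcdf.h"

    /* Read one 1x18x37x1 slab: the first and last indices are held
     * fixed.  Since the last dimension varies fastest in a netCDF
     * variable, none of the 18*37 = 666 requested floats are adjacent
     * on disk -- each is 73 floats away from the next. */
    void read_slab(int ncid, int varid, float slab[18][37])
    {
        long start[4] = {0, 0, 0, 0};
        long count[4] = {1, 18, 37, 1};
        ncvarget(ncid, varid, start, count, (void *) slab);
    }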

Now, that's on the "read" end.  On the "write" end, I wrote my own
small test program (with NCNOFILL!) which showed that the times are
substantially _less_ using version 2.4: about 1.2 user-CPU seconds
versus 21 for version 2.3.2, and a wall-clock time of 1:30 versus
13-26(!) minutes (I did a couple of runs) for version 2.3.2.  I don't
understand why writing goes so much faster with version 2.4 while
reading goes so much slower.  Any ideas?
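
For reference, my write test was essentially of the following form (a
sketch, not the actual program; names and sizes are placeholders, and
error checks are omitted -- NC_NOFILL is the C-interface name for the
no-fill setting):

    #include "netcdf.h"                 /* netCDF-2 interface */

    /* Create a file, turn off prefilling, and write one large slab.
     * With NC_NOFILL the library skips writing fill values into the
     * variable's space ahead of the data, so each block is written
     * only once. */
    int write_test(const float *data)
    {
        int  ncid, varid, dims[4];
        long start[4] = {0, 0, 0, 0};
        long count[4] = {12, 18, 37, 73};

        ncid = nccreate("out.nc", NC_CLOBBER);
        dims[0] = ncdimdef(ncid, "dim0", 12L);
        dims[1] = ncdimdef(ncid, "dim1", 18L);
        dims[2] = ncdimdef(ncid, "dim2", 37L);
        dims[3] = ncdimdef(ncid, "dim3", 73L);
        varid = ncvardef(ncid, "float_var", NC_FLOAT, 4, dims);

        ncsetfill(ncid, NC_NOFILL);     /* the "NCNOFILL" setting above */
        ncendef(ncid);
        ncvarput(ncid, varid, start, count, (void *) data);
        return ncclose(ncid);
    }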

Hope this helps, even if it's not necessarily welcome news...

John
address@hidden


------- End of Forwarded Message

