Re: Possibly of interest: parallel netcdf study


Thanks, this was interesting.  I think you want to change the table
heading in Table 1 from "NCAR IBM P690" to "NCSA IBM P690".

Thanks for the correction. We will fix it.

I wonder if the results that show less wall clock time for 6 time
steps than for 4 time steps and similarly for 10 time steps less than
for 8 time steps with pnetCDF on the NCSA P690 might be an indication
of a discretization error in the timing.  Or maybe something else was
consuming enough of the machine that the results are unreliable.

I am not sure whether discretization error in the timing is the reason; it is possible that the machine was busy during some runs. We show this figure to demonstrate that Parallel NetCDF performs worse than sequential NetCDF for small writes.

That said, other factors such as the parallel file system, the platform type, the number of processors, the file layout of the model output, and the MPI-IO and GPFS implementations will also affect the performance.
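One way to check whether the inversions (6 time steps faster than 4, 10 faster than 8) are just noise is to rerun each configuration several times and see whether the ranges of wall-clock times overlap. The following is a minimal sketch of that check, assuming repeated runs are available; the function name and the sample timings are hypothetical, not data from the study.

```python
def ordering_is_reliable(times_a, times_b):
    """Return True only if every run in times_a is faster than every run in
    times_b (or vice versa), i.e. the ranges of repeated wall-clock timings
    do not overlap. Overlapping ranges mean the apparent ordering could be
    caused by timer resolution or machine load alone."""
    return max(times_a) < min(times_b) or max(times_b) < min(times_a)


# Hypothetical repeated timings in seconds (not measurements from the study).
four_steps = [5.1, 4.2, 6.0]
six_steps = [4.8, 5.5, 4.9]
print(ordering_is_reliable(four_steps, six_steps))  # False: the ranges overlap
```

If the ranges overlap, the figure cannot distinguish the two configurations, and the non-monotonic points are better attributed to noise than to pnetCDF itself.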

I'm also curious why the pnetCDF appears to be so much slower than
serial netCDF for small writes.  Do you know what the nature of the
MPI-IO overhead is that could explain what appears to be a 10:1
slowdown for using pnetCDF with 4 time steps on the NCSA P690?  I
could understand maybe a 2:1 slowdown, but 10:1 seems surprisingly
large ...

Thanks for pointing this out. In fact, we may add more material to explain it.

I can think of the following factors that may affect the performance:

the MPI-IO library, the Parallel NetCDF implementation, the parallel file system, the platform type, the number of processors, the file layout of the model output, and the domain decomposition of the model. We will write a separate report devoted to the performance of ROMS with Parallel NetCDF, and in that report we may discuss these factors in more detail.

One important reason I can think of:
As the paper mentions, ROMS has about 20 single-element NetCDF variables. All of them are written in independent IO mode, because there are no corresponding collective IO functions in Parallel NetCDF. One strength of Parallel NetCDF is collective IO combined with a well-chosen file view (as set by MPI_File_set_view). Writing a single element into the NetCDF file through independent IO therefore uses none of Parallel NetCDF's optimizations, and I think that degrades the performance tremendously.
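To make the per-call overhead argument concrete, here is a toy latency-bandwidth model (our own illustration, not measured data): each write call pays a fixed overhead plus the time to move its payload. The overhead and bandwidth values are hypothetical; real MPI-IO/GPFS costs depend on the factors listed above. With payloads this tiny, the fixed overhead dominates, so 20 separate one-element writes cost roughly 20 times a single hypothetical request carrying the same bytes.

```python
def write_time(num_calls, bytes_per_call, overhead_s, bandwidth_bps):
    """Toy latency-bandwidth model: every call pays a fixed overhead,
    plus the time to transfer its payload at the given bandwidth."""
    return num_calls * (overhead_s + bytes_per_call / bandwidth_bps)


# Hypothetical parameters, chosen only to illustrate the shape of the cost.
OVERHEAD_S = 1e-3        # assumed fixed cost per independent write call
BANDWIDTH_BPS = 100e6    # assumed 100 MB/s transfer rate
SCALARS = 20             # about 20 single-element variables, per the paper
DOUBLE_BYTES = 8         # assuming each scalar is an 8-byte double

separate = write_time(SCALARS, DOUBLE_BYTES, OVERHEAD_S, BANDWIDTH_BPS)
batched = write_time(1, SCALARS * DOUBLE_BYTES, OVERHEAD_S, BANDWIDTH_BPS)
print(f"separate: {separate:.6f} s, batched: {batched:.6f} s, "
      f"ratio: {separate / batched:.1f}")
```

In this model the ratio is driven almost entirely by the number of calls rather than by bandwidth, which is why many tiny independent writes can cost far more than their byte count suggests.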

We may do a further study to investigate whether the performance improves when we stop writing those variables to the NetCDF file.