Re: [netcdf-hdf] [netcdfgroup] NetCDF: HDF error, and now what?

"Hernan G. Arango" <arango@xxxxxxxxxxxxxxxxxx> writes:

> I have tried the TotalView debugger with the libraries and it
> is really a nightmare if you don't know the code intimately.  The
> problem is that there a lot of recursive calls which make
> debugging more difficult.  This is not a trivial library to
> debug.  I spend several days looking and I was not able to
> find the source of my parallel problem.

Well, I think this could frequently be said of other people's code. ;-)

The complexity in the netCDF library mostly concerns metadata. Data
writes and reads are handed off to the HDF5 library almost immediately.

> Our parallel I/O in ROMS is broken with the new versions of
> the NetCDF library. It works only with NetCDF 4.1.1.
> Several changes were done after that and the parallel I/O no
> longer works with independent I/O access.  I need to
> go back to the parallel debugger when I get the chance. This
> is the third time that it happens so it is a little annoying.

Can you be more specific?

> Still, the performance of parallel I/O with the NetCDF4/HDF
> libraries is not that great... I cannot even match the
> performance of serial I/O. There is room for a lot of

Parallel I/O performance is a complex thing. Ultimately it is limited by
the speed of the hardware and of the connections between your processors
and disk drives. Even on parallel machines with tens of thousands of
processors, performance will quickly max out at fewer than 100
processors (in my experience) because the I/O subsystem becomes
saturated.

The benefit of parallel I/O in this regime is not greater performance,
but code simplicity. Instead of moving all data to a single processor
for sequential write, each process can write its own data. It's no
faster, but there's a lot less code to write.
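
As a rough illustration of "each process writes its own data", here is
a minimal sketch in C of how each rank might work out which rows of a
variable it owns. The function name block_decompose and the row-wise
split are my own assumptions for the example; nothing here comes from
ROMS or from the netCDF source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split `total` rows among `nprocs` ranks as
 * evenly as possible. The first (total % nprocs) ranks get one extra
 * row. On return, *start is the first row owned by `rank` and *count
 * is how many rows it owns (0 is possible when nprocs > total). */
static void block_decompose(size_t total, int nprocs, int rank,
                            size_t *start, size_t *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t base  = total / (size_t)nprocs;  /* rows every rank gets */
    size_t extra = total % (size_t)nprocs;  /* leftover rows */
    size_t r     = (size_t)rank;

    *count = base + (r < extra ? 1 : 0);
    /* Ranks below `extra` are each shifted by one extra row per
     * preceding rank; the rest are shifted by all `extra` rows. */
    *start = r * base + (r < extra ? r : extra);
}
```

In a real program the rank and process count would come from
MPI_Comm_rank and MPI_Comm_size, and the resulting start and count
would go into the first dimension of the start[] and count[] arrays
passed to a call such as nc_put_vara_double. No gather to a single
writer is needed.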

I can usually detect significant I/O performance improvements with
fewer than 32 processors on every machine I've tested that has a
parallel file system. (And even with 2 or 4 processors on machines
without one.)
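
To make the saturation argument concrete, here is a toy model in C. It
is only a sketch: it assumes throughput grows linearly with the number
of writers until it hits a fixed ceiling set by the I/O subsystem, which
is a simplification of what real systems do. The function name and both
bandwidth parameters are hypothetical inputs, not measurements.

```c
#include <assert.h>

/* Toy model: aggregate write bandwidth (MB/s) grows linearly with the
 * number of writing processes until the I/O subsystem's ceiling is
 * reached. per_proc_mbs and ceiling_mbs are hypothetical inputs. */
static double aggregate_bandwidth(int nprocs, double per_proc_mbs,
                                  double ceiling_mbs)
{
    assert(nprocs > 0 && per_proc_mbs > 0.0 && ceiling_mbs > 0.0);

    double demand = nprocs * per_proc_mbs;  /* what the writers could push */
    return demand < ceiling_mbs ? demand : ceiling_mbs;
}
```

Under this model, adding processes helps only while the demand is below
the ceiling; past that point more writers add contention but no
bandwidth.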

Sequential I/O, meanwhile, benefits greatly from the multiple layers of
buffering between the processor and disk platter, so it's hard to beat.

> improvements.  I did noticed a lot of inefficient MPI
> communications during my debugging session.  In my opinion,
> the parallel stuff needs to be redesigned and written from
> scratch to see if we can improve the performance.

The two libraries have different involvement in parallel I/O. The netCDF
library does not contain any MPI code; all of that is done at the HDF5
level.

> It is nice to know about the nc_set_log_level call to get
> detailed information.

Good! I also suggest you build with --enable-benchmarks and take a look
at the output of nc_test4/nc4perf.c, which tests parallel I/O with a
variety of settings on your machine. It may help you home in on good
settings for your situation.

Thanks,

Ed
-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx