"David Stuebe" <dstuebe@xxxxxxxxxx> writes:

> Hi NETCDF folks,
>
> I work on an unstructured finite volume coastal ocean model, FVCOM,
> which is parallel (using MPICH2). The read/write is a major slowdown
> for our large cases. On our cluster, we have one large storage device,
> an EMC RAID array. The network is InfiniBand; the network is much
> faster than the RAID array.
>
> For our model we need to read large initial condition data sets, and
> single frames of forcing data while running. We also need to write
> single frames of data for output (frequently), and large restart files
> (less frequently).
>
> I am considering two options for recoding the IO from the model. One
> is based around the future F90 netcdf 4 parallel interface, which would
> allow a symmetric code: every processor does the same thing. The other
> option is to use netcdf 3, let the master processor read/write the
> data and distribute it to each node: an asymmetric coding.
>
> What I need to know: are netcdf 4 parallel IO operations blocking?
>
> The problem: the order of cells and nodes in our data set does not
> allow for a simple start, count read format. A data array might have
> dimensions (time,layers,cells). As an example, in a 2 processor case
> with 8 cells, proc1 has cells (1 2 5 7) while proc2 has cells
> (3 4 6 8). Write operations would have to be in a do loop to write
> each cell individually from the processor that owns it.
>
> For a model with 300,000 cells on 30 processors, this would be 10,000
> calls to NF90_PUT_VAR on each processor. Even if the calls are
> non-blocking this seems dangerous.
>
> Any thoughts?
>
> David

Howdy David!

Are you writing data along an unlimited dimension in this test?

There was a bug in netCDF-4 which caused metadata to be written every
time a record variable was expanded along the unlimited
dimension. This would cause a slowdown of parallel I/O performance,
because blocking would occur on every write operation, as the metadata
were updated.

This is now fixed in the netCDF-4 snapshot.

Other than this bug, I believe that netCDF-4 will yield the same
performance as the underlying HDF5 API, so the comments of the HDF5
programmers are very relevant.

But before you test again, get the netCDF-4 snapshot to make sure it's
not the netCDF-4 metadata bug which was causing your problems.
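
On the per-cell write loop in your example: one thing that may be worth
trying is to group each processor's cells into contiguous runs, so that
each run becomes a single start, count write instead of one call per
cell. Below is a minimal sketch of that grouping in plain Python. The
function name contiguous_runs is hypothetical and not part of any netCDF
API, and the sketch assumes 1-based, unique cell indices. How much this
helps depends entirely on how contiguous your partition is; for a
scattered partition it may save very little.

```python
def contiguous_runs(owned_cells):
    """Group 1-based cell indices into (start, count) runs.

    Hypothetical helper for illustration only; assumes the indices
    are unique. Each returned run could map to one start/count write.
    """
    runs = []
    for cell in sorted(owned_cells):
        if runs and cell == runs[-1][0] + runs[-1][1]:
            # Cell directly follows the last run: extend that run.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # Gap before this cell: begin a new run of length 1.
            runs.append((cell, 1))
    return runs


# The 2 processor, 8 cell case from the original message.
proc1_runs = contiguous_runs([1, 2, 5, 7])
proc2_runs = contiguous_runs([3, 4, 6, 8])
print(proc1_runs)  # [(1, 2), (5, 1), (7, 1)]
print(proc2_runs)  # [(3, 2), (6, 1), (8, 1)]
```

In this small case each processor would make 3 writes instead of 4.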



Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx

