Re: netcdf4 parallel IO

"David Stuebe" <dstuebe@xxxxxxxxxx> writes:

> Hi NETCDF folks
>
> I work on an unstructured finite volume coastal ocean model, FVCOM,
> which is parallel (using MPICH2). The read/write is a major slowdown
> for our large cases. On our cluster, we have one large storage device,
> an EMC RAID array. The network is InfiniBand - the network is much
> faster than the RAID array.
>
> For our model we need to read large initial condition data sets, and
> single frames of forcing data while running. We also need to write
> single frames of data for output (frequently), and large restart files
> (less frequently).
>
> I am considering two options for recoding the IO from the model. One
> is based around the future F90 netCDF-4 parallel interface, which would
> allow a symmetric code - every processor does the same thing. The other
> option is to use netCDF-3, let the master processor read/write the data
> and distribute it to each node - an asymmetric coding.
>
> What I need to know: are netCDF-4 parallel IO operations blocking?
>
> The problem: the order of cells and nodes in our data set does not
> allow for a simple start, count read format. A data array might have
> dimensions (time, layers, cells). As an example, in a 2-processor case
> with 8 cells, proc1 has cells (1 2 5 7) while proc2 has cells (3 4 6 8) -
> write operations would have to be in a do loop to write each cell
> individually from the processor that owns it.
>
> For a model with 300,000 cells on 30 processors, this would be 10,000
> calls to NF90_PUT_VAR on each processor. Even if the calls are
> non-blocking, this seems dangerous.
>
> Any thoughts?
>
> David

Howdy David!

Are you using an unlimited dimension in this test, and writing along it?

There was a bug in netCDF-4 which caused metadata to be written every
time a record variable was expanded along the unlimited
dimension. This would cause a slowdown of parallel I/O performance,
because blocking would occur on every write operation, as the metadata
were updated.

This is now fixed in the netCDF-4 snapshot:
http://www.unidata.ucar.edu/software/netcdf/builds/snapshot/index_4.html

Other than this bug, I believe that netCDF-4 will yield the same
performance as the underlying HDF5 API, so the comments of the HDF5
programmers are very relevant.
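On the scattered-cell problem you describe: the per-cell loop is not the only
option. Since each processor's cells cluster into contiguous stretches, you can
coalesce them into (start, count) runs and issue one put_var call per run
instead of one per cell. Here is a sketch of that index bookkeeping in plain
Python (the `start_count_runs` helper is my own illustration, not a netCDF API):

```python
def start_count_runs(cells):
    """Coalesce sorted global cell indices into contiguous (start, count)
    runs; each run maps onto a single start/count hyperslab write
    instead of one call per cell."""
    out = []
    for c in sorted(cells):
        if out and c == out[-1][0] + out[-1][1]:
            out[-1][1] += 1          # extend the current contiguous run
        else:
            out.append([c, 1])       # start a new run at cell c
    return [tuple(r) for r in out]

# The 2-processor, 8-cell example from the message above:
print(start_count_runs([1, 2, 5, 7]))  # [(1, 2), (5, 1), (7, 1)]
print(start_count_runs([3, 4, 6, 8]))  # [(3, 2), (6, 1), (8, 1)]
```

How much this helps depends on how fragmented the cell ownership is, but even
modest coalescing cuts the 10,000-calls-per-processor figure considerably.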

But before you test again, get the netCDF-4 snapshot to make sure it's
not the netCDF-4 metadata bug which was causing your problems.
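If you do end up going the asymmetric netCDF-3 route, the master's job before
each write is just to reassemble the ranks' slices into global cell order so it
can issue a single contiguous put_var call. A minimal sketch of that reordering
step, in pure Python with no MPI (the `assemble_on_master` helper is
hypothetical, shown only to make the bookkeeping concrete):

```python
def assemble_on_master(per_rank_data, per_rank_cells, n_cells):
    """Reorder per-processor data slices into one global array in cell
    order, as the master would before a single netCDF-3 write.
    Cell numbers are 1-based, as in the example in the message."""
    out = [None] * n_cells
    for data, cells in zip(per_rank_data, per_rank_cells):
        for value, cell in zip(data, cells):
            out[cell - 1] = value    # place each value at its global slot
    return out

# proc1 owns cells (1 2 5 7), proc2 owns (3 4 6 8); values tag their owner:
print(assemble_on_master(
    [["p1a", "p1b", "p1c", "p1d"], ["p2a", "p2b", "p2c", "p2d"]],
    [[1, 2, 5, 7], [3, 4, 6, 8]],
    8))
# ['p1a', 'p1b', 'p2a', 'p2b', 'p1c', 'p2c', 'p1d', 'p2d']
```

The cost is the gather over InfiniBand, but since you say the network is much
faster than the RAID array, that may be a reasonable trade.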

Thanks!

Ed


-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
