
[netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)



Reto,

> I've tried the following configuration
> - hdf5 1.8.11-snap16
> - netcdf-4.3.0-rc4
> - netcdf-fortran-4.2
> - openmpi-1.6.3
> - gcc/gfortran 4.6.3
> 
> Same issue. If I let all processes do the write, then it works fine. If I, for
> instance, exclude process #0, 1, 2, or 3 from the writing, then the write hangs
> (all metadata, open, and close operations are collective; only the write is
> independent). It seems to me that somehow on my system all writes are
> collective by default, and thus the write operation is not executed as
> independent.
> 
> Do you have a configuration with openmpi on OSX somewhere around?

Yes, I had to deactivate my mpich configuration first, but now have openmpi
1.6.4 on OSX 10.8.3.  However, when I try to build hdf5 1.8.11-pre1 with it, using

  CC=/opt/local/lib/openmpi/bin/mpicc ./configure
  make
  make check

Some tests fail in "make check", for example testing "ph5diff h5diff_basiccl.h5",
which may be due to not having a POSIX-compliant parallel file system installed.
Also, I just noticed that the earlier t_posix_compliant test for
allwrite_allread_blocks with POSIX I/O failed, though it returned 0 so as not to
stop the hdf5 testing.


Are you using a parallel file system?  Do you set the environment variable
HDF5_PARAPREFIX to a directory in a parallel file system?  What file system are
you using for your parallel I/O tests?

I'm afraid I don't know much about parallel I/O, and the netCDF parallel I/O
expert got lured away to a different job some time ago, so we may need some help
or pointers on where to look to install a parallel file system on our OS X
platform for this kind of testing and debugging.

> I will start putting some debugging commands into the netcdf-fortran library 
> and see where the process really hangs and whether the collective/independent 
> write is executed correctly.

Thanks, that would be helpful ...
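The symptom Reto describes (the write completes when every rank takes part but hangs as soon as any rank skips it) is what one would expect if the write were being carried out collectively. The toy Python model below is not netCDF or HDF5 code; it only illustrates that reasoning, under the assumption that a collective write behaves like a barrier every rank in the communicator must reach, while an independent write needs no synchronization. The function name simulate_write and its parameters are hypothetical, and the timeout stands in for what would be an indefinite hang under real MPI.

```python
import threading


def simulate_write(n_ranks, writer_ranks, collective, timeout=1.0):
    """Hypothetical toy model, not netCDF or HDF5 code.

    Each "rank" is a thread. Returns True if every rank in writer_ranks
    completed its write, False if a collective write could not complete.
    """
    # Assumption: a collective operation needs every rank in the communicator,
    # so it is modelled as a barrier sized to n_ranks.
    barrier = threading.Barrier(n_ranks)
    finished = []
    lock = threading.Lock()

    def rank_body(rank):
        if rank not in writer_ranks:
            return  # this rank skips the write entirely
        if collective:
            try:
                # Wait for all n_ranks ranks; the timeout stands in for an MPI hang.
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                return  # under real MPI this rank would block forever instead
        with lock:
            finished.append(rank)  # this rank's write completed

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished) == sorted(writer_ranks)


if __name__ == "__main__":
    print(simulate_write(4, [0, 1, 2, 3], collective=True))  # True
    print(simulate_write(4, [1, 2, 3], collective=True))     # False, after the timeout
    print(simulate_write(4, [1, 2, 3], collective=False))    # True
```

In this model a collective write by ranks 1 to 3 out of 4 never completes, while the same subset succeeds in independent mode, which matches the reported pattern. It does not identify which library layer, if any, is forcing collective behavior; that is what the debugging described above would need to show.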

--Russ

> Reto
> 
> 
> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> 
> > Hi Reto,
> >
> > Sorry to have taken so long to respond to your question.
> >> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface for 
> >> some time with success. Thank you for this great tool!
> >>
> >> However, I now have an issue with independent access:
> >>
> >> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >> - 3 fixed and 1 unlimited dimension
> >> - all processes open/close the file and write metadata
> >> - only a few processes write to the file (-> independent access)
> >> - the write hangs. It works fine if all processes take part.
> >>
> >> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90 to
> >> include an unlimited dimension and independent access by only a subset of
> >> processes. Same issue, even if I explicitly set the access type to
> >> independent for the variable. Can you reproduce the issue on your side?
> >>
> >> The following system configuration on my side:
> >> - NetCDF 4.2.1.1 and F90 interface 4.2
> >> - hdf5 1.8.9
> >> - Openmpi 1.
> >> - OSX, gcc 4.6.3
> >
> > No, I haven't been able to reproduce the issue, but I can't exactly duplicate
> > your configuration easily, and there have been some updates and bug fixes that
> > may have made a difference.
> >
> > First I tried this configuration, which worked fine on your attached example:
> >
> > - NetCDF 4.3.0-rc4 and F90 interface 4.2
> > - hdf5 1.8.11 (release candidate from svn repository)
> > - mpich2-1.3.1
> > - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >
> > So if you can build those versions, it should work for you.  I'm not sure
> > whether the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both have a fix for
> > at least one parallel I/O hanging-process issue:
> >
> >  https://bugtracking.unidata.ucar.edu/browse/NCF-214  (fix in netCDF-4.3.0)
> >  https://bugtracking.unidata.ucar.edu/browse/NCF-240  (fix in HDF5-1.8.11)
> >
> > --Russ
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: TIR-820282
> > Department: Support netCDF
> > Priority: High
> > Status: Closed
> >
> 
> 
