
[netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)



Reto,

> I've tried the following configuration
> - hdf5 1.8.11-snap16
> - netcdf-4.3.0-rc4
> - netcdf-fortran-4.2
> - openmpi-1.6.3
> - gcc/gfortran 4.6.3
> 
> Same issue. If I let all processes do the write, then it works fine. If I, for
> instance, exclude process #0, 1, 2, or 3 from the writing, then the write hangs
> (all metadata/open/close is collective; only the write is independent). It
> seems to me that somehow on my system all writes are collective by default
> and thus the write operation is not executed as independent.
> 
> Do you have a configuration with openmpi on OSX somewhere around?

Yes, I had to deactivate my mpich configuration first, but now have openmpi 1.6.4 on
OSX 10.8.3.  However, when I try to build hdf5 1.8.11-pre1 with it, using

  CC=/opt/local/lib/openmpi/bin/mpicc ./configure
  make
  make check

Some tests fail in "make check", for example testing "ph5diff h5diff_basiccl.h5", which
may be due to not having a POSIX-compliant parallel file system installed.  Also, I
just noticed that the earlier t_posix_compliant test for allwrite_allread_blocks
with POSIX IO failed, though it returned 0 so as not to stop the hdf5 testing.


Are you using a parallel file system?  Do you set the environment variable
HDF5_PARAPREFIX to a directory in a parallel file system?  What file system are you
using for your parallel I/O tests?

I'm afraid I don't know much about parallel I/O, and the netCDF parallel I/O expert
got lured away to a different job some time ago, so we may need some help or pointers
on where to look to install a parallel file system on our OS X platform for this kind
of testing and debugging.

> I will start putting some debugging commands into the netcdf-fortran library 
> and see where the process really hangs and whether the collective/independent 
> write is executed correctly.

Thanks, that would be helpful ...

--Russ
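A minimal Fortran 90 sketch of the access pattern Reto describes in the quoted messages: every rank creates the file, defines metadata, and closes it collectively; the variable uses an unlimited record dimension; access is switched to independent; and only a subset of ranks calls nf90_put_var. This is a hypothetical reproducer, not the modified simple_xy_par_wr.f90 he attached; the file name unlim_indep.nc, the variable name data, the record layout, and excluding rank 0 are illustrative assumptions. It assumes netCDF-Fortran built against a parallel-enabled netCDF-4/HDF5 and is meant to be run on 4 processes (e.g. mpirun -np 4).

```fortran
! Hypothetical reproducer (names and layout are illustrative assumptions):
! all ranks create/define/close collectively, but only ranks other than 0
! write, using independent access on a variable with an unlimited dimension.
program unlim_indep_wr
  use mpi
  use netcdf
  implicit none

  character(len=*), parameter :: FILE_NAME = "unlim_indep.nc"
  integer, parameter :: NX = 4
  integer :: ncid, varid, x_dimid, rec_dimid, dimids(2)
  integer :: my_rank, ierr, i
  integer :: vals(NX)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)

  ! Collective: every rank creates the file through MPI-IO.
  call check(nf90_create(FILE_NAME, ior(NF90_NETCDF4, NF90_MPIIO), ncid, &
       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL))

  ! Collective: every rank defines identical metadata. In Fortran order the
  ! unlimited (record) dimension is listed last.
  call check(nf90_def_dim(ncid, "x", NX, x_dimid))
  call check(nf90_def_dim(ncid, "rec", NF90_UNLIMITED, rec_dimid))
  dimids = (/ x_dimid, rec_dimid /)
  call check(nf90_def_var(ncid, "data", NF90_INT, dimids, varid))
  call check(nf90_enddef(ncid))

  ! Request independent access for this variable (every rank makes the call).
  call check(nf90_var_par_access(ncid, varid, NF90_INDEPENDENT))

  ! Independent: every rank except 0 writes one record of its own.
  if (my_rank /= 0) then
     vals = (/ (100 * my_rank + i, i = 1, NX) /)
     call check(nf90_put_var(ncid, varid, vals, &
          start=(/ 1, my_rank + 1 /), count=(/ NX, 1 /)))
  end if

  ! Collective: every rank closes the file.
  call check(nf90_close(ncid))
  call MPI_Finalize(ierr)

contains

  ! Print the netCDF error message and abort all ranks on failure.
  subroutine check(status)
    integer, intent(in) :: status
    integer :: abort_err
    if (status /= NF90_NOERR) then
       print *, trim(nf90_strerror(status))
       call MPI_Abort(MPI_COMM_WORLD, 1, abort_err)
    end if
  end subroutine check

end program unlim_indep_wr
```

Removing the `if (my_rank /= 0)` guard so that every rank writes corresponds to the case Reto reports working; keeping it corresponds to the case that hung on his OS X / Open MPI setup.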

> Reto
> 
> 
> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> 
> > Hi Reto,
> >
> > Sorry to have taken so long to respond to your question.
> >> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface for 
> >> some time with success. Thank you for this great tool!
> >>
> >> However, I now have an issue with independent access:
> >>
> >> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >> - 3 fixed and 1 unlimited dimension
> >> - all processes open/close the file and write metadata
> >> - only a few processes write to the file (-> independent access)
> >> - the write hangs. It works fine if all processes take part.
> >>
> >> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90 to
> >> include an unlimited dimension and independent access by only a subset of
> >> processes. Same issue, even if I explicitly set the access type to
> >> independent for the variable. Can you reproduce the issue on your side?
> >>
> >> The following system configuration on my side:
> >> - NetCDF 4.2.1.1 and F90 interface 4.2
> >> - hdf5 1.8.9
> >> - Openmpi 1.
> >> - OSX, gcc 4.6.3
> >
> > No, I haven't been able to reproduce the issue, but I can't exactly duplicate
> > your configuration easily, and there have been some updates and bug fixes that
> > may have made a difference.
> >
> > First I tried this configuration, which worked fine on your attached example:
> >
> > - NetCDF 4.3.0-rc4 and F90 interface 4.2
> > - hdf5 1.8.11 (release candidate from svn repository)
> > - mpich2-1.3.1
> > - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >
> > So if you can build those versions, it should work for you.  I'm not sure whether
> > the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both have a fix for at least one
> > parallel I/O hanging process issue:
> >
> >  https://bugtracking.unidata.ucar.edu/browse/NCF-214  (fix in netCDF-4.3.0)
> >  https://bugtracking.unidata.ucar.edu/browse/NCF-240  (fix in HDF5-1.8.11)
> >
> > --Russ
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: TIR-820282
> > Department: Support netCDF
> > Priority: High
> > Status: Closed
> >
> 
> 



NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.