
[netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)



Reto,

> Yes, the POSIX parallel I/O tests fail on OSX with OpenMPI, but that is fine. 
> OpenMPI on OSX uses MPIIO. So to my understanding the parallel tests are OK 
> if either POSIX or MPIIO works and the other one fails.
> 
> I am actually not using a parallel file system on OSX. I use the regular file 
> system (basic OSX installation) and I think that the parallel I/O has to work 
> in collective and independent mode even when using a regular file system.

I'm curious how you installed parallel HDF5, because my "make check" fails
before finishing the tests.  Did you build HDF5 without --enable-parallel, or
without using CC=mpicc?  Or did you build it with parallel I/O, but run
"make install" even though "make check" failed as a result of not having a
parallel file system?
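
One way to answer the first question is to look at the configuration summary
that the HDF5 compiler wrapper reports. The following is only a minimal
sketch: it assumes the installed h5pcc (or h5cc) wrapper is on your PATH and
that its "-showconfig" output contains the usual "Parallel HDF5: yes/no" line
from libhdf5.settings. The function name hdf5_parallel_status is a
hypothetical helper, not part of HDF5.

```shell
# Sketch: classify an HDF5 build as parallel or serial.
# Reads a configuration summary (the text of libhdf5.settings) on stdin.
# hdf5_parallel_status is a hypothetical helper name, not an HDF5 tool.
hdf5_parallel_status() {
    # The summary is assumed to contain a line like "Parallel HDF5: yes".
    if grep -Eiq 'Parallel HDF5:[[:space:]]*yes'; then
        echo "parallel"
    else
        echo "serial"
    fi
}

# Usage (assumes the wrapper from the install under test is on PATH):
#   h5pcc -showconfig | hdf5_parallel_status
```

If this reports "serial", the library was most likely configured without
--enable-parallel, which would explain a "make check" that never runs the
parallel tests.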

--Russ

> I will test the same installation on Linux and then start debugging on OSX, 
> and maybe we find out something.
> 
> Btw. the netcdf-fortran 4.4 beta failed to compile altogether on OSX, so I'm 
> still using netcdf-fortran 4.2.
> 
> Have a great weekend,
> 
> Reto
> 
> 
> On Apr 12, 2013, at 5:59 PM, Unidata netCDF Support wrote:
> 
> > Reto,
> >
> >> I've tried the following configuration
> >> - hdf5 1.8.11-snap16
> >> - netcdf-4.3.0-rc4
> >> - netcdf-fortran-4.2
> >> - openmpi-1.6.3
> >> - gcc/gfortran 4.6.3
> >>
> >> Same issue. If I let all processes do the write, then it works fine. If I 
> >> for instance exclude process #0,1,2 or 3 from the writing, then the write 
> >> hangs (all metadata/open/close is collective, only the write is 
> >> independent.). It seems to me that somehow on my system all writes are 
> >> collective by default and thus the write operation is not executed as 
> >> independent.
> >>
> >> Do you have a configuration with openmpi on OSX somewhere around?
> >
> > Yes, I had to deactivate my mpich configuration first, but now have
> > openmpi 1.6.4 on OSX 10.8.3.  However, when I try to build hdf5
> > 1.8.11-pre1 with it, using
> >
> >  CC=/opt/local/lib/openmpi/bin/mpicc ./configure
> >  make
> >  make check
> >
> > some tests fail in "make check", for example testing
> > "ph5diff h5diff_basiccl.h5", which may be due to not having a
> > POSIX-compliant parallel file system installed.  Also, I just noticed
> > that the earlier t_posix_compliant test for allwrite_allread_blocks with
> > POSIX IO failed, though it returned 0 so as not to stop the hdf5 testing.
> >
> >
> > Are you using a parallel file system?  Do you set the environment variable
> > HDF5_PARAPREFIX to a directory in a parallel file system?  What file
> > system are you using for your parallel I/O tests?
> >
> > I'm afraid I don't know much about parallel I/O, and the netCDF parallel
> > I/O expert got lured away to a different job some time ago, so we may need
> > some help or pointers where to look to install a parallel file system on
> > our OS X platform for this kind of testing and debugging.
> >
> >> I will start putting some debugging commands into the netcdf-fortran 
> >> library and see where the process really hangs and whether the 
> >> collective/independent write is executed correctly.
> >
> > Thanks, that would be helpful ...
> >
> > --Russ
> >
> >> Reto
> >>
> >>
> >> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> >>
> >>> Hi Reto,
> >>>
> >>> Sorry to have taken so long to respond to your question.
> >>>> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface 
> >>>> for some time with success. Thank you for this great tool!
> >>>>
> >>>> However, I now have an issue with independent access:
> >>>>
> >>>> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >>>> - 3 fixed and 1 unlimited dimension
> >>>> - all processes open/close the file and write metadata
> >>>> - only a few processes write to the file (-> independent access)
> >>>> - the write hangs. It works fine if all processes take part.
> >>>>
> >>>> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90 to 
> >>>> include an unlimited dimension and independent access by only a subset 
> >>>> of processes. Same issue, even if I explicitly set the access type to 
> >>>> independent for the variable. Can you reproduce the issue on your side?
> >>>>
> >>>> The following system configuration on my side:
> >>>> - NetCDF 4.2.1.1 and F90 interface 4.2
> >>>> - hdf5 1.8.9
> >>>> - Openmpi 1.
> >>>> - OSX, gcc 4.6.3
> >>>
> >>> No, I haven't been able to reproduce the issue, but I can't exactly
> >>> duplicate your configuration easily, and there have been some updates
> >>> and bug fixes that may have made a difference.
> >>>
> >>> First I tried this configuration, which worked fine on your attached
> >>> example:
> >>>
> >>> - NetCDF 4.3.0-rc4 and F90 interface 4.2
> >>> - hdf5 1.8.11 (release candidate from svn repository)
> >>> - mpich2-1.3.1
> >>> - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >>>
> >>> So if you can build those versions, it should work for you.  I'm not
> >>> sure whether the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both have
> >>> a fix for at least one parallel I/O hanging process issue:
> >>>
> >>> https://bugtracking.unidata.ucar.edu/browse/NCF-214  (fix in netCDF-4.3.0)
> >>> https://bugtracking.unidata.ucar.edu/browse/NCF-240  (fix in HDF5-1.8.11)
> >>>
> >>> --Russ
> >>>
> >>> Russ Rew                                         UCAR Unidata Program
> >>> address@hidden                      http://www.unidata.ucar.edu
> >>>
> >>>
> >>>
> >>> Ticket Details
> >>> ===================
> >>> Ticket ID: TIR-820282
> >>> Department: Support netCDF
> >>> Priority: High
> >>> Status: Closed
> >>>
> >>
> >>
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> >
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu





NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.