
[netCDF #XZT-324632]: installing threadsafe NetCDF 4.2 on IBM/AIX



Andrea,

> thanks a lot for your hints,
> I managed to correctly build the libraries
> by restoring the shared-lib option in the C library
> (also using -lnetcdf into LDFLAGS although
> my guess is that this applies only to "make check"
> and not to external code that I could not link properly anyway)
> 
> Unfortunately, even compiling with the thread-safe
> IBM compilers and using this latest NetCDF release, I get a core dump
> in a Fortran code that is compiled with OpenMP enabled.
> This happens RANDOMLY in a routine that has no active threads,
> i.e. 64 NetCDF files are opened sequentially outside
> of any OpenMP parallel region.
> But when I run the code serially (OMP_NUM_THREADS=1)
> the problem disappears.
> The traceback is as follows:
> 
> Segmentation fault in leftmost at 0x9000000003470b8 ($t1)
> 0x9000000003470b8 (leftmost+0x8) e8c90008          ld   r6,0x8(r9)
> (dbx) where
> leftmost(??, ??) at 0x9000000003470b8
> malloc_y(0x1f, 0x0, 0xfffffffffffb860, 0xfffffffffffb87d, 0x0, 0x1e,
> 0x10000001, 0x0) at 0x900000000349280
> malloc_common_79_63(??) at 0x9000000003461a0
> nf_open(0xfffffffffffb860, 0xfffffffffff8a4c, 0xfffffffffff88a4,
> 0x1e0000001e) at 0x10011b8d0
> __netcdf_NMOD_nf90_open(0xfffffffffffb860, 0xfffffffffff8a4c,
> 0xfffffffffff88a4, 0x0, 0x0, 0x1e) at 0x100070a2c
> io_obs_(??, ??, ??, ??, ??, ??, ??, ??), line 242 in "io_obs.F90"
> readobs103_(), line 158 in "readobs103.F90"
> varjob_(), line 113 in "varjob.F90"
> master(), line 32 in "master.F90"
> 
> and line 242 of "io_obs.F90" is simply
> CALL CHECK( NF90_OPEN(CFILE, NF90_NOWRITE, NCID) )
> 
> I would really appreciate if you have any workaround
> in the NetCDF code or any other suggestion,

If all you're doing is reading the files, you should be able to read one or more
files concurrently through the netCDF APIs, so I don't understand what's causing
the error.  You should not need to use the NF90_SHARE flag in the NF90_OPEN call
that's described here:

  http://www.unidata.ucar.edu/netcdf/docs/netcdf-f90/NF90_005fOPEN.html

but you could certainly try that to see if it helps, using something like

  CALL CHECK( NF90_OPEN(CFILE, IOR(NF90_NOWRITE,NF90_SHARE), NCID) )

If you need to write to a netCDF file from more than one process concurrently,
you would have to use one of the parallel netCDF libraries, either pnetCDF for
classic format files or netCDF-4 with HDF5 built for parallel I/O.
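
For the static-library build discussed in the quoted messages below, the
environment might be set up along the lines of the following sketch.  It
follows the earlier suggestion to set LD_LIBRARY_PATH before invoking
configure and to add "-lnetcdf" to LDFLAGS.  The NC and PREFIX variable names
are only illustrative, their default paths are taken from the quoted build
log, and other libraries may be needed on your system.  The configure command
is printed rather than run so the settings can be reviewed first.

```shell
# Sketch only: environment for building netcdf-fortran against a static
# netCDF C library. NC and PREFIX are illustrative names; the default
# paths are copied from the build log quoted in this thread.
NC="${NC:-/users/home/ans012/local/netcdf-4.2-c}"
PREFIX="${PREFIX:-/users/home/ans012/local/netcdf-4.2-fortran}"

# Put the C library directory first on the library search path,
# keeping any value that was already set.
LD_LIBRARY_PATH="${NC}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH

# Headers and libraries of the C build; with a static C library the
# Fortran build has to name -lnetcdf explicitly.
CPPFLAGS="-I${NC}/include"
LDFLAGS="-q64 -b64 -L${NC}/lib -lnetcdf"
export CPPFLAGS LDFLAGS

# Print the configure command instead of running it.
echo "./configure --prefix=${PREFIX} --disable-shared"
```

The compiler variables (CC, FC, F77 and the related flags) would stay as in
the quoted build log.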

--Russ

> >> Hi Andrea,
> >>
> >> > I am trying to install a static version of NetCDF 4.2
> >> > (I am interested in the Fortran APIs) to be threadsafe and in 64b
> >> > on a IBM Power6 machine.
> >
> > Incidentally, the netCDF library is *not* threadsafe. The C library
> > internally maintains a list of information about open netCDF files in
> > a global data structure that gets modified when files are opened or
> > closed.  Fixing that problem is currently an open issue:
> >
> >   https://www.unidata.ucar.edu/jira/browse/NCF-115
> >
> > but the last comment indicates some recent progress has been made.  You
> > can register to be notified when that issue gets resolved by selecting
> > the "Watch" link on that jira page ...
> >
> > --Russ
> >
> >> > To do that, I first install the C library (v4.2.1.1) with the commands
> >> >
> >> > + export OBJECT_MODE=64
> >> > + export CC=xlc_r
> >> > + export FC=xlf90_r
> >> > + export F77=xlf_r
> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> > + export LDFLAGS=-q64 -b64
> >> > + export ARFLAGS=-X 64 -cru
> >> > + export AR_FLAGS=-X 64 -cru
> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-c
> >> > --disable-netcdf-4 --disable-doxygen --disable-shared
> >> >
> >> > The installation goes well,
> >> > then I move to the Fortran API package (netcdf-fortran-4.2)
> >> > that I install accordingly:
> >> >
> >> > + NC=/users/home/ans012/local/netcdf-4.2-c
> >> > + export OBJECT_MODE=64
> >> > + export CC=xlc_r
> >> > + export FC=xlf90_r
> >> > + export F77=xlf_r
> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export F90FLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource
> >> > -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export FFLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource
> >> > -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> > + export LDFLAGS=-q64 -b64 -L/users/home/ans012/local/netcdf-4.2-c/lib
> >> > + export CPPFLAGS=-I/users/home/ans012/local/netcdf-4.2-c/include
> >> > + export ARFLAGS=-X 64 -cru
> >> > + export AR_FLAGS=-X 64 -cru
> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-fortran
> >> > --disable-shared
> >>
> >> Since you disabled shared libraries for the C APIs, building the Fortran
> >> libraries is somewhat more complicated, as described in the second part
> >> of these instructions:
> >>
> >> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-fortran-install.html
> >>
> >> In particular, I think you need to set LD_LIBRARY_PATH before invoking the
> >> configure script, and LDFLAGS will have to contain "-lnetcdf" and possibly
> >> other libraries, as shown in the example.
> >>
> >> Please let us know if this doesn't work.
> >>
> >> --Russ
> >>
> >> > Now the compilation goes well but the make check
> >> > fails:
> >> >
> >> > [...]
> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int1_
> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int2_
> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int_
> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_real_
> >> > [...]
> >> >
> >> > Namely all the Fortran 77 API are "undefined symbol".
> >> > If I reinstall both C and Fortran libraries
> >> > without "-qextname" option for xlf/xlf90
> >> > (no trailing underscore for Fortran routines)
> >> > then all the C APIs (nc_get_var... etc.)
> >> > become "undefined symbol"
> >> >
> >> > In any case I can't succeed in linking
> >> > a Fortran program with the new libraries,
> >> >
> >> > do you have any suggestion/recommendation?
> >> >
> >> > Thanks in advance,
> >> > Regards
> >> >
> >> > Andrea Storto
> >> >
> >> >
> >> >
> >> Russ Rew                                         UCAR Unidata Program
> >> address@hidden                      http://www.unidata.ucar.edu
> >>
> >>
> > Ticket Details
> > ===================
> > Ticket ID: XZT-324632
> > Department: Support netCDF
> > Priority: Normal
> > Status: Closed
> >
> >
> 
> 
> ====================
> Andrea Storto, Ph.D.
> Dept. of Numerical Applications and Scenarios (ANS)
> Euro-Mediterranean Centre for Climate Change (CMCC) - www.cmcc.it
> viale Aldo Moro, 44, 7th Floor - 40127 BOLOGNA - Italy
> Phone: +39 (0)51 3782605 (int. 205) Mobile: +39 339 8176646
> Fax: +39 (0)51 3782655  Email: address@hidden
> ====================
> 
> 
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: XZT-324632
Department: Support netCDF
Priority: Normal
Status: Closed