
[netCDF #XZT-324632]: installing threadsafe NetCDF 4.2 on IBM/AIX



Hi Andrea,

> after some research I found that the problem
> was not due to some library or compilation related
> setting, but to the AIX environment variable
>
> export LDR_CNTRL=TEXTPSIZE=64K@STACKPSIZE=64K@DATAPSIZE=64K
>
> which controls the memory page sizes used at runtime.
>
> By taking out DATAPSIZE=64K, the program can correctly
> open the NetCDF files in a multi-threaded environment as well.
> As a reminder, the routine generating the segmentation fault
> had no OMP parallel regions, and the problem did not
> occur when the job was run serially.
>
> I can't figure out the reason, but I hope you can and that this helps;
> it occurred with every NetCDF version I tried (from 3.6 to 4.2).

Thanks very much for reporting back on a workaround for the problem!
We would never have found this, as we don't have access to an up-to-date
AIX platform for testing.  Even if we did, we probably would not have
set LDR_CNTRL to a non-default value ...
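
For anyone who hits the same crash, here is a minimal sketch of applying Andrea's workaround in a POSIX shell job script. It removes only the DATAPSIZE field from LDR_CNTRL and keeps the text and stack page-size requests. The helper name `strip_datapsize` and the commented-out executable name are hypothetical, and the starting value is simply the one reported above.

```shell
#!/bin/sh
# Sketch: drop DATAPSIZE from LDR_CNTRL before launching an OpenMP program
# that opens netCDF files (workaround reported in this thread).

strip_datapsize() {
  # Delete "DATAPSIZE=<value>" together with the "@" separator before it,
  # then delete a leading "@" left behind if DATAPSIZE was the first field.
  printf '%s\n' "$1" | sed -e 's/@*DATAPSIZE=[^@]*//' -e 's/^@//'
}

# Example starting value: the setting that triggered the segfault.
# In a real job script you would start from the site's existing value.
LDR_CNTRL=TEXTPSIZE=64K@STACKPSIZE=64K@DATAPSIZE=64K
LDR_CNTRL=$(strip_datapsize "$LDR_CNTRL")

if [ -n "$LDR_CNTRL" ]; then
  export LDR_CNTRL
else
  unset LDR_CNTRL   # nothing left: fall back to the loader defaults
fi
echo "LDR_CNTRL=${LDR_CNTRL:-<unset>}"

# ./master.x   # hypothetical name of the OpenMP executable
```

If DATAPSIZE was the only field, the sketch unsets LDR_CNTRL entirely rather than exporting an empty value.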

--Russ

> > Andrea,
> >
> >> thanks a lot for your hints,
> >> I managed to correctly build the libraries
> >> by restoring the shared-lib option in the C library
> >> (also adding -lnetcdf to LDFLAGS, although
> >> my guess is that this applies only to "make check"
> >> and not to external code, which I could not link properly anyway).
> >>
> >> Unfortunately, even compiling with the thread-safe
> >> IBM compilers and using this latest NetCDF release, I get a core dump
> >> in Fortran code that is compiled with OpenMP enabled.
> >> This happens RANDOMLY in a routine that has no active threads,
> >> i.e. 64 NetCDF files are opened sequentially outside
> >> of any OpenMP parallel region.
> >> But when I run the code serially (OMP_NUM_THREADS=1)
> >> the problem disappears.
> >> The traceback is as follows:
> >>
> >> Segmentation fault in leftmost at 0x9000000003470b8 ($t1)
> >> 0x9000000003470b8 (leftmost+0x8) e8c90008          ld   r6,0x8(r9)
> >> (dbx) where
> >> leftmost(??, ??) at 0x9000000003470b8
> >> malloc_y(0x1f, 0x0, 0xfffffffffffb860, 0xfffffffffffb87d, 0x0, 0x1e,
> >> 0x10000001, 0x0) at 0x900000000349280
> >> malloc_common_79_63(??) at 0x9000000003461a0
> >> nf_open(0xfffffffffffb860, 0xfffffffffff8a4c, 0xfffffffffff88a4,
> >> 0x1e0000001e) at 0x10011b8d0
> >> __netcdf_NMOD_nf90_open(0xfffffffffffb860, 0xfffffffffff8a4c,
> >> 0xfffffffffff88a4, 0x0, 0x0, 0x1e) at 0x100070a2c
> >> io_obs_(??, ??, ??, ??, ??, ??, ??, ??), line 242 in "io_obs.F90"
> >> readobs103_(), line 158 in "readobs103.F90"
> >> varjob_(), line 113 in "varjob.F90"
> >> master(), line 32 in "master.F90"
> >>
> >> and line 242 of io_obs.F90 is simply
> >> CALL CHECK( NF90_OPEN(CFILE, NF90_NOWRITE, NCID) )
> >>
> >> I would really appreciate any workaround
> >> in the NetCDF code, or any other suggestion.
> >
> > If all you're doing is reading the files, you should be able to read one
> > or more files concurrently through the netCDF APIs, so I don't understand
> > what's causing the error.  You should not need to use the NF90_SHARE flag
> > in the NF90_OPEN call that's described here:
> >
> >   http://www.unidata.ucar.edu/netcdf/docs/netcdf-f90/NF90_005fOPEN.html
> >
> > but you could certainly try that to see if it helps, using something like
> >
> >   CALL CHECK( NF90_OPEN(CFILE, IOR(NF90_NOWRITE, NF90_SHARE), NCID) )
> >
> > If you need to write to a netCDF file from more than one process
> > concurrently, you would have to use one of the parallel netCDF libraries,
> > either pnetCDF for classic format files or netCDF-4 with HDF5 built for
> > parallel I/O.
> >
> > --Russ
> >
> >> >> Hi Andrea,
> >> >>
> >> >> > I am trying to install a static version of NetCDF 4.2
> >> >> > (I am interested in the Fortran APIs) that is thread-safe and 64-bit
> >> >> > on an IBM Power6 machine.
> >> >
> >> > Incidentally, the netCDF library is *not* threadsafe. The C library
> >> > internally maintains a list of information about open netCDF files in
> >> > a global data structure that gets modified when files are opened or
> >> > closed.  Fixing that problem is currently an open issue:
> >> >
> >> >   https://www.unidata.ucar.edu/jira/browse/NCF-115
> >> >
> >> > but the last comment indicates some recent progress has been made.  You
> >> > can register to be notified when that issue gets resolved by selecting
> >> > the "Watch" link on that jira page ...
> >> >
> >> > --Russ
> >> >
> >> >> > To do that, I first install the C library (v4.2.1.1) with the commands
> >> >> >
> >> >> > + export OBJECT_MODE=64
> >> >> > + export CC=xlc_r
> >> >> > + export FC=xlf90_r
> >> >> > + export F77=xlf_r
> >> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> >> > + export LDFLAGS=-q64 -b64
> >> >> > + export ARFLAGS=-X 64 -cru
> >> >> > + export AR_FLAGS=-X 64 -cru
> >> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-c
> >> >> > --disable-netcdf-4 --disable-doxygen --disable-shared
> >> >> >
> >> >> > The installation goes well,
> >> >> > then I move to the Fortran API package (netcdf-fortran-4.2)
> >> >> > that I install accordingly:
> >> >> >
> >> >> > + NC=/users/home/ans012/local/netcdf-4.2-c
> >> >> > + export OBJECT_MODE=64
> >> >> > + export CC=xlc_r
> >> >> > + export FC=xlf90_r
> >> >> > + export F77=xlf_r
> >> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export F90FLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource
> >> >> > -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource
> >> >> > -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> >> > + export LDFLAGS=-q64 -b64 -L/users/home/ans012/local/netcdf-4.2-c/lib
> >> >> > + export CPPFLAGS=-I/users/home/ans012/local/netcdf-4.2-c/include
> >> >> > + export ARFLAGS=-X 64 -cru
> >> >> > + export AR_FLAGS=-X 64 -cru
> >> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-fortran
> >> >> > --disable-shared
> >> >>
> >> >> Since you disabled shared libraries for the C APIs, building the Fortran
> >> >> libraries is somewhat more complicated, as described in the second part
> >> >> of these instructions:
> >> >>
> >> >> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-fortran-install.html
> >> >>
> >> >> In particular, I think you need to set LD_LIBRARY_PATH before invoking
> >> >> the configure script, and LDFLAGS will have to contain "-lnetcdf" and
> >> >> possibly other libraries, as shown in the example.
> >> >>
> >> >> Please let us know if this doesn't work.
> >> >>
> >> >> --Russ
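
A minimal sketch of the Fortran configure step with this advice applied, assuming a POSIX shell and the install paths used in this thread. The flag values are written quoted here because, typed literally as shown in the trace, an unquoted `export FCFLAGS=-q64 -qmaxmem=-1 ...` would assign only `-q64`. `-lnetcdf` is appended to LDFLAGS and the C library directory is added to LD_LIBRARY_PATH. Setting LIBPATH as well is an assumption (it is the traditional AIX library search variable), and any further libraries the C library itself needs would go after `-lnetcdf`. By default the script only prints the configure command; set RUN=1 to execute it from inside the netcdf-fortran-4.2 source tree.

```shell
#!/bin/sh
# Sketch: configure netcdf-fortran-4.2 against a static netCDF C library.
# Paths are the ones used in this thread; adjust for your site.
NCDIR=/users/home/ans012/local/netcdf-4.2-c
FPREFIX=/users/home/ans012/local/netcdf-4.2-fortran

# Fortran flags from the thread, quoted so each variable gets the whole string.
XLF_FLAGS="-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6"

OBJECT_MODE=64
CC=xlc_r
FC=xlf90_r
F77=xlf_r
FCFLAGS="$XLF_FLAGS"
FFLAGS="$XLF_FLAGS"
CFLAGS="-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6"
CPPFLAGS="-I$NCDIR/include"
# The C library is static, so the Fortran tests must link it explicitly;
# append anything the C library itself depends on after -lnetcdf.
LDFLAGS="-q64 -b64 -L$NCDIR/lib -lnetcdf"
ARFLAGS="-X 64 -cru"
AR_FLAGS="-X 64 -cru"
# Library search path set before configure, as suggested above;
# LIBPATH is an assumption added for AIX.
LD_LIBRARY_PATH="$NCDIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
LIBPATH="$NCDIR/lib${LIBPATH:+:$LIBPATH}"
export OBJECT_MODE CC FC F77 FCFLAGS FFLAGS CFLAGS CPPFLAGS LDFLAGS \
       ARFLAGS AR_FLAGS LD_LIBRARY_PATH LIBPATH

# Build the configure command as positional parameters and print it.
set -- ./configure --prefix="$FPREFIX" --disable-shared
echo "LDFLAGS=$LDFLAGS"
echo "$@"

# Only run configure when RUN=1 (must be inside the netcdf-fortran source tree).
if [ "${RUN:-0}" = 1 ]; then
  "$@"
fi
```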
> >> >>
> >> >> > Now the compilation goes well, but make check
> >> >> > fails:
> >> >> >
> >> >> > [...]
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int1_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int2_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_real_
> >> >> > [...]
> >> >> >
> >> >> > Namely, all the Fortran 77 API functions are undefined symbols.
> >> >> > If I reinstall both the C and Fortran libraries
> >> >> > without the "-qextname" option for xlf/xlf90
> >> >> > (no trailing underscore for Fortran routines),
> >> >> > then all the C APIs (nc_get_var... etc.)
> >> >> > become undefined symbols.
> >> >> >
> >> >> > In any case, I can't manage to link
> >> >> > a Fortran program with the new libraries.
> >> >> >
> >> >> > do you have any suggestion/recommendation?
> >> >> >
> >> >> > Thanks in advance,
> >> >> > Regards
> >> >> >
> >> >> > Andrea Storto
> >> >> >
> >> >> >
> >> >> >
> >> >> Russ Rew                                         UCAR Unidata Program
> >> >> address@hidden                      http://www.unidata.ucar.edu
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > Ticket Details
> >> > ===================
> >> > Ticket ID: XZT-324632
> >> > Department: Support netCDF
> >> > Priority: Normal
> >> > Status: Closed
> >> >
> >> >
> >>
> >>
> >> ====================
> >> Andrea Storto, Ph.D.
> >> Dept. of Numerical Applications and Scenarios (ANS)
> >> Euro-Mediterranean Centre for Climate Change (CMCC) - www.cmcc.it
> >> viale Aldo Moro, 44, 7th Floor - 40127 BOLOGNA - Italy
> >> Phone: +39 (0)51 3782605 (int. 205) Mobile: +39 339 8176646
> >> Fax: +39 (0)51 3782655  Email: address@hidden
> >> ====================
> >>
> >>