
[netCDF #XZT-324632]: installing threadsafe NetCDF 4.2 on IBM/AIX



Hi Andrea,

> after some research I found that the problem
> was not due to some library or compilation related
> setting, but to the AIX environmental variable
> 
> export address@hidden@DATAPSIZE=64K
> 
> which handles the memory space at runtime.
> 
> By taking out DATAPSIZE=64K, the program can correctly
> open the NetCDF files also in a multi-thread environment.
> I remind you that the routine generating the Segmentation fault
> had no OMP parallel regions and that the problem did not
> occur when the job was launched as serial.
> 
> I can't figure out the reason, but I hope you can and that this helps.
> The problem occurred regardless of the NetCDF version (from 3.6 to 4.2).

Thanks very much for reporting back on a workaround for the problem!
We would never have found this, as we don't have access to an up-to-date
AIX platform for testing.  Even if we did, we probably would not have
set LDR_CNTRL to a non-default value ...
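
For anyone hitting the same crash, the workaround can be tried for a single run without editing the login profile. The following is only a sketch, assuming a POSIX shell. The function name run_without_ldr_cntrl and the program name ./myprog are hypothetical, and unsetting the whole variable also drops any other loader options it carried:

```shell
# Hypothetical helper: run a command with LDR_CNTRL removed from its
# environment, leaving the calling shell's own setting untouched.
run_without_ldr_cntrl() {
    (
        unset LDR_CNTRL      # affects only this subshell
        exec "$@"            # replace the subshell with the command
    )
}

# Report what the current shell has set, if anything.
echo "LDR_CNTRL in this shell: ${LDR_CNTRL:-<unset>}"

# Example (./myprog is a placeholder for the OpenMP executable):
#   run_without_ldr_cntrl ./myprog
```

Because the helper uses exec, it expects an external program rather than a shell function or builtin.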

--Russ

> > Andrea,
> >
> >> thanks a lot for your hints,
> >> I managed to correctly build the libraries
> >> by restoring the shared-lib option in the C library
> >> (also using -lnetcdf into LDFLAGS although
> >> my guess is that this applies only to "make check"
> >> and not to external code that I could not link properly anyway)
> >>
> >> Unfortunately, even compiling with thread-safe
> >> IBM compilers and using this last NetCDF release, I have coredump
> >> in a Fortran code that is compiled with OpenMP enabled.
> >> This happens RANDOMLY in a routine that has no open threads,
> >> i.e. 64 NetCDF files are opened sequentially outside
> >> any OpenMP parallel region.
> >> But when I run the code serially (OMP_NUM_THREADS=1)
> >> the problem disappears.
> >> The traceback is as follows:
> >>
> >> Segmentation fault in leftmost at 0x9000000003470b8 ($t1)
> >> 0x9000000003470b8 (leftmost+0x8) e8c90008          ld   r6,0x8(r9)
> >> (dbx) where
> >> leftmost(??, ??) at 0x9000000003470b8
> >> malloc_y(0x1f, 0x0, 0xfffffffffffb860, 0xfffffffffffb87d, 0x0, 0x1e,
> >> 0x10000001, 0x0) at 0x900000000349280
> >> malloc_common_79_63(??) at 0x9000000003461a0
> >> nf_open(0xfffffffffffb860, 0xfffffffffff8a4c, 0xfffffffffff88a4,
> >> 0x1e0000001e) at 0x10011b8d0
> >> __netcdf_NMOD_nf90_open(0xfffffffffffb860, 0xfffffffffff8a4c,
> >> 0xfffffffffff88a4, 0x0, 0x0, 0x1e) at 0x100070a2c
> >> io_obs_(??, ??, ??, ??, ??, ??, ??, ??), line 242 in "io_obs.F90"
> >> readobs103_(), line 158 in "readobs103.F90"
> >> varjob_(), line 113 in "varjob.F90"
> >> master(), line 32 in "master.F90"
> >>
> >> and line 242 of "io_obs.F90" is simply
> >> CALL CHECK( NF90_OPEN(CFILE, NF90_NOWRITE, NCID) )
> >>
> >> I would really appreciate if you have any workaround
> >> in the NetCDF code or any other suggestion,
> >
> > If all you're doing is reading the files, you should be able to read one
> > or more files concurrently through the netCDF APIs, so I don't understand
> > what's causing the error.  You should not need to use the NF90_SHARE flag
> > in the NF90_OPEN call that's described here:
> >
> >   http://www.unidata.ucar.edu/netcdf/docs/netcdf-f90/NF90_005fOPEN.html
> >
> > but you could certainly try that to see if it helps, using something like
> >
> >   CALL CHECK( NF90_OPEN(CFILE, IOR(NF90_NOWRITE,NF90_SHARE), NCID) )
> >
> > If you need to write to a netCDF file from more than one process
> > concurrently, you would have to use one of the parallel netCDF libraries,
> > either pnetCDF for classic format files or netCDF-4 with HDF5 built for
> > parallel I/O.
> >
> > --Russ
> >
> >> >> Hi Andrea,
> >> >>
> >> >> > I am trying to install a static version of NetCDF 4.2
> >> >> > (I am interested in the Fortran APIs) to be threadsafe and in 64b
> >> >> > on an IBM Power6 machine.
> >> >
> >> > Incidentally, the netCDF library is *not* threadsafe. The C library
> >> > internally maintains a list of information about open netCDF files in
> >> > a global data structure that gets modified when files are opened or
> >> > closed.  Fixing that problem is currently an open issue:
> >> >
> >> >   https://www.unidata.ucar.edu/jira/browse/NCF-115
> >> >
> >> > but the last comment indicates some recent progress has been made.
> >> > You can register to be notified when that issue gets resolved by
> >> > selecting the "Watch" link on that jira page ...
> >> >
> >> > --Russ
> >> >
> >> >> > To do that, I first install the C library (v4.2.1.1) with the commands
> >> >> >
> >> >> > + export OBJECT_MODE=64
> >> >> > + export CC=xlc_r
> >> >> > + export FC=xlf90_r
> >> >> > + export F77=xlf_r
> >> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> >> > + export LDFLAGS=-q64 -b64
> >> >> > + export ARFLAGS=-X 64 -cru
> >> >> > + export AR_FLAGS=-X 64 -cru
> >> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-c --disable-netcdf-4 --disable-doxygen --disable-shared
> >> >> >
> >> >> > The installation goes well,
> >> >> > then I move to the Fortran API package (netcdf-fortran-4.2)
> >> >> > that I install accordingly:
> >> >> >
> >> >> > + NC=/users/home/ans012/local/netcdf-4.2-c
> >> >> > + export OBJECT_MODE=64
> >> >> > + export CC=xlc_r
> >> >> > + export FC=xlf90_r
> >> >> > + export F77=xlf_r
> >> >> > + export FCFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export F90FLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export FFLAGS_f90=-q64 -qmaxmem=-1 -NS32648 -qextname -qsource -qcache=auto -qarch=pwr6 -qtune=pwr6
> >> >> > + export CFLAGS=-q64 -qmaxmem=-1 -qarch=pwr6 -qtune=pwr6
> >> >> > + export LDFLAGS=-q64 -b64 -L/users/home/ans012/local/netcdf-4.2-c/lib
> >> >> > + export CPPFLAGS=-I/users/home/ans012/local/netcdf-4.2-c/include
> >> >> > + export ARFLAGS=-X 64 -cru
> >> >> > + export AR_FLAGS=-X 64 -cru
> >> >> > + ./configure --prefix=/users/home/ans012/local/netcdf-4.2-fortran --disable-shared
> >> >>
> >> >> Since you disabled shared libraries for the C APIs, building Fortran
> >> >> libraries is somewhat more complicated, as described in the second
> >> >> part of these instructions:
> >> >>
> >> >> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-fortran-install.html
> >> >>
> >> >> In particular, I think you need to set LD_LIBRARY_PATH before invoking
> >> >> the configure script, and LDFLAGS will have to contain "-lnetcdf" and
> >> >> possibly other libraries, as shown in the example.
> >> >>
> >> >> Please let us know if this doesn't work.
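
The advice above (set LD_LIBRARY_PATH before configure, and put "-lnetcdf" in LDFLAGS) could be sketched roughly as follows. This is an illustration only, not the documented procedure: the function name netcdf_fortran_static_env is hypothetical, the paths and the -q64 -b64 flags are copied from the build log in this thread, and any further libraries the link may need are not shown.

```shell
# Hypothetical helper: set the environment for building netcdf-fortran
# against a static netCDF C library installed under the directory $1.
netcdf_fortran_static_env() {
    ncdir=$1
    # Prepend the C library directory, keeping any existing search path.
    LD_LIBRARY_PATH="${ncdir}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    CPPFLAGS="-I${ncdir}/include"
    # -lnetcdf is listed explicitly, following the advice in this thread.
    LDFLAGS="-q64 -b64 -L${ncdir}/lib -lnetcdf"
    export LD_LIBRARY_PATH CPPFLAGS LDFLAGS
}

netcdf_fortran_static_env /users/home/ans012/local/netcdf-4.2-c
echo "LDFLAGS=${LDFLAGS}"

# Then, from the netcdf-fortran source directory:
#   ./configure --prefix=/users/home/ans012/local/netcdf-4.2-fortran --disable-shared
```

The compiler variables (CC, FC, F77 and the various FLAGS) from the original build log would still need to be exported as before.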
> >> >>
> >> >> --Russ
> >> >>
> >> >> > Now the compilation goes well but the make check
> >> >> > fails:
> >> >> >
> >> >> > [...]
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int1_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int2_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_int_
> >> >> > ld: 0711-317 ERROR: Undefined symbol: .nf_get_var1_real_
> >> >> > [...]
> >> >> >
> >> >> > Namely, all the Fortran 77 APIs are "undefined symbol".
> >> >> > If I reinstall both C and Fortran libraries
> >> >> > without "-qextname" option for xlf/xlf90
> >> >> > (no trailing underscore for Fortran routines)
> >> >> > then all the C APIs (nc_get_var... etc.)
> >> >> > become "undefined symbol"
> >> >> >
> >> >> > In any case I can't succeed in linking
> >> >> > a Fortran program with the new libraries,
> >> >> >
> >> >> > do you have any suggestion/recommendation?
> >> >> >
> >> >> > Thanks in advance,
> >> >> > Regards
> >> >> >
> >> >> > Andrea Storto
> >> >> >
> >> >> >
> >> >> >
> >> >> Russ Rew                                         UCAR Unidata Program
> >> >> address@hidden                      http://www.unidata.ucar.edu
> >> >>
> >> >>
> >> > Ticket Details
> >> > ===================
> >> > Ticket ID: XZT-324632
> >> > Department: Support netCDF
> >> > Priority: Normal
> >> > Status: Closed
> >> >
> >> >
> >>
> >>
> >> ====================
> >> Andrea Storto, Ph.D.
> >> Dept. of Numerical Applications and Scenarios (ANS)
> >> Euro-Mediterranean Centre for Climate Change (CMCC) - www.cmcc.it
> >> viale Aldo Moro, 44, 7th Floor - 40127 BOLOGNA - Italy
> >> Phone: +39 (0)51 3782605 (int. 205) Mobile: +39 339 8176646
> >> Fax: +39 (0)51 3782655  Email: address@hidden
> >> ====================
> >>
> >>
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu




