[netCDF #UOD-815952]: MPI & NetCDF



> Hello;
> I am encountering a problem when building netcdf with mpi.
> I will give some background to the problem;
>
> I have a fortran model that I am parallelizing with MPI.
> The I/O routines use netcdf4 calls.
> They segfault when doing synchronous reads to the dataset.
> Unfortunately --enable-threadsafe and --enable-fortran are not compatible
> in hdf5.  I thought I could get around this by building hdf5 with
> --enable-parallel.  However, netcdf4 then compiles fine but fails
> on make check.

Howdy Andy!

Multi-processor and multi-threaded are two very different things, as I am sure
you know. NetCDF supports the first (parallel I/O from multiple MPI processes)
but not the second.

Do you need multi-threading? We were just talking about the priority of adding
that feature, so any comments you have concerning its use would be welcome.

But for now, the answer is that the netCDF library is not thread-safe. It
should not be used by multiple threads.

For parallel netCDF-4 you *must* build HDF5 with --enable-parallel. If you do
that, and point to that HDF5 build with --with-hdf5= during the netcdf
configure, then netCDF will build for parallel I/O automatically. (Search for
"parallel" in the netcdf configure output to confirm this).

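The two-step build described above could look roughly like the sketch below.
It is only an illustration: PREFIX, MPICC, MPIF90, and the run_in helper are
assumptions, the source directory names are taken from the paths in your logs,
and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: build HDF5 for parallel I/O, then netCDF-4 against that HDF5.
# PREFIX, MPICC, MPIF90 and the directory names are assumptions; adjust them.
PREFIX="${PREFIX:-$HOME/local}"
MPICC="${MPICC:-mpicc}"
MPIF90="${MPIF90:-mpif90}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = really run them

# Hypothetical helper: run a command inside a directory,
# or just print it when in dry-run mode.
run_in() {
    dir=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Step 1: HDF5 built with the MPI compiler wrapper and --enable-parallel.
run_in hdf5-1.8.2 env CC="$MPICC" ./configure --prefix="$PREFIX" \
    --enable-parallel || exit 1
run_in hdf5-1.8.2 make install || exit 1

# Step 2: netCDF-4 pointed at that HDF5 installation with --with-hdf5=.
run_in netcdf-4.0 env CC="$MPICC" FC="$MPIF90" ./configure --prefix="$PREFIX" \
    --enable-netcdf-4 --with-hdf5="$PREFIX" || exit 1
run_in netcdf-4.0 make check || exit 1
```

When running the netcdf configure for real, piping its output through
"grep -i parallel" is one way to do the confirmation mentioned above.
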
>
> I am using netcdf4.0 with hdf5-1.8.2.
> I am building on linux/amd64 and freebsd/amd64 (the freebsd machine is a
> pentium-d architecture, running 64bit), and both exhibit the same error.
> The compiler is: linux: gcc4.3.0, freebsd: gcc4.3.4
> I am using openmpi1.3 but I have tried 1.2.6 and 1.2.8
>
> I have tried numerous configure options for netcdf, but they are all minor
> variations on this:
> CC=/storage/ajpintar/local/bin/mpicc FC=/storage/ajpintar/local/bin/mpif90
> F77=/storage/ajpintar/local/bin/mpif77 ./configure
> --prefix=/storage/ajpintar/local/ --enable-fortran --enable-netcdf-4
> --with-hdf5=/storage/ajpintar/local/ --enable-separate-fortran
>
> Note that the PATH and LD_CONFIG variables point to the mpi and hdf5
> locations (/storage/ajpintar/local/bin, /storage/ajpintar/local/lib) as
> well as the standard gcc /usr/lib and /usr/lib64 (in the case of the
> linux attempt).
>
> When running make check for hdf5 I get this failure:
> ...
> Testing hard normalized double -> unsigned long conversions
> PASSED
> Testing hard normalized long double -> signed char conversions
> 0.18user 0.04system 0:00.26elapsed 87%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+248outputs (0major+9316minor)pagefaults 0swaps
> make[4]: *** [dt_arith.chkexe_] Error 1
> make[4]: Leaving directory `/aos/home/ajpintar/tmp/hdf5-1.8.2/test'
> make[3]: *** [build-check-s] Error 2
> make[3]: Leaving directory `/aos/home/ajpintar/tmp/hdf5-1.8.2/test'
> make[2]: *** [test] Error 2
> make[2]: Leaving directory `/aos/home/ajpintar/tmp/hdf5-1.8.2/test'
> make[1]: *** [check-am] Error 2
> make[1]: Leaving directory `/aos/home/ajpintar/tmp/hdf5-1.8.2/test'
> make: *** [check-recursive] Error 1
>
>
>
> However, I am led to believe that this is not a 'bad' problem since long
> double support is somewhat experimental at this point.

This is something the HDF5 people should hear about and address. I suggest you
send this to address@hidden.

>
> When running make check on the netcdf4 build I get the following error:
> [a bunch of util.o errors, ending with:]
> util.o: In function `internal_max':
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1361: undefined reference
> to `max_double_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1345: undefined reference
> to `max_int_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1323: undefined reference
> to `max_schar_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1335: undefined reference
> to `max_short_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1353: undefined reference
> to `max_float_'
> util.o: In function `internal_min':
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1277: undefined reference
> to `min_int_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1255: undefined reference
> to `min_schar_'
> /aos/home/ajpintar/tmp/netcdf-4.0/nf_test/util.F:1267: undefined reference
> to `min_short_'
> collect2: ld returned 1 exit status
> make[2]: *** [nf_test] Error 1
> make[2]: Leaving directory `/aos/home/ajpintar/tmp/netcdf-4.0/nf_test'
> make[1]: *** [check-am] Error 2
> make[1]: Leaving directory `/aos/home/ajpintar/tmp/netcdf-4.0/nf_test'
> make: *** [check-recursive] Error 1
>
>
>
> After searching for this I was led to believe this is a problem with
> netcdf being unable to locate the fortran libraries.  The support page on
> the unidata site mentions to be sure about LD_CONFIG and PATH environment
> variables, but they are correct in that they point to mpi, hdf5, and
> gfortran/gcc.
>

I will have to ask you to send me the config.log file that was generated when
netcdf configure was run.
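
If you want a first look at that file yourself before sending it, something
like the sketch below pulls out the lines that mention the compilers, HDF5, or
parallel I/O. The function name and the pattern list are assumptions about
what is worth checking, not a complete diagnosis.

```shell
#!/bin/sh
# Sketch: list the lines of a config.log that mention the compiler
# variables, HDF5, or parallel I/O (case-insensitive, with line numbers).
# The pattern list is an assumption about what is worth a first look.
summarize_config_log() {
    grep -n -i -E "^(CC|FC|F77)=|hdf5|parallel" "${1:-config.log}"
}
```

Usage, from the directory where the netcdf configure was run:
summarize_config_log config.log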

Thanks,

Ed


Ticket Details
===================
Ticket ID: UOD-815952
Department: Support netCDF
Priority: Normal
Status: Open