[netcdfgroup] 4.7.4 failing on multiple nodes

Hi,

I thought I'd give netcdf 4.7.4 a try for the compression in parallel IO (using 
hdf5 1.10.7, pnetcdf 1.9.0, netcdf-fortran-4.5.3) on a NOAA cluster. I've been 
using intel 19 with mvapich2.3, which worked fine with earlier versions 
(4.3.something). The problem is that it works fine on a single node, but I 
get various failures when trying to run a job that uses 2 or more nodes. It 
also fails if the IO is not parallel (standard netcdf-4 where each process 
writes its data in turn).

I have also compiled everything (including cloud model code) using Intel MPI, 
which fails promptly with a seg fault when it tries to run on 2 nodes. (Here, I 
am comparing 4 or 9 threads on a single node with 16 threads split across 2 
nodes. If I force the 16-thread version to run on a single node, it runs fine.)

The problem seems to be reproducible with a simple write/read test adapted from 
ftst_parallel.F, so it does not seem to be specific to my model code. It fails 
with both pnetcdf and mpiio.

Any ideas what could be the issue here? I am stumped.

-- Ted


