Hello Carl,

We use Docker for our regression testing as well, for serial and MPI-based builds, and I'm not currently seeing the same issue that you are. This makes me suspect it is something specific to SLURM/Pyxis (neither of which I am terribly familiar with). You can run our mpich-based parallel tests with the following docker command:

    $ docker run --rm -it -e TESTPROC=16 -e USEAC=TRUE unidata/nctests:mpich

You can tweak the value of TESTPROC to tell the docker container how many processors to use.

I'm at a bit of a loss due to my SLURM/Pyxis blind spot. I know there are errors similar to what you reported when using the openmpi package (instead of mpich2), but this is apparently a known issue.

If nothing else, perhaps your trained eye can compare what the unidata/nctests:mpich docker container is doing against your local docker containers. If you run the container interactively (e.g. docker run --rm -it unidata/nctests:mpich bash), you will find the config file used to build the image, Dockerfile.mpich, as well as the shell script used to run the tests, run_par_tests.sh.

Are your docker images on a public repo that I can pull them down from? I would be happy to take a look at them hands-on.

Thanks, have a great day,

-Ward

> I'm starting to build NetCDF & other libraries inside Docker containers,
> and have been running the make check regression-tests to validate the
> installs.
>
> I'm running them under SLURM/Pyxis so the usual mpirun & mpiexec might
> not be working the way I'm used to.
>
> In particular, all the PNetCDF regressions were failing until I added
> these extra settings
>
>     make TESTSEQRUN="mpirun -n 1" TESTMPIRUN="mpiexec -n NP" -i -k check
>
> because (I think) the test harness wasn't recognizing the parallel
> environment and not invoking mpiexec by default.
>
> With NetCDF-C I'm seeing the make check use this command, for example
>
>     exec /usr/local/src/netcdf-c-4.7.4/nc_test/.libs/t_nc
>
> which gives an error:
>
>     [circe-n001:06587] OPAL ERROR: Not initialized in file
>     pmix3x_client.c at line 112
>     --------------------------------------------------------------------------
>     The application appears to have been direct launched using "srun",
>     but OMPI was not built with SLURM's PMI support and therefore cannot
>     execute. There are several options for building PMI support under
>     SLURM, depending upon the SLURM version you are using:
>
>       version 16.05 or later: you can use SLURM's PMIx support. This
>       requires that you configure and build SLURM --with-pmix.
>
>       Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>       PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>       install PMI-2. You must then build Open MPI using --with-pmi pointing
>       to the SLURM PMI library location.
>
>     Please configure as appropriate and try again.
>     --------------------------------------------------------------------------
>     *** An error occurred in MPI_Init
>     *** on a NULL communicator
>     *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>     ***    and potentially your MPI job)
>     [circe-n001:06587] Local abort before MPI_INIT completed completed
>     successfully, but am not able to aggregate error messages, and not
>     able to guarantee that all other processes were killed!
>
> If I (manually) run it with an explicit mpirun instead
>
>     mpirun -N 1 /usr/local/src/netcdf-c-4.7.4/nc_test/.libs/t_nc
>
> then it looks like it's working correctly.
>
> Are there some extra settings I should be using with the NetCDF-C &
> NetCDF-F regression-tests?
>
> Most of the tests are passing already, but I get these failures that I
> wouldn't see outside the container environment:
>
>     tst_netcdf4.sh
>     tst_nccopy4.sh
>     t_nc
>     tst_small
>     nc_test
>     tst_misc
>     tst_norm
>     tst_nofill
>     tst_atts3
>     tst_formatx_pnetcdf
>     tst_default_format_pnetcdf
>     tst_cdf5format
>     run_pnetcdf_test.sh
>     tst_compounds
>     tst_compounds3
>     tst_atts3
>
> I'm not sure that they're all from the same root cause that I'm showing
> above, but I'm hoping it'd be the easiest one to fix for starters.
> Thanks,
>
> Carl Ponder, Ph.D.
> Senior Engineer, NVIDIA Developer Technology

Ticket Details
===================
Ticket ID: TDA-721004
Department: Support netCDF
Priority: Normal
Status: Closed
===================

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.