[netcdf-hdf] bug in MPI cleanup

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

Hi

I've got a small netCDF-4 program that, when run, fails with the error
"Attempting to use an MPI routine after finalizing MPICH".  This
initially had me puzzled, because the only thing I do after calling
MPI_Finalize() is 'return 0'.

It turns out that HDF5 hooks a routine into atexit(3) that cleans up MPI
structures.  Parts of this cleanup routine must not run after
MPI_Finalize.  Here's the backtrace at the moment the error about
using MPI routines after finalizing MPICH is printed:

#1  0x084bdcea in PMPI_Comm_free (comm=0x8711c20) at /home/robl/work/mpich2/src/mpi/comm/comm_free.c:73
#2  0x081b687f in H5FD_mpi_comm_info_free (comm=0x8711c20, info=0x8711c24) at ../../src/H5FDmpi.c:326
#3  0x081ba2f0 in H5FD_mpio_fapl_free (_fa=0x8711c20) at ../../src/H5FDmpio.c:870
#4  0x0819936b in H5FD_pl_close (driver_id=134217729, free_func=0x81ba113 <H5FD_mpio_fapl_free>, pl=0x8711c20) at ../../src/H5FD.c:625
#5  0x08199f4c in H5FD_fapl_close (driver_id=134217729, fapl=0x8711c20) at ../../src/H5FD.c:791
#6  0x0829154c in H5P_facc_close (fapl_id=167772177, close_data=0x0) at ../../src/H5Pfapl.c:431
#7  0x08279aca in H5P_close (_plist=0x87119c0) at ../../src/H5P.c:5370
#8  0x08202738 in H5I_clear_type (type=H5I_GENPROP_LST, force=0) at ../../src/H5I.c:604
#9  0x0826949d in H5P_term_interface () at ../../src/H5P.c:488
#10 0x080c416b in H5_term_library () at ../../src/H5.c:266
#11 0xb7d979d9 in exit () from /lib/tls/i686/cmov/libc.so.6
#12 0xb7d80ec4 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#13 0x08085171 in _start ()


Since my test program has already called MPI_Finalize by the time exit()
runs, it's incorrect for the HDF5 library to then call MPI_Comm_free.

I would advise against hooking MPI-IO cleanup into atexit(3): you already
correctly make the user call MPI_Init and MPI_Finalize.  Can the HDF5
cleanup instead occur as part of nc_close?

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
