I'm getting a segmentation fault that I cannot explain in a simple program that just:
- calls MPI_INIT
- duplicates the global communicator, via MPI_COMM_DUP
- creates a group with half of the processes of the global communicator, via MPI_COMM_GROUP
- finally creates a new communicator from this group, via MPI_COMM_CREATE_GROUP
I use this last call rather than MPI_COMM_CREATE because it is collective only over the processes contained in the group, whereas MPI_COMM_CREATE is collective over every process in COMM. The code is attached.
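Since the attachment is not reproduced here, the steps above could be sketched roughly as follows. This is a minimal reconstruction and not the attached file: the variable names, the use of MPI_GROUP_INCL to select the first half of the ranks, and the tag value 0 are all assumptions made for illustration.

```fortran
program comm_create_group_sketch
  use mpi
  implicit none
  integer :: ierr, my_rank, nprocs, half, i
  integer :: dup_comm, full_group, half_group, half_comm
  integer, allocatable :: ranks(:)

  call MPI_INIT(ierr)

  ! Duplicate the global communicator
  call MPI_COMM_DUP(MPI_COMM_WORLD, dup_comm, ierr)
  call MPI_COMM_RANK(dup_comm, my_rank, ierr)
  call MPI_COMM_SIZE(dup_comm, nprocs, ierr)

  ! Get the group of the duplicated communicator
  call MPI_COMM_GROUP(dup_comm, full_group, ierr)
  ! Variant reported to work: take the group from the global communicator
  ! call MPI_COMM_GROUP(MPI_COMM_WORLD, full_group, ierr)

  ! Assumption: the "half" group is the first nprocs/2 ranks (at least one)
  half = max(nprocs / 2, 1)
  allocate(ranks(half))
  ranks = [(i, i = 0, half - 1)]
  call MPI_GROUP_INCL(full_group, half, ranks, half_group, ierr)

  ! Collective only over the members of half_group, so only they call it
  if (my_rank < half) then
     call MPI_COMM_CREATE_GROUP(dup_comm, half_group, 0, half_comm, ierr)
     call MPI_COMM_FREE(half_comm, ierr)
  end if

  call MPI_GROUP_FREE(half_group, ierr)
  call MPI_GROUP_FREE(full_group, ierr)
  call MPI_COMM_FREE(dup_comm, ierr)
  deallocate(ranks)
  call MPI_FINALIZE(ierr)
end program comm_create_group_sketch
```

Only the ranks that belong to half_group enter MPI_COMM_CREATE_GROUP, which is the reason for preferring it over MPI_COMM_CREATE in this case.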
If I create the group directly from the global communicator instead of from the duplicate of MPI_COMM_WORLD (the commented line), everything works fine.
The parallel debugger I'm using traces the segmentation fault back to a call to MPI_GROUP_TRANSLATE_RANKS, but, as far as I know, MPI_COMM_DUP duplicates all the attributes of the copied communicator, rank numbering included.
I am using ifort 18.0.5, but I also tried 17.0.4 and 19.0.2, with no better results.
By contrast, with Open MPI and MPICH 3.3 the program works just fine.
I could reproduce the segmentation fault with Intel MPI 2017 and 2018 but not with any of the 2019 versions (initial, updates 1-3). Please upgrade to 2019 update 3 and try again.
If you have also installed the Intel Trace Analyzer and Collector (for example as part of the complete Intel Parallel Studio XE 2019 Update 3, https://software.intel.com/en-us/parallel-studio-xe) you can activate the MPI Correctness Checking. It will also show you a traceback in case of issues (best if compiled with "-g"):
mpirun -check_mpi -n 3 ./a.out