sparonuz
Beginner
227 Views

Segfault in Fortran MPI_COMM_CREATE_GROUP, works with Open MPI and MPICH

I'm getting a segmentation fault that I can't explain in a simple program that just:

  • calls MPI_INIT
  • duplicates the global communicator, via MPI_COMM_DUP
  • creates a group containing half of the processes of the global communicator, via MPI_COMM_GROUP
  • finally from this group creates a new communicator via MPI_COMM_CREATE_GROUP

I use this last call rather than MPI_COMM_CREATE because it is collective only over the processes contained in the group, whereas MPI_COMM_CREATE is collective over every process in COMM. The code is attached.
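For readers without access to the attachment, the steps above can be sketched roughly as follows. This is a minimal sketch and not the attached file: the variable names, the use of MPI_GROUP_INCL to pick the subgroup, and the choice of ranks 0 .. half-1 as "half of the processes" are all assumptions made for illustration.

```fortran
program comm_create_group_sketch
  use mpi
  implicit none
  integer :: ierr, my_rank, nprocs, half, i
  integer :: dup_comm, dup_group, half_group, half_comm
  integer, allocatable :: ranks(:)
  integer, parameter :: tag = 0   ! hypothetical tag value

  call MPI_INIT(ierr)

  ! Duplicate the global communicator (same group, new context).
  call MPI_COMM_DUP(MPI_COMM_WORLD, dup_comm, ierr)
  call MPI_COMM_RANK(dup_comm, my_rank, ierr)
  call MPI_COMM_SIZE(dup_comm, nprocs, ierr)

  ! Assumption: "half of the processes" means ranks 0 .. half-1.
  half = max(nprocs / 2, 1)
  allocate(ranks(half))
  ranks = [(i, i = 0, half - 1)]

  ! Take the group of the duplicate, then select the subgroup from it.
  call MPI_COMM_GROUP(dup_comm, dup_group, ierr)
  call MPI_GROUP_INCL(dup_group, half, ranks, half_group, ierr)

  ! Collective only over the members of half_group, so only they call it.
  half_comm = MPI_COMM_NULL
  if (my_rank < half) then
    call MPI_COMM_CREATE_GROUP(dup_comm, half_group, tag, half_comm, ierr)
  end if

  ! Clean up.
  if (half_comm /= MPI_COMM_NULL) call MPI_COMM_FREE(half_comm, ierr)
  call MPI_GROUP_FREE(half_group, ierr)
  call MPI_GROUP_FREE(dup_group, ierr)
  call MPI_COMM_FREE(dup_comm, ierr)
  call MPI_FINALIZE(ierr)
end program comm_create_group_sketch
```

Replacing dup_comm with MPI_COMM_WORLD in the MPI_COMM_GROUP and MPI_COMM_CREATE_GROUP calls gives the variant without the duplicate.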

If, instead of duplicating MPI_COMM_WORLD, I create the group directly from the global communicator (commented line), everything works just fine.

The parallel debugger I'm using traces the segfault back to a call to MPI_GROUP_TRANSLATE_RANKS but, as far as I know, MPI_COMM_DUP duplicates all the attributes of the copied communicator, rank numbering included.
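One way to check that expectation independently of MPI_COMM_CREATE_GROUP is to compare the duplicate against MPI_COMM_WORLD and translate the ranks by hand. The sketch below is only an illustration with hypothetical variable names. It relies on the MPI standard's definition that a duplicated communicator has the same group and rank order but a different context, so MPI_COMM_COMPARE should report MPI_CONGRUENT.

```fortran
program dup_rank_check
  use mpi
  implicit none
  integer :: ierr, n, i, cmp
  integer :: dup_comm, world_group, dup_group
  integer, allocatable :: ranks_in(:), ranks_out(:)

  call MPI_INIT(ierr)
  call MPI_COMM_DUP(MPI_COMM_WORLD, dup_comm, ierr)

  ! Same group and rank order, different context: expect MPI_CONGRUENT.
  call MPI_COMM_COMPARE(MPI_COMM_WORLD, dup_comm, cmp, ierr)
  if (cmp /= MPI_CONGRUENT) print *, 'unexpected comparison result: ', cmp

  call MPI_COMM_SIZE(dup_comm, n, ierr)
  allocate(ranks_in(n), ranks_out(n))
  ranks_in = [(i, i = 0, n - 1)]

  ! Translate every rank of the duplicate's group into the world group.
  call MPI_COMM_GROUP(MPI_COMM_WORLD, world_group, ierr)
  call MPI_COMM_GROUP(dup_comm, dup_group, ierr)
  call MPI_GROUP_TRANSLATE_RANKS(dup_group, n, ranks_in, &
                                 world_group, ranks_out, ierr)
  if (any(ranks_out /= ranks_in)) print *, 'rank numbering differs'

  call MPI_GROUP_FREE(dup_group, ierr)
  call MPI_GROUP_FREE(world_group, ierr)
  call MPI_COMM_FREE(dup_comm, ierr)
  call MPI_FINALIZE(ierr)
end program dup_rank_check
```

If this runs silently, the duplicate itself is consistent and the failure is more likely inside the library's handling of MPI_COMM_CREATE_GROUP on a duplicated communicator.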

I am using ifort 18.0.5, and I also tried 17.0.4 and 19.0.2, with no better results.
By contrast, with Open MPI and MPICH 3.3 the program works just fine.

2 Replies

I could reproduce the segmentation fault with Intel MPI 2017 and 2018 but not with any of the 2019 versions (initial, updates 1-3). Please upgrade to 2019 update 3 and try again.

If you have also installed the Intel Trace Analyzer and Collector (for example as part of the complete Intel Parallel Studio XE 2019 Update 3, https://software.intel.com/en-us/parallel-studio-xe) you can activate the MPI Correctness Checking. It will also show you a traceback in case of issues (best if compiled with "-g"):

mpirun -check_mpi -n 3 ./a.out

sparonuz

Thank you, Klaus.
You are right, with the latest version it works.
Many thanks for your help.

Stella
