Intel® oneAPI HPC Toolkit

MPI process hangs on MPI_Finalize with 2019.9.304


Hi All,


I am seeing a hang with Intel MPI 2019.9.304, described below.


System Information

uname -r    ::    4.18.0-240.10.1.el8_3.x86_64

ifort -v    :: or 2021.2.0

mpirun -V    ::    2019 Update 9 Build 20200923 or 2021.2 Build 20210302

lsf    ::

ofed_info    ::  MLNX_OFED_LINUX-5.3-


Code Used (example.f90)

program example

implicit none
include "mpif.h"
integer :: rank, size
real :: sum, n
integer :: i, j, ierr

! Start the accumulator at zero; the reduction below otherwise reads an undefined value.
n = 0

call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)
call MPI_Comm_size(MPI_COMM_WORLD,size,ierr)

do i=rank+1,100000,size
!$omp parallel do private(j) reduction(+:n)
do j= 1,1000000
n = n + real(i) + real(j)
end do
!$omp end parallel do
end do

print *, 'MY Rank:', rank, 'MY Part:',n
call MPI_Reduce(n,sum,1,MPI_REAL8,MPI_SUM,0,MPI_COMM_WORLD,ierr)
if(rank == 0) print *, 'PE:', rank, 'total is :', sum
call MPI_Finalize(ierr)
print *, 'End of Code: MyRank: ', rank

end program example


Compile and Run Commands

$ mpiifort -r8 -qopenmp -o example.exe ./example.f90
$ bsub -J Test -n 2 -R "span[ptile=1] affinity[core(2)]" \
mpirun -n 2 ./example.exe



I ran each of the following 4 cases 10 times.

Case 1 : ifort / intel mpi 2019.9.304

Case 2 : ifort 2021.2.0 / intel mpi 2019.9.304

Case 3 : ifort / intel mpi 2021.2.0

Case 4 : ifort 2021.2.0 / intel mpi 2021.2.0

Cases 3 and 4 run normally. However, Cases 1 and 2 hang in MPI_Finalize (I infer this from the line printed after MPI_Finalize).

Cases 3 and 4 gave the good result below in all 10 runs. Case 1 gave it in 3 of 10 runs and Case 2 in 4 of 10 runs.


Good Result Example

MY Rank: 0 MY Part: 2.750002500000000E+016
MY Rank: 1 MY Part: 2.750007500000000E+016
PE: 0 total is : 5.500010000000000E+016
End of Code: MyRank: 1
End of Code: MyRank: 0

Bad Result Example (with wall-time error)

MY Rank: 1 MY Part: 2.750007500000000E+016
MY Rank: 0 MY Part: 2.750002500000000E+016
PE: 0 total is : 5.500010000000000E+016
End of Code: MyRank: 0
User defined signal 2
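
For anyone repeating the 10-run comparison: the good and bad outputs above differ in how many ranks print the "End of Code" line after MPI_Finalize, so a run can be classified from its stdout log. Below is a minimal shell sketch, assuming each run's stdout was saved to a file. The function name run_status and the log file name in the usage comment are illustrative and not from the original post.

```shell
#!/bin/sh
# Classify one run from its stdout log. A run counts as "ok" only when every
# rank printed the line that example.f90 emits after MPI_Finalize.
# Usage: run_status LOGFILE NRANKS
run_status() {
    log=$1
    nranks=$2
    # grep -c prints the number of matching lines (0 when nothing matches).
    done_ranks=$(grep -c 'End of Code: MyRank:' "$log" 2>/dev/null || true)
    done_ranks=${done_ranks:-0}
    if [ "$done_ranks" -eq "$nranks" ]; then
        echo ok
    else
        echo "hang ($done_ranks of $nranks ranks finished)"
    fi
}

# Example (hypothetical log file name, 2 ranks as in the bsub line above):
#   run_status run_03.out 2
```

This only checks the symptom reported here (a missing line after MPI_Finalize); it does not show where a rank is blocked.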



Is there a solution for this problem?

I also tried some of the suggestions in the thread MPI program hangs in "MPI_Finalize", but the environment variables mentioned there (I_MPI_HYDRA_BRANCH_COUNT, I_MPI_LSF_USE_COLLECTIVE_LAUNCH) did not help.


Thanks in advance.

8 Replies



Thanks for providing the reproducible code with all specifications and expected output.


However, we ran all 4 cases you mentioned multiple times and were unable to reproduce the hang.


Could you please provide the debug information or error log for the "Bad Result Example" so that we can understand the issue better?

Use the command below to collect debug information:

 I_MPI_DEBUG=10 mpirun -n 2 ./example.exe


Thanks & Regards





Thanks for your reply, and sorry for the late reply.


I found something interesting while adjusting I_MPI_DEBUG option, so I've been experimenting over the past few days.

The result: the hang in MPI_Finalize does not occur with I_MPI_DEBUG >= 3.

What changes between I_MPI_DEBUG levels, apart from the debug information that is printed?

I'm attaching some logs below to share the details.


  • I_MPI_DEBUG_2.out : stdout with I_MPI_DEBUG=2
  • I_MPI_DEBUG_2.err : stderr with I_MPI_DEBUG=2, when the problem occurred (about 4 of 10 runs on my system)
  • I_MPI_DEBUG_3.out : stdout with I_MPI_DEBUG=3; stderr was empty (I only concealed the node name of rank 0)
  • I_MPI_DEBUG_10.out : stdout with I_MPI_DEBUG=10; stderr was empty (I concealed the node name of rank 0 and the path in I_MPI_ROOT)
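
A sweep like the one described above can be scripted so that each debug level gets its own stdout and stderr file, using the same file naming as the attachments. This is a minimal sketch, not the script actually used in this thread. The function name sweep_debug_levels is an assumption, and the launch command is passed in as arguments so that the same loop works for mpirun or mpiexec.hydra.

```shell
#!/bin/sh
# Run the same launch command once per I_MPI_DEBUG level and keep stdout and
# stderr in separate files named I_MPI_DEBUG_<level>.out / .err.
# Usage: sweep_debug_levels "LEVELS" COMMAND [ARGS...]
sweep_debug_levels() {
    levels=$1
    shift
    for level in $levels; do
        status=0
        # env sets the variable for this one invocation only.
        env I_MPI_DEBUG="$level" "$@" \
            > "I_MPI_DEBUG_${level}.out" 2> "I_MPI_DEBUG_${level}.err" \
            || status=$?
        echo "I_MPI_DEBUG=$level exit status: $status"
    done
}

# Example (launch line taken from the original post):
#   sweep_debug_levels "2 3 10" mpirun -n 2 ./example.exe
```

Because the hang is intermittent (roughly 4 of 10 runs at level 2 here), a single pass per level is not conclusive; each level would need to be repeated several times.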


Thanks and Regards

HG Choe




Thanks for providing the required information.


Could you please run your code using the below command:

mpiexec.hydra -n 2 -ppn 2 ./example.exe


Also, could you please let us know whether you get the expected outcome?


Thanks & Regards




Hi, Varsha.


I think you intended the -ppn option to place both ranks on a single node, is that right?

In short, it works with the -ppn option (with ptile changed accordingly in LSF).


bsub -J Test -n 2 -R "span[ptile=2] affinity[core(2)]" \
mpiexec.hydra -n 2 -ppn 2 ./example.exe



But I need to run with multiple nodes.

My hybrid application (MPI + OpenMP) targets n >= 10 with OMP_NUM_THREADS = 76.

For example, the run command below hangs:


bsub -J Test -n 10 -R "span[ptile=1] affinity[core(76)]" \
OMP_NUM_THREADS=76 mpiexec.hydra -n 10 ./example.exe



Thanks and Regards

HG Choe


Hi Choe,


Thanks for providing the information.


Could you please try the following points mentioned below and let us know the behavior/outcome:


-->[0] MPI startup(): library kind: release_mt

1. The debug information shows that you are using library kind "release_mt". Could you please let us know whether you see the same issue with library kind "release"?


2. Is this issue specific to this application, or does it occur with other applications as well? Could you please run the IMB-MPI1 benchmark and share the output?

mpiexec.hydra -np 2 -ppn 1 IMB-MPI1 allreduce


3. If you still face the issue, could you please provide a debug trace of the hanging MPI process using the GDB debugger?


4. Also, could you please try running from an interactive shell and let us know whether you get the expected results?


Thanks & Regards 





We haven't heard back from you. Could you please provide an update on your issue?

Thanks & Regards




First of all, I'm sorry for the late reply again.


Fortunately, we found some solutions in the meantime.


1. Update libfabric to the latest release (OpenFabrics 1.13.1; see "Release v1.13.1 · ofiwg/libfabric" on GitHub)

    (cf) Intel® MPI Library 2019 Over Libfabric*

2. Add an MPI_Barrier call before MPI_Finalize

call MPI_Reduce(n,sum,1,MPI_REAL8,MPI_SUM,0,MPI_COMM_WORLD,ierr)
if(rank == 0) print *, 'PE:', rank, 'total is :', sum
call MPI_Barrier(MPI_COMM_WORLD,ierr)
call MPI_Finalize(ierr)


We applied solution 1 (the latest libfabric) to our system, and the hang is gone.

(We considered solution 2, the MPI_Barrier, a naive or clumsy workaround.)

Personally, I suspect a compatibility problem between CentOS (8.3), OFED (5.3), and libfabric (1.10.1).
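
Since the hang appeared with libfabric 1.10.1 and went away with 1.13.1, it may help to check which version a node reports. Below is a minimal shell sketch of such a check. The helper names version_ge and check_libfabric are illustrative, the comparison relies on the -V option of GNU sort, and the 1.13.1 threshold is simply the release that worked on this system, not a documented minimum.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Uses version sort (-V) from GNU sort.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# check_libfabric VERSION: compare against 1.13.1, the release that removed
# the hang in this thread.
check_libfabric() {
    if version_ge "$1" 1.13.1; then
        echo "libfabric $1: at or above 1.13.1"
    else
        echo "libfabric $1: older than 1.13.1"
    fi
}

# Example. The awk pattern assumes that `fi_info --version` prints a line of
# the form "libfabric: X.Y.Z"; adjust it if the output differs on your system.
#   check_libfabric "$(fi_info --version | awk '/^libfabric:/ {print $2}')"
```

Note that the fi_info found on PATH is not necessarily the libfabric that the MPI runtime loads, so the result should be cross-checked against the MPI startup output.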

I think we can close this issue.


Additionally, I would appreciate it if you could share any compatibility information in a reply

(if you have a report or known issue on this).


* Some information about previous reply.

1. We tried 'release', 'release_mt', 'debug' and 'debug_mt', and all of them have the same problem.

    All of them work with the latest libfabric (1.13.1).

2. The "IMB-MPI1 allreduce" case behaved the same way (it works with the latest libfabric).

3&4. I could not try GDB or an interactive shell because of our site policies.

         (I cannot access the compute nodes without LSF.)


Thanks for taking the time to review my issue




>>(if you guys have some report or known issue)

Please refer to the link below for Intel MPI updates, known issues, and system requirements.

Glad to know that your issue is resolved. Thanks for sharing the solution with us. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Thanks & Regards