Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI program hangs in "MPI_Finalize"

youn__kihang
Novice

Hi All,

I will explain the current situation and the attached file.

The MPI application launched with LSF is being debugged because it sometimes fails to terminate. At the code level we suspect mpi_finalize; the hang occurs randomly rather than on every run, so we still need to determine the conditions under which it occurs. I asked about a similar symptom on this MPI forum before, but the outcome is unknown because that post was moved into a ticket partway through.
Please check whether this is a similar symptom.
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/611378

- Strace results from the hosts running the MPI job (I suspect the blocked call at " │ + 01:17:43 read(7 ")

----------------------------------------------------------------------------------------
duru0403 has 24 procs as below:

* Name/State       : pmi_proxy / State:    S (sleeping)
  PID/PPID         : 141955 / 141954
  Commandline      : **************/apps/intel/18.4/impi/2018.4.274/intel64/bin/pmi_proxy --control-port duru0374:37775 --pmi-connect alltoall --pmi-aggregate -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1000390395 --usize -2 --proxy-id -1
  CPU/MEMs_allowed : 0-95 / 0-3
  [<ffffffff96e56e55>] poll_schedule_timeout+0x55/0xb0
  [<ffffffff96e585dd>] do_sys_poll+0x48d/0x590
  [<ffffffff96e587e4>] SyS_poll+0x74/0x110
  [<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
  [<ffffffffffffffff>] 0xffffffffffffffff
  Files            :
     Num of pipes: 26
     Num of sockets: 16
     Num of anon_inodes: 0
  Strace           :
     + /xshared/support/systrace/strace: Process 141955 attached
     + 01:17:43 restart_syscall(<... resuming interrupted poll ...>/xshared/support/systrace/strace: Process 141955 detached
     +  <detached ...>
  Num of subprocs  : 23
  │
  ├─Name/State       : ensda / State:    S (sleeping)
  │ PID/PPID         : 141959 / 141955
  │ Commandline      : **************
  │ CPU/MEMs_allowed : 0 / 0-3
  │ [<ffffffff972f5139>] unix_stream_read_generic+0x309/0x8e0
  │ [<ffffffff972f5804>] unix_stream_recvmsg+0x54/0x70
  │ [<ffffffff972186ec>] sock_aio_read.part.9+0x14c/0x170
  │ [<ffffffff97218731>] sock_aio_read+0x21/0x30
  │ [<ffffffff96e404d3>] do_sync_read+0x93/0xe0
  │ [<ffffffff96e40fb5>] vfs_read+0x145/0x170
  │ [<ffffffff96e41dcf>] SyS_read+0x7f/0xf0
  │ [<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
  │ [<ffffffffffffffff>] 0xffffffffffffffff
  │ Files            :
  │    -  > /dev/infiniband/uverbs0
  │    -  > **************/log_proc00324.log
  │    -   /dev/infiniband/uverbs0
  │    Num of pipes: 6
  │    Num of sockets: 5
  │    Num of anon_inodes: 6
  │ Strace           :
  │    + /xshared/support/systrace/strace: Process 141959 attached
  │    + 01:17:43 read(7, /xshared/support/systrace/strace: Process 141959 detached
  │    +  <detached ...>
  │ Num of subprocs  : 0
----------------------------------------------------------------------------------------


- Version Information
   Intel Compiler: 18.5.234
   Intel MPI: 18.4.234
   DAPL: ofa-v2-mlx5_0-1u

- MPI options I used

declare -x I_MPI_DAPL_UD="1"
declare -x I_MPI_FABRICS="dapl"
declare -x I_MPI_HYDRA_BOOTSTRAP="lsf"
declare -x I_MPI_PIN="1"
declare -x I_MPI_PIN_PROCESSOR_LIST="0-5,24-29"
declare -x I_MPI_ROOT="**************/apps/intel/18.4/compilers_and_libraries/linux/mpi"


- And the code I used

After MPI_FINALIZE there are five lines of code: if, close, and deallocate statements.
Could these cause the hang?

! last part of main_program

call fin_common_par

(there is nothing)

endprogram


!!!!!!!!!!!!!!!!!

subroutine fin_common_par
implicit none
integer :: ierr

call mpi_finalize(ierr)
call fin_log

if(allocated(ranks_per_node)) deallocate(ranks_per_node)
if(allocated(stride_ranks))         deallocate(stride_ranks)

return
end subroutine fin_common_par

!!!!!!!!!!!!!!!!!

subroutine fin_log
implicit none

if(logf_unit == closed_unit) return
close(logf_unit)
logf_unit = closed_unit

return
endsubroutine fin_log

!!!!!!!!!!!!!!!!!


Additionally, how can I get the call stack of a process like in this post?
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/611378


Thank you in advance.

13 Replies
youn__kihang
Novice

As a result of further debugging, the mpi_finalize problem became clear.
 

print*,'before_fin'
call mpi_finalize(ierr)
print*,'after_fin'

endprogram

When running the code above, all processes print only "before_fin" and then hang.
Is there any solution or known case at the code or library level?
Examples:
- Finalize automatically if the delay exceeds a certain time.
- Use a newer library version, or install compatible firmware.

AbhishekD_Intel
Moderator

Hi,

Will you please ensure that all pending communications are completed or cancelled before calling MPI_Finalize()? MPI_Finalize() cleans up all state related to MPI, and if there is still pending state, execution usually hangs in MPI_Finalize. It will still produce the output of print statements placed before the call, and if you terminate the execution you may still get the correct results of your computation.

We tried a simple Hello World program and were unable to reproduce the error at our end. The code is as follows:

program hello
   include 'mpif.h'
   integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   print*, 'node', rank, ': Hello world'
   call MPI_FINALIZE(ierror)
   print*, 'After Finalize'
   end

I have also attached another code that I tried; it also runs without hanging at MPI_Finalize().

So please ensure that all pending communications in your earlier subroutines or earlier code complete before the process calls MPI_Finalize().
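For illustration, here is a minimal sketch (not your application code; the ring exchange and the variable names are only assumptions) of completing an outstanding nonblocking send with MPI_Wait before MPI_Finalize:

program drain_before_finalize
   include 'mpif.h'
   integer rank, nprocs, ierror, request
   integer status(MPI_STATUS_SIZE)
   integer sbuf, rbuf
   call MPI_INIT(ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
   ! hypothetical nonblocking send to the next rank in a ring
   sbuf = rank
   call MPI_ISEND(sbuf, 1, MPI_INTEGER, mod(rank+1, nprocs), 0, MPI_COMM_WORLD, request, ierror)
   call MPI_RECV(rbuf, 1, MPI_INTEGER, mod(rank-1+nprocs, nprocs), 0, MPI_COMM_WORLD, status, ierror)
   ! completing the pending request here is the important part; without it,
   ! MPI_FINALIZE may be reached while communication is still outstanding
   call MPI_WAIT(request, status, ierror)
   call MPI_FINALIZE(ierror)
   print*, 'After Finalize on rank', rank
   end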

Can you please send reproducer code that we can try on our end?

 

Warm Regards,

-Abhishek

youn__kihang
Novice

Dear, Abhishek

First of all, Thank you for your help.
When I tested code similar to what you sent, there was no problem with MPI_FINALIZE.

As you said in the first paragraph, assuming there could be a rank with unfinished communication that I am not aware of:
can I wait for that rank with MPI_BARRIER? Does adding MPI_BARRIER directly above MPI_FINALIZE solve the problem?

As mentioned earlier, testing is very limited because the problem reproduces only rarely and at random (about once every 50 runs, roughly 20 minutes per run).

Thank you.

AbhishekD_Intel
Moderator

Hi,

Sorry for the late reply. By uncommunicated I mean that some function may send data with no matching receive to accept it; the MPI process will then keep waiting, and that is also a scenario in which MPI_Finalize() hangs.

I cannot say that MPI_Barrier will definitely work, but if some routines are simply taking time to execute, then in such cases MPI_Barrier may help.

So please ensure that all pending communications are completed or cancelled before calling MPI_Finalize(). Otherwise, send us a small reproducer and the steps you followed to reproduce the error, and we will try to figure it out.

Warm Regards,

Abhishek

youn__kihang
Novice

 

Dear Abhishek,


I will update you on the current situation.

I have tested other MPI versions (19u5, 19u6, 19u7) that are not EOS (end of service).
All of them pass MPI_Finalize fine, but the program still does not terminate after MPI_Finalize and hangs.
Therefore I suspect this is not an MPI problem but an LSF scheduler issue.

The options (I_MPI_HYDRA_BRANCH_COUNT, I_MPI_LSF_USE_COLLECTIVE_LAUNCH) that IBM recommends using with Intel MPI are being tested now.

(https://www.ibm.com/support/pages/resolve-problem-intel-mpi-job-requires-more-one-hundred-cores-hang-cluster)
*If linking to another site is a problem, I'll delete it.
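For illustration, these options can be exported in the same way as the options listed above (the values shown here are only an example of what could be set; the linked IBM page documents the recommended settings):

declare -x I_MPI_HYDRA_BRANCH_COUNT="-1"
declare -x I_MPI_LSF_USE_COLLECTIVE_LAUNCH="1"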

I hope these options help resolve this situation.

I will let you know as soon as the results come out.

Thank you,
Kihang

AbhishekD_Intel
Moderator

Dear Kihang,

Thanks for the details. Please update us with all the information so that, if it is a bug, we can report it to the concerned team.

-Abhishek

AbhishekD_Intel
Moderator

Hi Kihang,

There are a couple of findings that I wanted to share with you:

1. All processes must call the MPI_Finalize routine before exiting. The number of processes running after this routine is called is undefined, so it is best not to perform much more than a return after calling MPI_Finalize.

2. Move the deallocate statements before MPI_Finalize and try (see the sketch below).
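For example, based on the fin_common_par subroutine you posted, the cleanup could be reordered so that MPI_Finalize is the last call (a sketch only, reusing the names from your snippet):

subroutine fin_common_par
implicit none
integer :: ierr

! close the log file and release local allocations first ...
call fin_log
if(allocated(ranks_per_node)) deallocate(ranks_per_node)
if(allocated(stride_ranks))   deallocate(stride_ranks)

! ... and make MPI_Finalize the very last call before returning
call mpi_finalize(ierr)

return
end subroutine fin_common_par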

You can also share the command you used to launch the executable so that we can debug this issue.

 

 

Warm Regards,

Abhishek

youn__kihang
Novice

Hi Abhishek,

I have an update for you.
Here are the code changes and options that I tested:
1) As you recommended, I moved all other statements before MPI_Finalize. The only things after MPI_Finalize are return and endprogram.
2) As mentioned, I used the two extra options (I_MPI_HYDRA_BRANCH_COUNT, I_MPI_LSF_USE_COLLECTIVE_LAUNCH).
3) I tested four Intel MPI libraries (18u4 (EoS), 19u5, 19u6, 19u7).

Results:
1) The 2018 version does not work well (it stops during MPI_Finalize).
2) The three 2019 versions work well (they do not stop during MPI_Finalize).
3) The two extra options also work well.
  - They solved the problem of the LSF scheduler not recognizing that the MPI program had ended.


We think there is some compatibility problem between our system and 18u4, so we are going to use version 19.
Thank you for your support.

Best Regards,
Kihang

AbhishekD_Intel
Moderator

Hi Kihang,

That was a great finding. We generally suggest using the latest MPI versions, but we appreciate your investigation and will also keep a note of the extra options you used (I_MPI_HYDRA_BRANCH_COUNT, I_MPI_LSF_USE_COLLECTIVE_LAUNCH) with the LSF scheduler.

We are happy that your issue has been solved.

Please confirm whether we can close this thread. You can always post a new thread if you have any problems. Thank you!

 

Warm Regards,

Abhishek

jimdempseyatthecove
Honored Contributor III

Kihang,

I've been experimenting here and found that, under some circumstances in a Debug build, MPI_Finalize does not appear to behave as if it included an implicit MPI_Barrier. Inserting an explicit MPI_Barrier before MPI_Finalize corrected the quirky issue.
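A minimal sketch of that workaround, assuming ierr is the usual integer error argument:

! explicit synchronization before shutdown; in my tests this avoided the
! intermittent hang seen in the Debug build
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FINALIZE(ierr)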

Jim Dempsey

AbhishekD_Intel
Moderator

Hi Kihang,

Please confirm whether your issue has been resolved.

 

youn__kihang
Novice

Hi Abhishek,

When the following two conditions are met:
1) apply the two options above (I_MPI_HYDRA_BRANCH_COUNT, I_MPI_LSF_USE_COLLECTIVE_LAUNCH), and
2) use Intel MPI v19 or above,
the LSF scheduler properly recognizes that the MPI job has ended.

Thank you for your help.

Best Regards,
Kihang

AbhishekD_Intel
Moderator

Hi Kihang,

Thanks for sharing all the information in this thread. Thank you for the confirmation; we are closing this thread. Please post a new thread if you face any issues.

 

Warm Regards,

Abhishek
