
simple MPI code generating deadlock

conor_p_
Beginner

Hello everyone,

I hope this is the appropriate forum for this question. I have recently started learning MPI and can't seem to figure out why the following code generates a deadlock, which occurs in subroutine try_comm. I compiled and ran as follows:

mpiifort global.f90 try.f90 new.f90 -o new.out

mpirun -n 2 ./new.out

my output:

hello from rank 0 2
entering 0 0
waiting for receive from rank 1  1
entering 1 0
after send
finished 1 0

The code (global.f90, try.f90, new.f90):

module global

  implicit none
  integer :: size,rank
  integer,allocatable :: p2n(:),n2p(:)
end module global
module try
  use mpi
  use global
  implicit none
contains
  
  subroutine try_comm(loop,n0)
    implicit none
    integer :: n0,loop
    integer :: ierr,msgtag
    integer :: n0temp
    integer :: status(MPI_STATUS_SIZE)
    

    call mpi_barrier(MPI_COMM_WORLD,ierr)

    print*,'entering',rank,loop
    if(rank.ne.0)then
       call mpi_send(n0,1,mpi_int,0,msgtag,MPI_COMM_WORLD,ierr)
       print*,'after send'
    endif
    
    if(rank.eq.0)then
       do loop = 1,size-1
          print*,'waiting for receive from rank',loop,size-1
          call mpi_recv(n0temp,1,mpi_int,loop,msgtag,MPI_COMM_WORLD,status,ierr)
          n0 = n0temp
          print*,'received 0:',n0
       enddo
       
    endif

    print*,'finished',rank,loop

    call mpi_barrier(MPI_COMM_WORLD,ierr)
    
  end subroutine try_comm
  
end module try
program new
  use mpi
  use global
  use try
  implicit none
  integer :: n0
  integer :: ierr,msgtag
  integer :: loop
  integer :: n0temp
  integer :: status(MPI_STATUS_SIZE)
  
  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,size,ierr)
  call mpi_comm_rank(MPI_COMM_WORLD,rank,ierr)
  
  
  if(rank.eq.0)then
     print*,'hello from rank',rank,size
     allocate(p2n(0:size-1),n2p(0:size-1))
  endif
  
  n0 = rank*10

 
  do loop = 1,20
     call try_comm(loop,n0)
  enddo

  call mpi_finalize(ierr)

  stop
end program new

 

However, if I change new.f90 to the following, where the body of try_comm is now inlined in the do loop, I do not get a deadlock.

program new
  use mpi
  use global
  use try
  implicit none
  integer :: n0
  integer :: ierr,msgtag
  integer :: loop,loopa
  integer :: n0temp
  integer :: status(MPI_STATUS_SIZE)
  
  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,size,ierr)
  call mpi_comm_rank(MPI_COMM_WORLD,rank,ierr)
  
  
  if(rank.eq.0)then
     print*,'hello from rank',rank,size
     allocate(p2n(0:size-1),n2p(0:size-1))
  endif
  
  n0 = rank*10


  do loopa = 1,20
     !call try_comm(loop,n0)

     call mpi_barrier(MPI_COMM_WORLD,ierr)
     
     print*,'entering',rank,loopa
     if(rank.ne.0)then
        call mpi_send(n0,1,mpi_int,0,msgtag,MPI_COMM_WORLD,ierr)
        print*,'after send'
     endif
     
     if(rank.eq.0)then
        do loop = 1,size-1
           print*,'waiting for receive from rank',loop,size-1
           call mpi_recv(n0temp,1,mpi_int,loop,msgtag,MPI_COMM_WORLD,status,ierr)           
           n0 = n0temp
           print*,'received 0:',n0
        enddo
        
     endif
     
     print*,'finished',rank,loopa
     call mpi_barrier(MPI_COMM_WORLD,ierr)

     
  enddo


  call mpi_finalize(ierr)

  stop
end program new

 

Michael_R_2
Beginner

I see that you use an uninitialized value for msgtag. It is an input to both MPI_SEND and MPI_RECV.

Thus the actual value of msgtag is random, and might be different for the two code variants. What happens if it is negative or out of the valid tag range?

So make sure that you do not use uninitialized values.
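
For example, a minimal sketch of a fixed try_comm (the tag value 1 is my arbitrary choice; any fixed value from 0 up to MPI_TAG_UB works, as long as sender and receiver agree, and MPI_INTEGER is the standard Fortran datatype handle for a default integer):

  subroutine try_comm(loop,n0)
    integer :: n0,loop
    integer :: ierr,i,n0temp
    integer :: msgtag
    integer :: status(MPI_STATUS_SIZE)

    msgtag = 1   ! initialized before use; same fixed tag on sender and receiver

    call mpi_barrier(MPI_COMM_WORLD,ierr)
    if(rank.ne.0)then
       call mpi_send(n0,1,MPI_INTEGER,0,msgtag,MPI_COMM_WORLD,ierr)
    else
       do i = 1,size-1   ! local counter: redefining the dummy argument "loop"
                         ! would also redefine the caller's do-variable, which
                         ! the Fortran standard forbids
          call mpi_recv(n0temp,1,MPI_INTEGER,i,msgtag,MPI_COMM_WORLD,status,ierr)
          n0 = n0temp
       enddo
    endif
    call mpi_barrier(MPI_COMM_WORLD,ierr)
  end subroutine try_comm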

By the way, I would not use a variable named size, because size is also an intrinsic function in Fortran.
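
A small sketch of what I mean, with nprocs (my name choice) instead of size:

program sketch
  use mpi
  implicit none
  integer :: ierr,rank,nprocs   ! "nprocs" avoids shadowing the intrinsic size()
  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,nprocs,ierr)
  call mpi_comm_rank(MPI_COMM_WORLD,rank,ierr)
  print*,'rank',rank,'of',nprocs
  call mpi_finalize(ierr)
end program sketch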

Greetings Michael R.

diedro
Beginner

Dear Conor,

I suggest using only two processes, one to send and one to receive. It is easier to catch the error.

Cheers,

Diego

 

 

conor_p_
Beginner

Thanks guys! That first response actually answered the question for me. For some reason, my understanding after reading a couple of MPI examples was that MPI handled msgtag behind the scenes, but evidently that is very, very wrong. Thanks again.
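
(For anyone who finds this later: if the receiving side does not need to match a particular tag, my understanding is that MPI_ANY_TAG can be passed as the tag argument of mpi_recv, as in the single call below; the send side still needs a valid, initialized tag.)

call mpi_recv(n0temp,1,MPI_INTEGER,loop,MPI_ANY_TAG,MPI_COMM_WORLD,status,ierr)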
