Issue with MPI_Sendrecv

Piotr_G_1

Hello,

I am seeing incorrect results from MPI_Sendrecv when running across multiple machines. The code sends a vector around a ring: each process sends its data to the next process and receives data from the preceding one, all in parallel. Surprisingly, the first call to the SEND_DATA routine produces correct output, but the second call does not. The code and the output are below.

PROGRAM SENDRECV_REPROD
USE MPI
USE ISO_FORTRAN_ENV,ONLY: INT32
IMPLICIT NONE
INTEGER(KIND=INT32) :: STATUS(MPI_STATUS_SIZE) 
INTEGER(KIND=INT32) :: RANK,NUM_PROCS,IERR

CALL MPI_INIT(IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)

CALL SEND_DATA(RANK,NUM_PROCS)
CALL SEND_DATA(RANK,NUM_PROCS)

CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)  
CALL MPI_FINALIZE(IERR)

END PROGRAM

SUBROUTINE SEND_DATA(RANK,NUM_PROCS)
USE ISO_FORTRAN_ENV,ONLY: INT32,REAL64
USE MPI
IMPLICIT NONE
INTEGER(KIND=INT32),INTENT(IN) :: RANK
INTEGER(KIND=INT32),INTENT(IN) :: NUM_PROCS
INTEGER(KIND=INT32) :: IERR,ALLOC_ERROR
INTEGER(KIND=INT32) :: VEC_SIZE,I_RANK,RANK_DESTIN,RANK_SOURCE,TAG_SEND,TAG_RECV
REAL(KIND=REAL64), ALLOCATABLE :: COMM_BUFFER(:),VEC1(:)
INTEGER(KIND=INT32) :: MPI_COMM_STATUS(MPI_STATUS_SIZE) 



! Allocate communication arrays.

VEC_SIZE = 374454
ALLOCATE(COMM_BUFFER(VEC_SIZE),VEC1(VEC_SIZE),STAT=ALLOC_ERROR)
! Stop all ranks if the buffers could not be allocated.
IF (ALLOC_ERROR/=0) CALL MPI_ABORT(MPI_COMM_WORLD,1,IERR)



! Define destination and source ranks for sending and receiving messages.

RANK_DESTIN = MOD((RANK+1),NUM_PROCS)
RANK_SOURCE = MOD((RANK+NUM_PROCS-1),NUM_PROCS)

TAG_SEND = RANK+1
TAG_RECV = RANK
IF (RANK==0) TAG_RECV=NUM_PROCS

VEC1=RANK
COMM_BUFFER=0.0_REAL64
        
    
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
        
DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R',RANK, VEC1(1),'B', COMM_BUFFER(1)
ENDDO

CALL MPI_SENDRECV(VEC1(1),VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_DESTIN,TAG_SEND,COMM_BUFFER(1),&
                    VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_SOURCE,TAG_RECV,MPI_COMM_WORLD,MPI_COMM_STATUS,IERR)
        
DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R' ,  RANK , VEC1(1),'A', COMM_BUFFER(1)
ENDDO



END SUBROUTINE SEND_DATA 

Output of four processes run on four machines:

 R           0  0.000000000000000E+000 B  0.000000000000000E+000
 R           1   1.00000000000000      B  0.000000000000000E+000
 R           2   2.00000000000000      B  0.000000000000000E+000
 R           3   3.00000000000000      B  0.000000000000000E+000
 R           0  0.000000000000000E+000 A   3.00000000000000     
 R           1   1.00000000000000      A  0.000000000000000E+000
 R           2   2.00000000000000      A   1.00000000000000     
 R           3   3.00000000000000      A   2.00000000000000     
 R           0  0.000000000000000E+000 B  0.000000000000000E+000
 R           1   1.00000000000000      B  0.000000000000000E+000
 R           2   2.00000000000000      B  0.000000000000000E+000
 R           3   3.00000000000000      B  0.000000000000000E+000
 R           0  0.000000000000000E+000 A   2.00000000000000     
 R           1   1.00000000000000      A   3.00000000000000     
 R           2   2.00000000000000      A  0.000000000000000E+000
 R           3   3.00000000000000      A   1.00000000000000    
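
For reference, the values expected after the exchange can be worked out without MPI. The following is a minimal serial sketch, assuming four ranks as in the run above. The program name RING_EXPECTED and the variables TAG_FROM_SOURCE and EXPECTED are illustrative and not part of the reproducer. It reuses the neighbour and tag formulas from SEND_DATA, checks that the tag attached by the preceding rank equals the tag each rank waits for, and prints the value that should land in COMM_BUFFER(1).

```fortran
PROGRAM RING_EXPECTED
USE ISO_FORTRAN_ENV,ONLY: INT32,REAL64
IMPLICIT NONE
! Serial sketch (no MPI calls): NUM_PROCS is fixed to 4 to match the run above.
INTEGER(KIND=INT32),PARAMETER :: NUM_PROCS = 4
INTEGER(KIND=INT32) :: RANK,RANK_DESTIN,RANK_SOURCE,TAG_RECV,TAG_FROM_SOURCE
REAL(KIND=REAL64) :: EXPECTED

DO RANK=0,NUM_PROCS-1
    ! Same neighbour and tag formulas as in SEND_DATA.
    RANK_DESTIN = MOD((RANK+1),NUM_PROCS)
    RANK_SOURCE = MOD((RANK+NUM_PROCS-1),NUM_PROCS)
    TAG_RECV = RANK
    IF (RANK==0) TAG_RECV = NUM_PROCS

    ! Tag that the preceding rank attaches to its message (its TAG_SEND).
    TAG_FROM_SOURCE = RANK_SOURCE+1

    ! Each rank fills VEC1 with its own rank, so this value should arrive.
    EXPECTED = REAL(RANK_SOURCE,KIND=REAL64)

    WRITE(*,*) 'R',RANK,'DEST',RANK_DESTIN,'SRC',RANK_SOURCE, &
               'TAGS MATCH',TAG_FROM_SOURCE==TAG_RECV,'EXPECTED A',EXPECTED
ENDDO

END PROGRAM RING_EXPECTED
```

With these formulas the tags match on every rank, and ranks 0 to 3 should receive 3.0, 0.0, 1.0 and 2.0 respectively in every call to SEND_DATA.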

 

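As you can see, the output of the first SEND_DATA call differs from that of the second. In the first call every rank receives the value of its preceding rank, as expected. In the second call every rank instead receives the value of the rank two positions before it (for example, rank 0 receives 2.0 instead of 3.0). The results are correct if I run the reproducer on a single machine with multiple processes.

I am compiling the code with mpiifort for the Intel(R) MPI Library 2017 Update 3 for Linux* (ifort version 17.0.4) and running it with mpirun from the Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405.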

Do you have any idea what could be the source of this issue?

Thank you,
Piotr
