Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

MPI (Persistent Communication)

Julio
Novice

Dear community,

We have solved the Laplace equation using standard blocking communication. My case is different, since I have a huge CFD code with more than 32 arrays that need to be updated each time step. Therefore, I decided to use the persistent communication routines MPI_SEND_INIT and MPI_RECV_INIT. These calls are in a separate subroutine that I call at the beginning of my simulation, in order to set up the communication properties. This is a short example for one of the arrays I use. As you see, I use Fortran MPI; more specifically, mpiifort.

Call MPI_SEND_INIT(u_old(2,:),1,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(1),ierr)

Call MPI_SEND_INIT(u_old(Imax-1,:),1,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(2),ierr)

Call MPI_RECV_INIT(u_old(Imax,:),1,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(3),ierr)

Call MPI_RECV_INIT(u_old(1,:),1,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(4),ierr)

 

Unfortunately, it does not seem to be working, because the values at the ghost cells are zeros. It seems the information is not being sent or received properly. In the main loop I start the communication using:

call mpi_startall(4,req,ierr)
call mpi_waitall(4,req,status,ierr)


Like I said, the two sets of calls are in different subroutines, but I compile everything together. The code runs "without" problems, but the solution is wrong because I only get zeros in my ghost values.

Does anyone have the Laplace solution using persistent communication, or experience with this approach? A similar approach is followed in the textbook MPI: The Complete Reference, but it does not seem to work in my case.

Thanks beforehand!

 

5 Replies
jimdempseyatthecove
Honored Contributor III

While I haven't used persistent communication (and may be entirely wrong about this), in looking at your CALL MPI_SEND_INIT argument list I see something that may be a problem (may being stressed):

Call MPI_SEND_INIT(u_old(2,:),1,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(1),ierr)

u_old(2,:) is non-unit stride, meaning the argument expresses a non-contiguous chunk of memory, and thus the compiler may generate a temporary containing the contents of that array section at the time of the call. The same applies to the other three calls listed above. If this is true, it would explain part of the symptom you describe. The other part is why your program did not crash, as the I/O would be occurring after the temporary is returned (to stack or heap).

To correct for this, re-define your arrays such that the data specified is contiguous. IOW, your indexes are transposed:

Call MPI_SEND_INIT(u_old(:,2),1,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(1),ierr)

(This requires all uses of the array to have the indexes transposed.)
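To make this concrete, here is a minimal sketch of what I mean (untested; the transposed declaration u_old(1:Jmax,1:Imax) and the use of Jmax are my assumptions). Note that I also pass the first element of each halo column together with a count of Jmax, so the whole contiguous column is described and no array-section temporary can be involved:

! u_old re-dimensioned as (1:Jmax,1:Imax); each column u_old(:,i) is contiguous in memory
Call MPI_SEND_INIT(u_old(1,2),Jmax,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(1),ierr)
Call MPI_SEND_INIT(u_old(1,Imax-1),Jmax,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(2),ierr)
Call MPI_RECV_INIT(u_old(1,Imax),Jmax,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(3),ierr)
Call MPI_RECV_INIT(u_old(1,1),Jmax,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(4),ierr)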

Jim Dempsey

Julio
Novice

Thank you very much, Mr. Dempsey.

I followed your advice and it did not work. In fact, I agree with you. Unfortunately, the way the code is structured disagrees with Fortran's column-wise storage order, and the work required to restructure it is huge, so I need to stick with the current format. I will try other options to see if something kicks in.

 

Gregg_S_Intel
Employee

That first argument is a pointer to a contiguous buffer. It is not clear how use of the Fortran colon syntax makes sense there. The compiler may be turning that line into a loop, and the last call in the loop points to the final value in the array.

Julio
Novice

Thank you very much for your comments. I followed the advice from the web. Basically, I am working with a section of the data that is not contiguous; therefore, I used MPI_TYPE_VECTOR to solve the issue, but the problem still persists. I am using allocatable arrays to adjust the size of each array to the needs of each processor. Still, I see zeros on the ghost plane. In the first time step the ghost cells show only zeros. In the second time step, the first 5 elements have random numbers, and the rest are zeros.

Any suggestions?

Here is the MPI subroutine:

 Call MPI_TYPE_VECTOR(Jmax,1,Imax,MPI_REAL8,GhostCells,ierr)
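 ! GhostCells: Jmax blocks of 1 element, each block Imax reals apart in memory,
 ! i.e. one i-row of u_old (assuming u_old is dimensioned (Imax,Jmax))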
 Call MPI_TYPE_COMMIT(GhostCells,ierr)

....

 

Here is the main program:

 if(MyRank==0) then
 Call MPI_SEND(u_old(imax-1,1),1,GhostCells,Right,1,MPI_COMM_WORLD,ierr)
 Call MPI_RECV(u_old(imax,1),1,GhostCells,Right,2,MPI_COMM_WORLD,stat,ierr)
 else
 Call MPI_RECV(u_old(1,1),1,GhostCells,Left,1,MPI_COMM_WORLD,stat,ierr)
 Call MPI_SEND(u_old(2,1),1,GhostCells,Left,2,MPI_COMM_WORLD,ierr)
 endif

 

Thanks

Gregg_S_Intel
Employee

It is much faster to copy the data to a contiguous buffer.
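For example, something along these lines (untested; the buffer names and the use of Jmax, Left, Right and tag are assumptions based on the earlier posts). Register contiguous buffers with persistent requests once, then pack and unpack around each exchange:

real(8), allocatable :: sendL(:), sendR(:), recvL(:), recvR(:)
integer :: req(4), ierr
integer :: statuses(MPI_STATUS_SIZE,4)

allocate(sendL(Jmax), sendR(Jmax), recvL(Jmax), recvR(Jmax))

! Set up once: the registered buffers are contiguous, so no hidden temporaries
Call MPI_SEND_INIT(sendL,Jmax,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(1),ierr)
Call MPI_SEND_INIT(sendR,Jmax,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(2),ierr)
Call MPI_RECV_INIT(recvR,Jmax,MPI_REAL8,Right,tag,MPI_COMM_WORLD,req(3),ierr)
Call MPI_RECV_INIT(recvL,Jmax,MPI_REAL8,Left,tag,MPI_COMM_WORLD,req(4),ierr)

! Each time step: pack, exchange, unpack
sendL = u_old(2,:)
sendR = u_old(Imax-1,:)
Call MPI_STARTALL(4,req,ierr)
Call MPI_WAITALL(4,req,statuses,ierr)
u_old(1,:)    = recvL
u_old(Imax,:) = recvR

The strided copies happen only on the Fortran side, and the MPI calls always see plain contiguous arrays.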
