Hi,
I am trying to explore one-sided communication with Intel MPI (version 5.0.3.048, ifort version 15.0.2 20150121).
I have a cluster of 4 nodes (8 cores/node), and on each node only one rank generates a big array, which then has to be copied to every other rank on the same node.
With MPI_BCAST I use this code:
PROGRAM MAPS
  USE MPI
  IMPLICIT NONE
  INTEGER, PARAMETER :: n=100000000
  INTEGER, PARAMETER :: cores=8
  REAL, DIMENSION(n) :: B
  INTEGER :: ierr,world_rank,world_size,i,j
  INTEGER :: rank2,comm2,size2

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
  ! Split MPI_COMM_WORLD into one communicator per shared-memory node
  CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD,MPI_COMM_TYPE_SHARED,0, &
       & MPI_INFO_NULL,comm2,ierr)
  CALL MPI_COMM_RANK(comm2,rank2,ierr)
  CALL MPI_COMM_SIZE(comm2,size2,ierr)

  DO j=1,100
     ! Rank 0 of each node fills the array, then broadcasts it node-locally
     IF(rank2 == 0)THEN
        DO i=1,n
           B(i)=FLOAT(i)*FLOAT(j)
        END DO
     END IF
     CALL MPI_BCAST(B,n,MPI_REAL,0,comm2,ierr)
  END DO

  CALL MPI_FINALIZE(ierr)
END PROGRAM MAPS
With one-sided communication I use, for example, this code:
PROGRAM MAPS
  USE MPI
  IMPLICIT NONE
  INTEGER, PARAMETER :: n=100000000
  INTEGER, PARAMETER :: cores=8
  REAL, DIMENSION(n) :: B
  INTEGER :: disp_int,win,ierr,world_rank,world_size,i,j,k
  INTEGER :: rank2,comm2,size2
  LOGICAL :: flag
  INTEGER (KIND=MPI_ADDRESS_KIND) :: lowerbound,size,realextent
  INTEGER (KIND=MPI_ADDRESS_KIND) :: disp_aint
  ! attribute values returned by MPI_WIN_GET_ATTR must have kind MPI_ADDRESS_KIND
  INTEGER (KIND=MPI_ADDRESS_KIND) :: memory_model

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
  ! Split MPI_COMM_WORLD into one communicator per shared-memory node
  CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD,MPI_COMM_TYPE_SHARED,0, &
       & MPI_INFO_NULL,comm2,ierr)
  CALL MPI_COMM_RANK(comm2,rank2,ierr)
  CALL MPI_COMM_SIZE(comm2,size2,ierr)

  ! Expose B as an RMA window on every rank of the node-local communicator
  CALL MPI_TYPE_GET_EXTENT(MPI_REAL,lowerbound,realextent,ierr)
  disp_int=realextent
  size=n*realextent
  CALL MPI_WIN_CREATE(B,size,disp_int,MPI_INFO_NULL,comm2,win,ierr)
  ! Query the memory model (MPI_WIN_SEPARATE or MPI_WIN_UNIFIED)
  CALL MPI_WIN_GET_ATTR(win,MPI_WIN_MODEL,memory_model,flag,ierr)
  disp_aint=0

  DO k=1,100
     IF(rank2 == 0)THEN
        DO i=1,n
           B(i)=FLOAT(i)*FLOAT(k)
        END DO
     END IF
     CALL MPI_WIN_FENCE(0,win,ierr)
     ! Every non-zero rank pulls the whole array from rank 0's window
     IF(rank2 /= 0)THEN
        CALL MPI_GET(B,n,MPI_REAL,0,disp_aint,n,MPI_REAL,win,ierr)
     END IF
     CALL MPI_WIN_FENCE(0,win,ierr)
  END DO

  CALL MPI_WIN_FREE(win,ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM MAPS
I also tried MPI_WIN_POST/START/COMPLETE/WAIT and MPI_WIN_LOCK/MPI_WIN_UNLOCK, but the performance was the same.
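To give an idea, the MPI_WIN_LOCK variant of the inner loop looks roughly like this (a sketch that reuses the declarations and window creation from the fence version above; the exact code I ran may differ in details):

  ! Sketch of the passive-target (MPI_WIN_LOCK) variant of the inner loop;
  ! B, n, rank2, comm2, win, disp_aint, i, k and ierr are as declared above.
  DO k=1,100
     IF(rank2 == 0)THEN
        ! Exclusive lock on rank 0's own window so the local stores are
        ! synchronized with the public window copy at MPI_WIN_UNLOCK
        CALL MPI_WIN_LOCK(MPI_LOCK_EXCLUSIVE,0,0,win,ierr)
        DO i=1,n
           B(i)=FLOAT(i)*FLOAT(k)
        END DO
        CALL MPI_WIN_UNLOCK(0,win,ierr)
     END IF
     CALL MPI_BARRIER(comm2,ierr)            ! rank 0 has finished filling B
     IF(rank2 /= 0)THEN
        CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,0,0,win,ierr)
        CALL MPI_GET(B,n,MPI_REAL,0,disp_aint,n,MPI_REAL,win,ierr)
        CALL MPI_WIN_UNLOCK(0,win,ierr)      ! completes the MPI_GET
     END IF
     CALL MPI_BARRIER(comm2,ierr)            ! all gets done before the next update
  END DO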
I compile both of them in this way:
mpiifort -O0 -g -debug inline-debug-info bcast.f90
mpiifort -O0 -g -debug inline-debug-info win-fence2.f90
I launch both of them in this way:
mpiexec.hydra -f ./mpd.hosts -print-rank-map -ppn 8 -n 32 -env I_MPI_FABRICS shm:tcp a.out
where mpd.hosts contains my 4 nodes. I repeat the same operation 100 times just to obtain a large enough elapsed time.
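For completeness, one way to take the elapsed time inside the program is to bracket the repetition loop with MPI_WTIME (a sketch only; t_start and t_end are DOUBLE PRECISION variables that are not declared in the codes above):

  ! Sketch: timing the 100-iteration loop with MPI_WTIME
  CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)      ! line up all ranks before timing
  t_start = MPI_WTIME()
  DO k=1,100
     ! ... MPI_BCAST or MPI_WIN_FENCE/MPI_GET iteration as above ...
  END DO
  CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)      ! include the slowest rank
  t_end = MPI_WTIME()
  IF(world_rank == 0) PRINT *,'elapsed time (s):',t_end-t_start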
I noticed that the MPI_BCAST version is faster than the MPI_GET version, even though I was trying to speed up this copy operation with the new RMA features.
Is there some naive error in my MPI_GET version? Is it reasonable to expect better performance from MPI_BCAST for a problem like this?
Any suggestions or comments would be very helpful.
Thanks in advance