Hi,
I am trying to explore one-sided communication using Intel MPI (version 5.0.3.048, ifort version 15.0.2 20150121).
I have a cluster of 4 nodes (8 cores/node), and on each node only one rank generates a big array. I then have to copy this array to every other rank on the same node.
Using MPI_BCAST, I use this code:
PROGRAM MAPS
  USE MPI
  IMPLICIT NONE
  INTEGER, PARAMETER :: n=100000000
  INTEGER, PARAMETER :: cores=8
  REAL, DIMENSION(n) :: B
  INTEGER :: ierr,world_rank,world_size,i,j
  INTEGER :: rank2,comm2,size2

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
  CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD,MPI_COMM_TYPE_SHARED,0, &
       & MPI_INFO_NULL,comm2,ierr)
  CALL MPI_COMM_RANK(comm2,rank2,ierr)
  CALL MPI_COMM_SIZE(comm2,size2,ierr)

  DO j=1,100
    IF(rank2 == 0)THEN
      DO i=1,n
        B(i)=FLOAT(i)*FLOAT(j)
      END DO
    END IF
    CALL MPI_BCAST(B,n,MPI_REAL,0,comm2,ierr)
  END DO

  CALL MPI_FINALIZE(ierr)
END PROGRAM MAPS
For one-sided communication, I use, for example, this code:
PROGRAM MAPS
  USE MPI
  IMPLICIT NONE
  INTEGER, PARAMETER :: n=100000000
  INTEGER, PARAMETER :: cores=8
  REAL, DIMENSION(n) :: B
  INTEGER :: disp_int,win,ierr,world_rank,world_size,i,j,k
  INTEGER :: rank2,comm2,size2
  LOGICAL :: flag
  INTEGER (KIND=MPI_ADDRESS_KIND) :: lowerbound,size,realextent
  INTEGER (KIND=MPI_ADDRESS_KIND) :: disp_aint
  INTEGER (KIND=MPI_ADDRESS_KIND) :: memory_model  ! MPI_WIN_GET_ATTR returns attribute values of this kind

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
  CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD,MPI_COMM_TYPE_SHARED,0, &
       & MPI_INFO_NULL,comm2,ierr)
  CALL MPI_COMM_RANK(comm2,rank2,ierr)
  CALL MPI_COMM_SIZE(comm2,size2,ierr)

  CALL MPI_TYPE_GET_EXTENT(MPI_REAL,lowerbound,realextent,ierr)
  disp_int=realextent
  size=n*realextent
  CALL MPI_WIN_CREATE(B,size,disp_int,MPI_INFO_NULL,comm2,win,ierr)
  CALL MPI_WIN_GET_ATTR(win,MPI_WIN_MODEL,memory_model,flag,ierr)
  disp_aint=0

  DO k=1,100
    IF(rank2 == 0)THEN
      DO i=1,n
        B(i)=FLOAT(i)*FLOAT(k)
      END DO
    END IF
    CALL MPI_WIN_FENCE(0,win,ierr)     ! open the RMA epoch: B on rank 0 is now up to date
    IF(rank2 /= 0)THEN
      CALL MPI_GET(B,n,MPI_REAL,0,disp_aint,n,MPI_REAL,win,ierr)
    END IF
    CALL MPI_WIN_FENCE(0,win,ierr)     ! close the epoch: the gets are complete after this
  END DO

  CALL MPI_WIN_FREE(win,ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM MAPS
I also tried MPI_WIN_POST/START/COMPLETE/WAIT and MPI_WIN_LOCK/UNLOCK, but with the same performance.
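For completeness, the MPI_WIN_POST/START/COMPLETE/WAIT variant replaced the two fences in the loop above roughly as follows (this is a sketch rather than the exact code I ran; grp_all, grp_root and grp_others are extra INTEGER group handles built from comm2):

  INTEGER :: grp_all,grp_root,grp_others

  ! Groups built once, after the MPI_COMM_SPLIT_TYPE call:
  CALL MPI_COMM_GROUP(comm2,grp_all,ierr)
  CALL MPI_GROUP_INCL(grp_all,1,(/0/),grp_root,ierr)    ! only rank 0 of comm2
  CALL MPI_GROUP_EXCL(grp_all,1,(/0/),grp_others,ierr)  ! all local ranks except 0

  DO k=1,100
    IF(rank2 == 0)THEN
      DO i=1,n
        B(i)=FLOAT(i)*FLOAT(k)
      END DO
      CALL MPI_WIN_POST(grp_others,0,win,ierr)   ! expose B to the other local ranks
      CALL MPI_WIN_WAIT(win,ierr)                ! wait until their gets have completed
    ELSE
      CALL MPI_WIN_START(grp_root,0,win,ierr)    ! open an access epoch to rank 0
      CALL MPI_GET(B,n,MPI_REAL,0,disp_aint,n,MPI_REAL,win,ierr)
      CALL MPI_WIN_COMPLETE(win,ierr)            ! the get is complete when this returns
    END IF
  END DO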
I compile both of them in this way:
mpiifort -O0 -g -debug inline-debug-info bcast.f90
mpiifort -O0 -g -debug inline-debug-info win-fence2.f90
I launch both of them in this way:
mpiexec.hydra -f ./mpd.hosts -print-rank-map -ppn 8 -n 32 -env I_MPI_FABRICS shm:tcp a.out
where mpd.hosts contains my 4 nodes. I repeat the same operation 100 times only to obtain a large enough elapsed time.
I noticed that the MPI_BCAST version is faster than the MPI_GET version, even though I was trying to speed up this copy operation using the new RMA features.
Is there some naive error in my MPI_GET version? Is it correct to expect better performance from MPI_BCAST for a problem like this?
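(For reference, mpd.hosts is just a plain host file with one node name per line; the names below are placeholders, not my real hosts:)

  node01
  node02
  node03
  node04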
Any suggestions or comments would be very helpful.
Thanks in advance
Hi,
MPI_BCAST in the Intel(R) MPI Library is highly optimized for Intel platforms by the Intel MPI Library engineering team.
So if you find a version that is faster than MPI_BCAST with the same result, please let us know.
--
Dmitry Sivkov
Intel(R) Cluster Tools TCE