Miah__Wadud
Beginner
126 Views

Intel MPI process placement

Hello,

I would like to pin MPI processes across all CPU sockets. For example, I would like to run 10 MPI processes on a two-socket machine, with 5 MPI processes on each socket. Could you please send me instructions for doing this?

Many thanks,

4 Replies
Gergana_S_Intel
Employee

Hi Wadud,

To pin MPI processes across all sockets, you just need to set I_MPI_PIN_DOMAIN=socket. You can do that before you run your application (export I_MPI_PIN_DOMAIN=socket) or on the mpirun command line by using the -genv option (mpirun -genv I_MPI_PIN_DOMAIN socket -n 10 ./exe).

To double-check if the pinning is correct, just set I_MPI_DEBUG=4 and Intel MPI will print a pinning table at the beginning of the run.
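
As a minimal sketch of the two settings above, assuming a POSIX shell and that ./exe is the placeholder application name used in the command above, a launch script could look like this. The guard around mpirun is an addition for illustration, so the script does nothing harmful on a machine without Intel MPI.

```shell
#!/bin/sh
# Pin each rank to a socket-sized domain (setting taken from the reply above).
export I_MPI_PIN_DOMAIN=socket
# Ask Intel MPI to print its pinning table at the beginning of the run.
export I_MPI_DEBUG=4

# ./exe is a placeholder binary name; the guard is an assumption added here
# so the script only launches when both mpirun and the binary are present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./exe ]; then
    mpirun -n 10 ./exe
else
    echo "mpirun or ./exe not found; variables exported only"
fi
```

The pinning table appears in the job's output before the application's own output, so it can be checked without changing the application.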

Let me know how this works.

Regards,
~Gergana

Miah__Wadud
Beginner

Hi Gergana,

Thanks for the reply. Does setting I_MPI_PIN_DOMAIN=socket do a scatter, much like how it is done in OpenMP?

Thanks.

Gergana_S_Intel
Employee

I believe the default value is actually 'compact'. That means the first 5 consecutive ranks will be pinned to the first socket, and the next set of 5 will be pinned to the second socket.

If you want to do 'scatter' instead (which, in your case, means alternating ranks will be pinned to different sockets), you can set I_MPI_PIN_ORDER=scatter.
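
To make the difference concrete, here is a small shell sketch that models the two orderings as described above for 10 ranks on 2 sockets. It is an illustration only: the function names compact_socket and scatter_socket are made up for this example, the script does not call Intel MPI, and the actual placement should still be confirmed with I_MPI_DEBUG=4.

```shell
#!/bin/sh
# Illustration only: models the rank-to-socket mapping described in this reply.
# NRANKS and NSOCKETS are the values from the original question.
NRANKS=10
NSOCKETS=2

# compact: consecutive ranks fill one socket before moving on to the next.
compact_socket() {
    per_socket=$(( (NRANKS + NSOCKETS - 1) / NSOCKETS ))
    echo $(( $1 / per_socket ))
}

# scatter: alternating ranks are placed on different sockets.
scatter_socket() {
    echo $(( $1 % NSOCKETS ))
}

# Print the modelled socket for every rank under both orderings.
rank=0
while [ "$rank" -lt "$NRANKS" ]; do
    echo "rank $rank: compact -> socket $(compact_socket "$rank"), scatter -> socket $(scatter_socket "$rank")"
    rank=$(( rank + 1 ))
done
```

Under these assumptions, ranks 0-4 map to socket 0 and ranks 5-9 to socket 1 for compact, while scatter alternates sockets by rank. Following the -genv form shown earlier in the thread, the scatter launch line would presumably be mpirun -genv I_MPI_PIN_DOMAIN socket -genv I_MPI_PIN_ORDER scatter -n 10 ./exe.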

Our online Intel MPI Library Reference Manual has an entire section on the pinning schema and on OpenMP interoperability. You're welcome to take a look.

Regards,
~Gergana

Sanjiv_T_
Beginner

Hi,

I have compiled espresso with Intel MPI and the MKL library, but I am getting a "Failure during collective" error, whereas it works fine with Open MPI.

Is there a problem with Intel MPI?

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x5300310, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x6b295c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x67183d0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x4f794c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
/var/spool/PBS/mom_priv/epilogue: line 30: kill: (5089) - No such process

Kindly help us resolve this.

Thanks
sanjiv
