Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

impi-4.0.0.007 run on IB network with "dapl_cma_active: ARP_ERR, retries(15) exhausted"

Terrence_Liao
Beginner

Hello,

I have encountered a problem that did not show up before.

nod727:16036: dapl_cma_active: ARP_ERR, retries(15) exhausted -> DST 172.40.108.10,11233

I am using these options:

-genv I_MPI_PIN 0
-genv I_MPI_FALLBACK_DEVICE 0
-genv I_MPI_RDMA_RNDV_WRITE 1
-genv I_MPI_RDMA_MAX_MSG_SIZE 4194304
-genv I_MPI_DEVICE rdssm:OpenIB-mlx4_0-1
-genv I_MPI_DEBUG +2

Please help.

Thanks.

--Terrence Liao

Terrence_Liao
Beginner
Hello,
I found the problem. It was due to a faulty QLogic switch (only one year old!) that could no longer initiate a connection in ONE direction (i.e., nodeA -> nodeB was unreachable). Once we pinged from the reverse direction (i.e., nodeB could ping nodeA just fine), the previously dead direction (nodeA -> nodeB) came back to life.
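
For anyone who hits the same ARP_ERR, a rough way to confirm this kind of one-directional failure is to ping the IPoIB addresses from both ends; the node names and addresses below are only placeholders, not our real hosts:

$ # on nodeA: fails while the bad switch port keeps ARP from resolving
$ ping -c 3 <nodeB_ipoib_addr>
$ # on nodeB: pinging back the other way repopulates the neighbor tables
$ ping -c 3 <nodeA_ipoib_addr>
$ # on nodeA again: now succeeds, and DAPL can establish its connections
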
-- Terrence
Dmitry_K_Intel2
Employee
Hi Terrence,

Thank you for sharing your findings with the community.

Could you tell me why you use I_MPI_PIN=0? Could you compare the performance with I_MPI_PIN set to 0 and to 1?
Starting with version 4.0, the Intel MPI Library uses I_MPI_FABRICS instead of I_MPI_DEVICE. The format is I_MPI_FABRICS=shm:dapl. You can also use 'shm:tcp', 'shm:ofa', or 'shm:tmi' if tmi is supported (QLogic and Myrinet only). The provider can be set with the I_MPI_DAPL_PROVIDER environment variable.
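
For example, a rough 4.0-style equivalent of your current rdssm:OpenIB-mlx4_0-1 setting would be something like the lines below; the provider name is simply taken from your I_MPI_DEVICE string, so please check it against the entries in your /etc/dat.conf:

$ export I_MPI_FABRICS=shm:dapl
$ export I_MPI_DAPL_PROVIDER=OpenIB-mlx4_0-1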

Regards!
Dmitry
Terrence_Liao
Beginner

Dmitry,

I_MPI_PIN=0 is there for historical reasons: the code also uses OpenMP. In the early days of our development, to make sure the threads could use all available cores, we set this environment variable to disable process pinning.

-- Terrence

Dmitry_K_Intel2
Employee
Hi Terrence,

Intel MPI Library version 4.0 handles hybrid (MPI+OpenMP) applications much better than previous versions. You can leave I_MPI_PIN enabled and set the I_MPI_PIN_DOMAIN environment variable instead. You can find a detailed description in the Reference Manual, chapter 3.2 (especially 3.2.3). The idea is to place one MPI process in each domain; all the remaining free cores in that domain are then used by the OpenMP threads.

As an example:
$ export OMP_NUM_THREADS=4
$ export I_MPI_FABRICS=shm:dapl
$ export KMP_AFFINITY=compact

$ mpirun -perhost 4 -n <total_ranks> ./app_name
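
To pin explicitly as described above, you could also set I_MPI_PIN_DOMAIN; the 'omp' value sizes each domain to OMP_NUM_THREADS cores (see section 3.2.3 of the Reference Manual for the exact semantics), e.g.:

$ export I_MPI_PIN_DOMAIN=omp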

Please give it a try and compare the performance.

BTW: 4.0 Update 1 is available and shows even better performance.

Regards!
Dmitry