Intel® MPI Library

MPI DAPL error in Intel Xeon Phi

Wei_W_2
Beginner

I am trying to run MPI across multiple MICs and I get a DAPL error. The following is the output with I_MPI_DEBUG=5 enabled:

wwu12:lips ~/work/mic/mpitest> mpirun -genv I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0 -env I_MPI_DEBUG=5 -env I_MPI_MIC=enable -hostfile mpi_host -perhost 1 -n 2 /tmp/test.mic
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] MPI startup(): DAPL provider ofa-v2-scif0
[0] MPI startup(): DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): dapl data transfer mode
[0:mic0] unexpected DAPL event 0x4003
Assertion failed in file ../../dapl_init_rc.c at line 1337: 0

There is no error when I run the program on a single MIC, or on the host plus one MIC. Does anyone know where the problem is?

James_T_Intel
Moderator

Hi Wei,

Let's check a few basics.  Make certain that each coprocessor has a unique name and IP address on the network.  Ensure that you can connect, via SSH, from one coprocessor to another.  What version of MPSS are you using, and is it the same on every coprocessor?
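
The SSH check described above can be sketched as a small shell helper, run from the host. This is only an illustration under stated assumptions: the function name `check_mic_ssh` and the `SSH` override variable are hypothetical, the hostnames (for example `mic0`, `mic1`) are placeholders for your own coprocessor names, and the `BatchMode` and `ConnectTimeout` options assume an OpenSSH client on both the host and the coprocessors.

```shell
#!/bin/sh
# Sketch: verify that every coprocessor can open an SSH session to every other one.
# Assumption: hostnames are passed as arguments (e.g. mic0 mic1).
# Assumption: SSH may be overridden, which allows a dry run without real hosts.
SSH="${SSH:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

check_mic_ssh() {
    failures=0
    for src in "$@"; do
        for dst in "$@"; do
            # Skip the trivial case of a card connecting to itself.
            [ "$src" = "$dst" ] && continue
            # Log in to $src from the host, then hop from $src to $dst.
            # BatchMode makes ssh fail instead of prompting for a password.
            if $SSH "$src" "ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true" \
                   </dev/null >/dev/null 2>&1; then
                echo "ok:   $src -> $dst"
            else
                echo "FAIL: $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Succeed only if every pair connected.
    [ "$failures" -eq 0 ]
}
```

For example, `check_mic_ssh mic0 mic1` prints one line per direction. Any `FAIL` line points to a card pair that cannot reach each other without a password prompt, which is worth resolving before looking further at DAPL itself.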

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Wei_W_2
Beginner

Thanks very much. I found that I am not able to SSH from one coprocessor to another. I will contact my machine administrator to report this problem.
