Intel® MPI Library

Problem with the use of Intel MPI DAPL fabric

dingjun_chencmgl_ca

Hi, on our Windows PC cluster, whenever I try to run Intel MPI Benchmarks 4.0 over the DAPL fabric, the following error occurs. Could you tell me the cause? Thanks in advance.

By the way, both WinOFED 3.2 and Mellanox WinOF Rev 4.4 are installed on the cluster.


C:\Users\dingjun\mpi5tests>mpiexec -n 4 -env I_MPI_FABRICS shm:dapl IMB-MPI1
dapls_ib_init() NdStartup failed with NTStatus: The specified module could not be found.

dapls_ib_init() NdStartup failed with NTStatus: The specified module could not be found.

dapls_ib_init() NdStartup failed with NTStatus: The specified module could not be found.

dapls_ib_init() NdStartup failed with NTStatus: The specified module could not be found.


job aborted:
rank: node: exit code[: error message]
0: drmswc4-1.cgy.cmgl.ca: 291: process 0 exited without calling finalize
1: drmswc4-1.cgy.cmgl.ca: 291: process 1 exited without calling finalize
2: drmswc4-1.cgy.cmgl.ca: 291: process 2 exited without calling finalize
3: drmswc4-1.cgy.cmgl.ca: 291: process 3 exited without calling finalize
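
For longer runs across many nodes, a summary of the abort report can make it easier to see whether every rank fails the same way. The following is a minimal sketch in Python, assuming the report keeps the "rank: node: exit code[: error message]" layout shown above; the function name `summarize_abort` is a hypothetical helper and not part of Intel MPI.

```python
import re
from collections import defaultdict

# Matches report lines such as
# "0: host.example: 291: process 0 exited without calling finalize".
# The trailing ": error message" part is optional, as the header line indicates.
ABORT_LINE = re.compile(r"^(\d+):\s*(\S+):\s*(-?\d+)(?::\s*(.*))?$")


def summarize_abort(log_text):
    """Group failed ranks by (node, exit code); hypothetical helper."""
    summary = defaultdict(list)
    for line in log_text.splitlines():
        match = ABORT_LINE.match(line.strip())
        if match:
            rank, node, code, _message = match.groups()
            summary[(node, int(code))].append(int(rank))
    return dict(summary)


sample = """job aborted:
rank: node: exit code[: error message]
0: drmswc4-1.cgy.cmgl.ca: 291: process 0 exited without calling finalize
1: drmswc4-1.cgy.cmgl.ca: 291: process 1 exited without calling finalize
"""

for (node, code), ranks in summarize_abort(sample).items():
    print(f"{node}: exit code {code}, ranks {ranks}")
```

In the run above, all four ranks are on the same node and exit with the same code, which suggests a node-level problem with the fabric provider rather than something specific to one process.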

James_T_Intel
Moderator

This seems like an InfiniBand* problem.  Are you able to run any non-MPI programs using IB?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

dingjun_chencmgl_ca

Thanks, James. InfiniBand works well otherwise; only the DAPL fabric fails.

Vijay_Amirtharaj
Beginner

Hi,

We faced the same issue. We changed the permissions of the files in the /dev/infiniband folder on all nodes, like this:

chmod 666 /dev/infiniband/*

Then it started working. Please try it. If it works, great :)

Regards,

Vijay Amirtharaj A
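
Before changing permissions, it can help to see which device files are actually restricted. Below is a minimal sketch in Python for Linux nodes only (it does not apply to a Windows cluster); `files_missing_world_rw` is a hypothetical helper name, and the default pattern simply mirrors the path used in the chmod command above.

```python
import glob
import os
import stat


def files_missing_world_rw(pattern="/dev/infiniband/*"):
    """Return paths matching `pattern` that lack world read or write permission.

    Hypothetical helper: mode 666 grants read/write to owner, group, and others,
    so this reports files where the "others" read/write bits are not both set.
    """
    needed = stat.S_IROTH | stat.S_IWOTH
    missing = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode
        if mode & needed != needed:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in files_missing_world_rw():
        print("not world read/write:", path)
```

An empty result means every matching file is already readable and writable by all users, in which case the permission change is unlikely to be the fix.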

dingjun_chencmgl_ca

Our PC cluster runs Windows rather than Linux. How should we handle it in this case? I look forward to your reply.

James_T_Intel
Moderator

If your HCA doesn't support DAPL* (as you stated in another thread), then you'll need to use a different HCA.
