I am trying to run MPI across multiple MIC coprocessors, and I get a DAPL error. The following is the output with I_MPI_DEBUG=5 enabled:
wwu12:lips ~/work/mic/mpitest> mpirun -genv I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0 -env I_MPI_DEBUG=5 -env I_MPI_MIC=enable -hostfile mpi_host -perhost 1 -n 2 /tmp/test.mic
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] MPI startup(): DAPL provider ofa-v2-scif0
[0] MPI startup(): DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): dapl data transfer mode
[0:mic0] unexpected DAPL event 0x4003
Assertion failed in file ../../dapl_init_rc.c at line 1337: 0
There is no error when I run the program on a single MIC, or on the host plus one MIC. Does anyone know where the problem is?
Hi Wei,
Let's check a few basics. Make certain that each coprocessor has a unique name and IP address on the network. Ensure that you can connect, via SSH, from one coprocessor to another. What version of MPSS are you using, and is it the same on every coprocessor?
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
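The name, IP, and SSH checks above can be scripted from the host. This is a minimal sketch, assuming bash 4 on the host, an OpenSSH-compatible `ssh` client on every coprocessor, and a hostfile such as `mpi_host` with one host per line (optionally `host:count`). The function name `check_mpi_hosts` and the `SSH_CMD` override are hypothetical, not part of Intel MPI or MPSS, and the sketch does not check the MPSS version.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Intel MPI or MPSS: sanity-check the hosts
# in an MPI hostfile before launching across several coprocessors.
# Assumes one host per line, optionally "host:count", with '#' comments.
check_mpi_hosts() {
    local hostfile=$1
    # SSH_CMD can override the client (e.g. for testing); BatchMode makes a
    # missing key fail immediately instead of prompting for a password.
    local ssh_cmd="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}"
    local -a hosts
    local dups h src dst status=0

    # Strip comments, ":count" suffixes and trailing fields; drop blank lines.
    mapfile -t hosts < <(sed -e 's/#.*//' -e 's/[[:space:]:].*//' "$hostfile" | grep -v '^$')

    # Each coprocessor needs its own name...
    dups=$(printf '%s\n' "${hosts[@]}" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate host names: $dups"
        status=1
    fi

    # ...and its own IP address (as resolved on the machine running the check).
    dups=$(for h in "${hosts[@]}"; do
               getent hosts "$h" 2>/dev/null | awk 'NR == 1 { print $1 }'
           done | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate IP addresses: $dups"
        status=1
    fi

    # Every host must reach every other host without a password prompt:
    # hop to src first, then run a second ssh from src to dst.
    for src in "${hosts[@]}"; do
        for dst in "${hosts[@]}"; do
            [ "$src" = "$dst" ] && continue
            if $ssh_cmd "$src" $ssh_cmd "$dst" true </dev/null >/dev/null 2>&1; then
                echo "ok:   $src -> $dst"
            else
                echo "FAIL: $src -> $dst"
                status=1
            fi
        done
    done
    return $status
}
```

For example, after sourcing the function (say from a hypothetical `check_hosts.sh`), `check_mpi_hosts mpi_host` prints one line per ordered pair; a line such as `FAIL: mic0 -> mic1` points at exactly the coprocessor-to-coprocessor SSH problem that the single-MIC and host-plus-one-MIC runs never exercise. Because every check starts from the host, the first hop also exercises host-to-coprocessor SSH.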
Thanks very much. It turns out I cannot SSH from one coprocessor to another. I will contact my system administrator to report the problem.