Software Archive
Read-only legacy content

MPI fabric "dapl" works between mic0 and mic1, but not between localhost and mic0

Andrey_Vladimirov
New Contributor III

I am trying to execute the Intel MPI benchmark in the following configuration: CentOS 6.5 with Intel MPI version 4.1.3.045 and MPSS 3.1.2, and OFED 1.5.4.1 installed from source. My network configuration is default (static pair produced by "micctrl --initdefaults"), and I have 1 node with two 3120A Xeon Phi coprocessors.

The MPI benchmark works just fine with fabrics "tcp" or "shm:tcp". Namely, I am able to run the benchmark between localhost and mic0, and between mic0 and mic1. However, with fabric "dapl", I cannot run IMB between localhost and mic0:

[bash]
$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=dapl
$ mpirun -host localhost -n 1 /opt/intel/impi/4.1.3.045/intel64/bin/IMB-MPI1 PingPong : -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1
=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
[/bash]

At the same time, fabric "dapl" works just fine (much faster than "tcp", actually) between two coprocessors mic0 and mic1:

[bash]
$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=dapl
$ mpirun -host mic1 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1 PingPong : -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1
 benchmarks to run PingPong
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part    
#---------------------------------------------------
# Date                  : Sat Jan  4 21:05:43 2014
# Machine               : k1om
# System                : Linux
# Release               : 2.6.38.8+mpss3.1.2
# Version               : #1 SMP Wed Dec 18 19:09:36 PST 2013
# MPI Version           : 2.2
# MPI Thread Environment:
... etc
[/bash]

Does IMB fail between localhost and mic0 because of the way that I am running it, or is there something that I should check in my configuration?

Gregg_S_Intel
Employee

Your mpirun command looks fine.

Gregg_S_Intel
Employee

Check your MPI setup.  This looks like an MPICH2 error message.

Similar topic:  http://software.intel.com/en-us/forums/topic/405183

Andrey_Vladimirov
New Contributor III

I got help from an Intel engineer in private communication. The way to diagnose this issue was to set "I_MPI_DEBUG=5". The debug output showed that the memlock limit was too small. The solution is to edit /etc/security/limits.conf, then log out and log back in. The tail of limits.conf in my current (working) configuration is this:

[bash]

*                soft    memlock         unlimited
*                hard    memlock         unlimited
*                soft    core            0
*                hard    core            0

[/bash]
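After logging back in, the new limit can be verified with the shell's ulimit builtin (a quick sanity check I am adding here, not part of the original exchange; it must be run in a fresh login session so the new limits.conf values apply):

[bash]

# Sanity check: the locked-memory limit should now report "unlimited"
memlock=$(ulimit -l)
echo "memlock limit: $memlock"

[/bash]

If this still prints a small number (64 on many default installs), the limits.conf change has not taken effect for the current session.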

Another issue is setting the provider of the DAPL fabric. For the ibscif virtual adapter (between CPU and MIC), the provider can be chosen by setting

[bash]

export I_MPI_DAPL_PROVIDER=ofa-v2-scif0

[/bash]
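Putting both tweaks together, a host-to-mic0 run would look something like the following (a sketch reusing the binaries and paths from the original question; I_MPI_DEBUG=5 is optional, but its startup output confirms which fabric and provider were actually selected):

[bash]

export I_MPI_MIC=1
export I_MPI_FABRICS=dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-scif0
export I_MPI_DEBUG=5   # optional: prints the chosen fabric/provider at startup
mpirun -host localhost -n 1 /opt/intel/impi/4.1.3.045/intel64/bin/IMB-MPI1 PingPong : \
       -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1

[/bash]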

With these two tweaks, the MPI benchmark looks happy:

[bash]

 benchmarks to run PingPong
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part    
#---------------------------------------------------
# Date                  : Mon Feb  3 15:24:27 2014
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.32-431.el6.x86_64
# Version               : #1 SMP Fri Nov 22 03:15:09 UTC 2013
# MPI Version           : 2.2
...

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------

... [ I had to delete some lines here because they were triggering the spam filter.      This is annoying! ]

      1048576           40       229.64      4354.72
      2097152           20       379.80      5265.92
      4194304           10       683.00      5856.54

# All processes entering MPI_Finalize

[/bash]

 

P.S.: I asked the question with Intel MPI 4.1.3.045, and the last listing is from Intel MPI 4.1.1.036; however, the fix applies to both versions.

P.P.S.: I had been trying to post this message for a while and kept getting blocked by the spam filter. This happens a lot on this forum.

TaylorIoTKidd
New Contributor I

Andrey,

Thanks for your contribution to the community.

I'll find out what is happening with the spam filter.

Regards
--
Taylor
 

Andrey_Vladimirov
New Contributor III

Thanks, Taylor! From experience here and in nearby Intel forums, I suspect the spam filter dislikes multiple rows of tabulated numerical data, at least inside the [bash*] listings.

DerenT_Intel
Moderator

Andrey, thanks for the feedback on the spam filter.  I have a meeting w/ the spam filter vendor coming up and I'll bring this issue up with them specifically.  Is it OK to have someone from our support team follow up to get more specifics about the types of messages that are being blocked?  

Andrey_Vladimirov
New Contributor III

Sure, thanks, I am already communicating with Hal regarding that.

DerenT_Intel
Moderator

Excellent, I'll touch base w/ him tomorrow.  Sorry about that, Andrey, and thanks for your patience.  Hopefully we can use this data to make this work better for everyone in the future.
