Software Archive
Read-only legacy content
17,061 Discussions

MPI fabric "dapl" works between mic0 and mic1, but not between localhost and mic0

Andrey_Vladimirov
New Contributor III
1,042 views

I am trying to execute the Intel MPI benchmark in the following configuration: CentOS 6.5 with Intel MPI version 4.1.3.045 and MPSS 3.1.2, and OFED 1.5.4.1 installed from source. My network configuration is default (static pair produced by "micctrl --initdefaults"), and I have 1 node with two 3120A Xeon Phi coprocessors.

The MPI benchmark works just fine with fabrics "tcp" or "shm:tcp". Namely, I am able to run the benchmark between localhost and mic0, and between mic0 and mic1. However, with fabric "dapl", I cannot run IMB between localhost and mic0:

[bash]
$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=dapl
$ mpirun -host localhost -n 1 /opt/intel/impi/4.1.3.045/intel64/bin/IMB-MPI1 PingPong : -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1
=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
[/bash]

At the same time, fabric "dapl" works just fine (much faster than "tcp", actually) between two coprocessors mic0 and mic1:

[bash]
$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=dapl
$ mpirun -host mic1 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1 PingPong : -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1
 benchmarks to run PingPong
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part    
#---------------------------------------------------
# Date                  : Sat Jan  4 21:05:43 2014
# Machine               : k1om
# System                : Linux
# Release               : 2.6.38.8+mpss3.1.2
# Version               : #1 SMP Wed Dec 18 19:09:36 PST 2013
# MPI Version           : 2.2
# MPI Thread Environment:
... etc
[/bash]

Does IMB fail between localhost and mic0 because of the way that I am running it, or is there something that I should check in my configuration?

8 Replies
Gregg_S_Intel
Employee

Your mpirun command looks fine.

Gregg_S_Intel
Employee

Check your MPI setup.  This looks like an MPICH2 error message.

Similar topic:  http://software.intel.com/en-us/forums/topic/405183

Andrey_Vladimirov
New Contributor III

I got help from an Intel engineer in private communication. The way to diagnose this issue was setting "I_MPI_DEBUG=5". The debug output showed that the memlock limit was too small. The solution is to raise it in /etc/security/limits.conf and then log out / log back in. The tail of limits.conf in my current (working) configuration is this:

[bash]

*                soft    memlock         unlimited
*                hard    memlock         unlimited
*                soft    core            0
*                hard    core            0

[/bash]
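After logging back in, you can verify that the new limit actually took effect for the session (a quick sanity check, not something the Intel engineer asked for):

[bash]

$ ulimit -l
unlimited

[/bash]

If ulimit -l still reports a small number (64 on a default CentOS install), the limits.conf change has not been picked up yet.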

Another issue is selecting the provider for the DAPL fabric. For the ibscif virtual adapter (between the CPU and the coprocessors), the provider can be chosen by setting

[bash]

export I_MPI_DAPL_PROVIDER=ofa-v2-scif0

[/bash]
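For completeness, here is the full host-to-coprocessor command sequence with both tweaks in place (same paths as in my first listing; I_MPI_DEBUG=5 is optional, but its output shows which DAPL provider was actually selected, which is how the problem was diagnosed in the first place):

[bash]

$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=dapl
$ export I_MPI_DAPL_PROVIDER=ofa-v2-scif0
$ export I_MPI_DEBUG=5
$ mpirun -host localhost -n 1 /opt/intel/impi/4.1.3.045/intel64/bin/IMB-MPI1 PingPong : -host mic0 -n 1 /opt/intel/impi/4.1.3.045/mic/bin/IMB-MPI1

[/bash]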

With these two tweaks, the MPI benchmark looks happy:

[bash]

 benchmarks to run PingPong
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part    
#---------------------------------------------------
# Date                  : Mon Feb  3 15:24:27 2014
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.32-431.el6.x86_64
# Version               : #1 SMP Fri Nov 22 03:15:09 UTC 2013
# MPI Version           : 2.2
...

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------

... [ I had to delete some lines here because they were triggering the spam filter.      This is annoying! ]

      1048576           40       229.64      4354.72
      2097152           20       379.80      5265.92
      4194304           10       683.00      5856.54

# All processes entering MPI_Finalize

[/bash]

 

P.S.: I asked the question with Intel MPI 4.1.3.045, and the last listing is from Intel MPI 4.1.1.036; the fix applies to both versions.

P.P.S.: I have been trying to post this message for a while and kept getting blocked by the spam filter. This happens a lot on this forum.

TaylorIoTKidd
New Contributor I

Andrey,

Thanks for your contribution to the community.

I'll find out what is happening with the spam filter.

Regards
--
Taylor
 

Andrey_Vladimirov
New Contributor III

Thanks, Taylor! From experience here and in nearby Intel forums, I suspect the spam filter dislikes multiple rows of tabulated numerical data, at least inside the [bash*] listings.

DerenT_Intel
Moderator

Andrey, thanks for the feedback on the spam filter.  I have a meeting w/ the spam filter vendor coming up and I'll bring this issue up with them specifically.  Is it OK to have someone from our support team follow up to get more specifics about the types of messages that are being blocked?  

Andrey_Vladimirov
New Contributor III

Sure, thanks, I am already communicating with Hal regarding that.

DerenT_Intel
Moderator

Excellent, I'll touch base w/ him tomorrow.  Sorry about that, Andrey, and thanks for your patience.  Hopefully we can use this data to make this work better for everyone in the future.
