
Cannot start MPI job from host system [me too, but SOLVED]

Anders_H_

Hi, 

I wasn't sure whether to start a new thread, since I seem to have exactly the same problem as the one described here: https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/599473 . However, the suggestions there did not solve it for me.

I am running RHEL 6.5 and OFED 3.18.1, with four Xeon Phi cards configured on an external static bridge. The IP addresses are:
Host: 192.168.1.16
mic0: 192.168.1.17
mic1: 192.168.1.18
mic2: 192.168.1.19
mic3: 192.168.1.20

The MICs can connect to each other and to the host over SSH using keys. The host has no firewall running, and both openibd and ofed-mic are running. I have tried Intel Cluster 2015 Update 3, 2015 Update 5 and 2016 Update 1, all with the same problem. When I try to run anything on a coprocessor from the host:

[on host]:
export I_MPI_MIC=1
export I_MPI_DYNAMIC_CONNECTION=1
source /opt/intel/impi_latest/intel64/bin/mpivars.sh
mpiexec.hydra -n 1 -hosts mic0 $I_MPI_ROOT/mic/bin/IMB-MPI1

The result is:
[proxy:0:0@kif-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "kif-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@kif-mic0] main (../../pm/pmiserv/pmip.c:415): unable to connect to server 127.0.0.1 at port 35690 (check for firewalls!)

Attached is the verbose output from mpiexec.hydra (log.txt). Several people seem to have this issue, but I haven't found a solution that works for me yet. What should I do?
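The error shows the proxy on kif-mic0 being told to connect back to 127.0.0.1, which on the card is the card's own loopback, not the host. One possible cause (an assumption, not confirmed in this thread) is that the host's name resolves to a loopback address, so that is what gets handed to the remote proxies. A quick check on the host, as a sketch; the helper name `is_loopback` is hypothetical:

```sh
#!/bin/sh
# Sketch: does this host's name resolve to a loopback address?
# is_loopback is a hypothetical helper: exit status 0 for 127.x.x.x or ::1.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

host=$(hostname)
# Take the first address getent reports for this host name.
addr=$(getent hosts "$host" | awk '{ print $1; exit }')
if [ -z "$addr" ]; then
  echo "$host does not resolve via getent"
elif is_loopback "$addr"; then
  echo "$host resolves to loopback ($addr); check /etc/hosts"
else
  echo "$host resolves to $addr"
fi
```

If the name resolves to a normal address such as 192.168.1.16, the loopback hypothesis is unlikely, and the interface selection (see the update below) is the more probable culprit.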

Anders_H_

Update: 

It works if I specify -iface br0 (see the attached ifconfig.txt for the current ifconfig output). So the command

mpirun -n 1 -iface br0 -hosts mic0 $I_MPI_ROOT/mic/bin/IMB-MPI1

works as expected. It also works when running on all cards.

TL;DR: I needed to specify -iface br0 in the mpirun command. 

Edit: Since I use virtual InfiniBand, I switched to -iface virbr0.
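Rather than hard-coding br0 or virbr0, the interface can be looked up from the host's routing table: the device the host would use to reach a card's address is presumably the one the card can connect back on. A minimal sketch, assuming iproute2's `ip route get` and `getent` are available on the host; the helper name `iface_from_route` is hypothetical:

```sh
#!/bin/sh
# Sketch: pick the host interface that routes to a coprocessor, then pass it to -iface.
# iface_from_route is a hypothetical helper: it reads `ip route get` output on stdin
# and prints the word that follows "dev" (e.g. "br0").
iface_from_route() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Only launch when a coprocessor host name is given as $1 (e.g. mic0).
if [ -n "${1:-}" ]; then
  mic=$1
  mic_ip=$(getent hosts "$mic" | awk '{ print $1; exit }')  # e.g. 192.168.1.17
  iface=$(ip route get "$mic_ip" | iface_from_route)        # e.g. br0
  echo "Reaching $mic ($mic_ip) via $iface"
  mpirun -n 1 -iface "$iface" -hosts "$mic" "$I_MPI_ROOT/mic/bin/IMB-MPI1"
fi
```

Run it on the host as, for example, `sh pick_iface.sh mic0` (the file name is arbitrary) after sourcing mpivars.sh. Intel MPI also documents an `I_MPI_HYDRA_IFACE` environment variable as the equivalent of `-iface`, which may be handier in a profile script; check the reference for your Intel MPI version.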
