George_C_5
Beginner
144 Views

MPI on Xeon Phi Knights Corner Card


Dear all,

I am using a Xeon Phi KNC card (previous generation of Phi accelerators) and have been trying to make MPI work on the card, for a multi-accelerator application. I am issuing the following simple command in order to illustrate my problems:

[georgec@my_host ~]$ mpirun -n 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "my_host-mic0.microlab.ntua.gr" to "147.102.37.70" (No route to host)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] main (../../pm/pmiserv/pmip.c:415): unable to connect to server 147.102.37.70 at port 43579 (check for firewalls!)

It seems that there is an issue in the bi-directional communication between host and Phi card (mic0). However, when ssh-ing into the card, there does not seem to be an issue in the link between the two machines:

[georgec@my_host-mic0 ~]$ ping host
PING host (172.31.1.254) 56(84) bytes of data.
64 bytes from host (172.31.1.254): icmp_req=1 ttl=64 time=0.557 ms
64 bytes from host (172.31.1.254): icmp_req=2 ttl=64 time=0.477 ms
64 bytes from host (172.31.1.254): icmp_req=3 ttl=64 time=0.689 ms
64 bytes from host (172.31.1.254): icmp_req=4 ttl=64 time=0.693 ms
^C
--- host ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3022ms

Using ssh from the KNC card to the host is also successful.

Notice how the host's IP is reported differently by the KNC card (172.31.1.254) and by the host itself (147.102.37.70). I tried changing the /etc/hosts file on the host machine so that it would use the same IP (172.31.1.254); however, the issue persisted.
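To make the address mismatch explicit, the Hydra error line can be parsed for the address and port the proxy on the card was told to connect back to. This is only a sketch: the error text is copied from the mpirun output above, and the variable names (`err`, `server_ip`, `server_port`, `ping_ip`) are illustrative.

```shell
# Error line copied from the mpirun output earlier in this post.
err='[proxy:0:0@my_host-mic0.microlab.ntua.gr] main (../../pm/pmiserv/pmip.c:415): unable to connect to server 147.102.37.70 at port 43579 (check for firewalls!)'

# Pull out the address and port that the proxy tried to reach.
target=$(printf '%s\n' "$err" |
    sed -n 's/.*unable to connect to server \([0-9.]*\) at port \([0-9]*\).*/\1 \2/p')
server_ip=${target% *}
server_port=${target#* }

# Address that the card resolves for "host" in the ping test above.
ping_ip=172.31.1.254

if [ "$server_ip" != "$ping_ip" ]; then
    echo "proxy connects to $server_ip:$server_port, but ping tested $ping_ip"
fi
```

The comparison shows that the successful ping and the failing MPI connection target different addresses, so the ping result alone does not prove that the path used by mpirun is open.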

My issue here is that I cannot think of a reason why the MPI link fails, whereas simple tests with ping and ssh do not show any issue. I was hoping that someone with more experience with MPI protocols and the old KNC cards might shed some light on the issue. Thank you in advance!

Relevant information:

Host OS: CentOS Linux release 7.2.1511
KNC OS: Linux version 2.6.38.8 + mpss3.7.2
MPI version: Intel(R) MPI Library for Linux* OS, Version 5.1.2 Build 20151015

Loc_N_Intel
Employee
144 Views

Hi,

The log suggests that a firewall on the host is blocking the connection back from the card. You may want to disable the firewall on the host:

$ systemctl status firewalld

$ sudo systemctl stop firewalld

$ systemctl status firewalld

Then run your MPI program again. 



George_C_5
Beginner
144 Views

Dear Nguyen,

Thank you for your reply. It is indeed a firewall issue: after disabling the firewall, I no longer get this error message.
I do get this, however:

georgec@my_host.microlab.ntua.gr ~ 
╰─➤  sudo systemctl stop firewalld
georgec@my_host.microlab.ntua.gr ~  
╰─➤  mpirun -np 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)

I thought it was strange, but I switched to root privileges and re-ran the command:

[root@my_host]# mpirun -n 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)

Note that when I ssh into the mic, I can run hostname directly, without issuing any other command first.
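One plausible explanation, not confirmed anywhere in this thread, is that the process launched by mpirun sees a different PATH than an interactive ssh login does, so the bare name `hostname` resolves differently. The sketch below mimics, roughly, how a PATH search picks an executable; the function name `find_in_path` and the example PATH values are hypothetical.

```shell
# Print the first executable regular file named $1 found in the
# colon-separated directory list $2, roughly like a PATH lookup.
# The body runs in a subshell so the IFS and globbing changes stay local.
find_in_path() (
    name=$1
    set -f      # do not expand wildcards in directory names
    IFS=:
    for dir in $2; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    exit 1
)

# Compare a login-style PATH with a minimal one (both values are illustrative).
find_in_path hostname "/usr/local/bin:/usr/bin:/bin" || echo "not found with login-style PATH"
find_in_path hostname "/nonexistent" || echo "not found with minimal PATH"
```

If the PATH is indeed the cause, passing an absolute path to the executable avoids the lookup entirely.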
Thank you for your effort!

George_C_5
Beginner
144 Views

Sorry for double-posting, but I did not want to waste anyone's time.
By setting environment variables properly, I was able to run the program that I intended to run on the KNC via MPI:

mpirun -host mic0 -env LD_LIBRARY_PATH=/path/to/shared/libraries/on/mic:$LD_LIBRARY_PATH -np 4 /path/to/mic/executable arg1 arg2
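One detail worth keeping in mind with this command: `$LD_LIBRARY_PATH` is expanded by the host shell before mpirun runs, so the host's library directories are appended to the value that is sent to the card. If the host variable is empty, the value ends with a trailing colon, which the dynamic loader treats as the current directory. The helper below is a hypothetical sketch (`mic_env_arg` is not part of Intel MPI) that builds the argument without that trailing colon and only prints the resulting command.

```shell
# Build the NAME=value argument for mpirun's -env option.
# $1 = library directory on the card, $2 = host-side LD_LIBRARY_PATH (may be empty)
mic_env_arg() {
    if [ -n "$2" ]; then
        printf 'LD_LIBRARY_PATH=%s:%s\n' "$1" "$2"
    else
        # Avoid a trailing colon when there is nothing to append.
        printf 'LD_LIBRARY_PATH=%s\n' "$1"
    fi
}

# Dry run: print the command instead of executing it.
env_arg=$(mic_env_arg /path/to/shared/libraries/on/mic "${LD_LIBRARY_PATH:-}")
echo mpirun -host mic0 -env "$env_arg" -np 4 /path/to/mic/executable arg1 arg2
```

Passing an empty second argument gives a card-only library path, which may be preferable when the host directories do not exist on the card.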

As I experiment further with the configuration of mpirun, I will probably find a more efficient way to do this.

There is one last issue that I would appreciate anyone's help with. Disabling the firewall on my host in order to get mpirun operational is something I would really like to avoid doing. However, I am no expert with firewall-cmd commands. I have tried following this thread's instructions: https://stackoverflow.com/questions/32703920/how-to-enable-mpi-mpirun-using-firewalld-in-centos-7

However, I have failed. The command I issue is:

sudo firewall-cmd --permanent --direct --remove-rule ipv4 filter INPUT 0 -s "my_host.microlab.ntua.gr hostip" -j ACCEPT 

but, after reloading the firewall, the issue persists. I even tried adding the same rule for the mic card.
Clearly, I am doing something wrong! Any help would be appreciated.

Thank you for your time!

Loc_N_Intel
Employee
144 Views

Hi George,

You may try the IP address of your co-processor instead. By default, the IP address of mic0 is 172.31.1.1. So try this:

$ sudo firewall-cmd --direct --add-rule ipv4 filter INPUT 0 -s 172.31.1.1 -j ACCEPT
$ sudo firewall-cmd --direct --get-all-rules

Then run your MPI program again.

After finishing your test, you can remove your rule:

$ sudo firewall-cmd --direct --remove-rule ipv4 filter INPUT 0 -s 172.31.1.1 -j ACCEPT
$ sudo firewall-cmd --direct --get-all-rules

Note: the IP address of your co-processor is shown by the following command:

$ sudo micctrl --config

mic0:

    Config Version: 1.1
.............................................


    Network:       Static Pair
.............................................

        MIC IP:    172.31.1.1
        Host IP:   172.31.1.254
        Net Bits:  24
        NetMask:   255.255.255.0
 
.............................................
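To avoid copying the address by hand, the two IP lines can be pulled out of this output with awk. The sketch below works on sample text copied from the output above (with the elided sections left out); on a real system the same awk filters could be fed from `sudo micctrl --config`, assuming the output keeps this layout.

```shell
# Sample text copied from the micctrl output above (elided sections omitted).
sample='mic0:
    Network:       Static Pair
        MIC IP:    172.31.1.1
        Host IP:   172.31.1.254
        Net Bits:  24
        NetMask:   255.255.255.0'

# Split each line on the colon plus following spaces and print the value.
mic_ip=$(printf '%s\n' "$sample" | awk -F': *' '/MIC IP:/ {print $2}')
host_ip=$(printf '%s\n' "$sample" | awk -F': *' '/Host IP:/ {print $2}')

echo "card=$mic_ip host=$host_ip"
```

With more than one card installed, each filter would match several lines, so the result would need to be split per card.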

 

Loc_N_Intel
Employee
144 Views

By the way, the direct option in the firewall configuration is meant as a last resort because it requires some iptables knowledge. We can use the Rich Rule feature instead, which is simpler than the direct option:

To add a rich rule:
$ sudo firewall-cmd --add-rich-rule='rule family="ipv4" source address="172.31.1.1" accept'

To remove a rich rule:
$ sudo firewall-cmd --remove-rich-rule='rule family="ipv4" source address="172.31.1.1" accept'
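The rich rule commands above change only the running configuration, so the rule is lost when firewalld is reloaded or restarted. To keep it, the same rule can be given with `--permanent` and then applied with `--reload`. The helper below is a hypothetical sketch (`rich_rule_cmds` is an illustrative name) that only prints the commands for review rather than running them.

```shell
# Print (not run) the firewall-cmd invocations for one source address.
# $1 = IPv4 address of the coprocessor, $2 = "add" or "remove"
rich_rule_cmds() {
    ip=$1
    action=$2
    # Very loose sanity check: only digits and dots are allowed.
    case $ip in
        "" | *[!0-9.]*) echo "not an IPv4 address: $ip" >&2; return 1 ;;
    esac
    case $action in
        add | remove) ;;
        *) echo "action must be add or remove" >&2; return 1 ;;
    esac
    rule="rule family=\"ipv4\" source address=\"$ip\" accept"
    # --permanent writes to the saved configuration; --reload applies it.
    printf "sudo firewall-cmd --permanent --%s-rich-rule='%s'\n" "$action" "$rule"
    printf 'sudo firewall-cmd --reload\n'
}

rich_rule_cmds 172.31.1.1 add
```

Note also that the direct-rule command quoted earlier in the thread passes `--remove-rule`, which deletes a rule rather than adding one; whether that was a transcription slip is not clear from the thread.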
