Dear all,
I am using a Xeon Phi KNC card (previous generation of Phi accelerators) and have been trying to make MPI work on the card, for a multi-accelerator application. I am issuing the following simple command in order to illustrate my problems:
[georgec@my_host ~]$ mpirun -n 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "my_host-mic0.microlab.ntua.gr" to "147.102.37.70" (No route to host)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] main (../../pm/pmiserv/pmip.c:415): unable to connect to server 147.102.37.70 at port 43579 (check for firewalls!)
It seems that there is an issue in the bi-directional communication between host and Phi card (mic0). However, when ssh-ing into the card, there does not seem to be an issue in the link between the two machines:
[georgec@my_host-mic0 ~]$ ping host
PING host (172.31.1.254) 56(84) bytes of data.
64 bytes from host (172.31.1.254): icmp_req=1 ttl=64 time=0.557 ms
64 bytes from host (172.31.1.254): icmp_req=2 ttl=64 time=0.477 ms
64 bytes from host (172.31.1.254): icmp_req=3 ttl=64 time=0.689 ms
64 bytes from host (172.31.1.254): icmp_req=4 ttl=64 time=0.693 ms
^C
--- host ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3022ms
Using ssh from the KNC card to the host is also successful.
Notice how the host's IP is reported differently by the KNC card (172.31.1.254) and by the host itself (147.102.37.30). I tried changing the /etc/hosts file on the host machine so that it would use the same IP (172.31.1.254); however, the issue persisted.
My issue here is that I cannot think of a reason why the MPI link fails, whereas simple tests with ping and ssh do not show any issue. I was hoping that someone with more experience with MPI and the old KNC cards might shed some light on this. Thank you in advance!
Relevant information:
Host OS: CentOS Linux release 7.2.1511
KNC OS: Linux version 2.6.38.8 + mpss3.7.2
MPI version: Intel(R) MPI Library for Linux* OS, Version 5.1.2 Build 20151015
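A note on why ping and ssh can succeed while mpirun fails: ping uses ICMP and ssh uses port 22, whereas the Hydra proxy connects back to the host on a high, per-run TCP port (43579 in the log above). A firewall can allow the first two and still reject the third, and a rejection of that kind typically surfaces as "No route to host". The sketch below is one way to probe a single TCP port from the card. The function name `tcp_port_open` is hypothetical, and the sketch assumes that bash and the coreutils `timeout` command exist where it runs, which may not be true on the card's minimal userland.

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# The error text from bash (for example "No route to host" or
# "Connection refused") is left on stderr because it is the useful part.
tcp_port_open() {
    host=$1
    port=$2
    # /dev/tcp is a bash feature, so bash is invoked explicitly here.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port"
}

# Example (address taken from the log above; the port is whatever
# port something on the host is actually listening on):
#   tcp_port_open 147.102.37.70 43579 && echo reachable
```

"Connection refused" would suggest that the packet reached the host and nothing was listening, while "No route to host" on an address that answers ping points towards filtering.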
Hi,
The log suggests that a firewall on the host is blocking the connection. You may want to stop the firewall on the host temporarily:
$ systemctl status firewalld
$ sudo systemctl stop firewalld
$ systemctl status firewalld
Then run your MPI program again.
Dear Nguyen,
Thank you for your reply. It actually was a firewall issue; after disabling the firewall, I no longer get this error message.
I do get this, however:
georgec@my_host.microlab.ntua.gr ~
╰─➤ sudo systemctl stop firewalld
georgec@my_host.microlab.ntua.gr ~
╰─➤ mpirun -np 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (Permission denied)
I thought it was strange, but I switched to root privileges and re-ran the command:
[root@my_host]# mpirun -n 4 -host mic0 hostname
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
[proxy:0:0@my_host-mic0.microlab.ntua.gr] HYDU_create_process (../../utils/launch/launch.c:621): execvp error on file hostname (No such file or directory)
Note that after ssh-ing into the mic, I can run hostname directly, without issuing any other command first.
Thank you for your effort!
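Both execvp errors above concern a bare command name, which execvp resolves by walking the directories in PATH. One possible explanation, not confirmed in this thread, is that the PATH seen by the launched process on the card differs from the PATH of an interactive ssh session, for example because the launcher forwards the host's environment. The sketch below mimics that lookup so a given PATH value can be checked by hand. `find_in_path` is a hypothetical helper, not part of MPI or MPSS.

```shell
# Mimic the PATH search that execvp performs for a bare command name.
# Prints the first regular, executable match and returns 0; returns 1 if none.
find_in_path() {
    name=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: does this PATH value lead to an executable "hostname"?
#   find_in_path hostname "$PATH" || echo "hostname not found on this PATH"
```

If the lookup fails for the PATH that the card-side process receives, passing the absolute path of the program to mpirun sidesteps the search entirely.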
Sorry for double-posting, but I did not want to waste anyone's time.
By setting the environment variables properly, I was able to run the program that I intended to run on the KNC via MPI:
mpirun -host mic0 -env LD_LIBRARY_PATH=/path/to/shared/libraries/on/mic:$LD_LIBRARY_PATH -np 4 /path/to/mic/executable arg1 arg2
As I am experimenting with proper configuration of mpirun, I will probably be able to do this procedure in a more efficient way.
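One detail of the command above that is easy to miss: `$LD_LIBRARY_PATH` is expanded by the shell on the host before mpirun starts, so the card receives the host's library directories appended to the mic ones. If the host variable is empty, the value ends in a colon, and the dynamic loader treats an empty entry as the current directory. The sketch below builds the value without that trailing colon. `build_mic_ld_path` is a hypothetical helper and the paths are placeholders.

```shell
# Join the mic library directory with an optional existing value,
# avoiding a trailing ":" when the existing value is empty.
build_mic_ld_path() {
    mic_libs=$1
    host_value=$2
    if [ -n "$host_value" ]; then
        printf '%s:%s\n' "$mic_libs" "$host_value"
    else
        printf '%s\n' "$mic_libs"
    fi
}

# Example, keeping the same mpirun form as above:
#   mpirun -host mic0 \
#     -env LD_LIBRARY_PATH="$(build_mic_ld_path /path/to/shared/libraries/on/mic "$LD_LIBRARY_PATH")" \
#     -np 4 /path/to/mic/executable arg1 arg2
```

Passing an empty second argument sends only the mic directory, which may be the cleaner choice when the host's libraries are of no use on the card.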
There is one last issue that I would appreciate anyone's help with. Disabling the firewall on my host in order to get mpirun operational is something I would really like to avoid doing. However, I am no expert with firewall-cmd commands. I have tried following this thread's instructions: https://stackoverflow.com/questions/32703920/how-to-enable-mpi-mpirun-using-firewalld-in-centos-7
However, I have failed. The command I issue is:
sudo firewall-cmd --permanent --direct --remove-rule ipv4 filter INPUT 0 -s "my_host.microlab.ntua.gr hostip" -j ACCEPT
but, after reloading the firewall, the issue persists. I even tried adding the same rule for the mic card.
Clearly, I am doing something wrong! Any help would be appreciated.
Thank you for your time!
Hi George,
You may try with the IP address of your co-processor instead. By default, the IP address of mic0 is 172.31.1.1. So try this:
$ sudo firewall-cmd --direct --add-rule ipv4 filter INPUT 0 -s 172.31.1.1 -j ACCEPT
$ sudo firewall-cmd --direct --get-all-rules
Then run your MPI again.
After finishing your test, you can remove your rule:
$ sudo firewall-cmd --direct --remove-rule ipv4 filter INPUT 0 -s 172.31.1.1 -j ACCEPT
$ sudo firewall-cmd --direct --get-all-rules
Note: the IP address of your co-processor is shown with the command
$ sudo micctrl --config
mic0:
Config Version: 1.1
.............................................
Network: Static Pair
.............................................
MIC IP: 172.31.1.1
Host IP: 172.31.1.254
Net Bits: 24
NetMask: 255.255.255.0
.............................................
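If the card's address should not be typed by hand, it can be pulled out of the `micctrl --config` output shown above. The following is a minimal sketch that assumes the output keeps the `MIC IP: <address>` line format from this listing. The helper names `mic_ip_from_config` and `rich_rule_for` are hypothetical, and only the first card found is reported.

```shell
# Read "micctrl --config" output on stdin and print the first MIC IP found.
mic_ip_from_config() {
    awk -F': *' '/^[[:space:]]*MIC IP:/ {
        gsub(/[[:space:]]/, "", $2)   # strip any stray whitespace
        print $2
        exit                          # first card only
    }'
}

# Print a firewalld rich rule that accepts IPv4 traffic from one address.
rich_rule_for() {
    printf 'rule family="ipv4" source address="%s" accept\n' "$1"
}

# Example:
#   ip=$(sudo micctrl --config | mic_ip_from_config)
#   sudo firewall-cmd --add-rich-rule="$(rich_rule_for "$ip")"
```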
By the way, the direct option in the firewall configuration is meant as a last resort because it requires some iptables knowledge. We can use the Rich Rule feature instead, which is simpler than the direct option:
To add a rich rule:
$ sudo firewall-cmd --add-rich-rule='rule family="ipv4" source address="172.31.1.1" accept'
To remove a rich rule:
$ sudo firewall-cmd --remove-rich-rule='rule family="ipv4" source address="172.31.1.1" accept'
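The add, run, remove sequence can be wrapped so that the rule only exists while the MPI job runs. This is a sketch under assumptions: `run` and `with_mic_rule` are hypothetical helpers, the address is the default one mentioned in this thread, and by default the script only prints the commands (set `DRY_RUN=0` to execute them). Rules added without `--permanent` are runtime-only, so they also disappear when firewalld is reloaded or restarted.

```shell
# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: with_mic_rule <mic-ip> <command> [args...]
# Adds a runtime rich rule for the card, runs the command, removes the rule,
# and returns the command's exit status.
with_mic_rule() {
    mic_ip=$1
    shift
    rule="rule family=\"ipv4\" source address=\"$mic_ip\" accept"
    run sudo firewall-cmd --add-rich-rule="$rule" || return 1
    run "$@"
    status=$?
    run sudo firewall-cmd --remove-rich-rule="$rule"
    return "$status"
}

# Example:
#   DRY_RUN=0 with_mic_rule 172.31.1.1 mpirun -n 4 -host mic0 hostname
```

If the job is interrupted before the removal step, the rule stays in place until it is removed by hand or firewalld is reloaded.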