Dear All,
I am working on a cluster with several MICs attached to it. The co-processors are distributed across four HP ProLiant SL250s Gen8 compute nodes, each with 2x Intel Xeon E5-2660 CPUs and 3x Intel Xeon Phi 5110P MICs, for a total of 12 co-processors in the cluster. The workload is controlled by the SLURM Workload Manager, which was also compiled for the MICs, so they can be treated as independent compute nodes.
Well, I have been able to execute a hybrid MPI/OpenMP code successfully both on a single MIC and on a group of MICs within the same node. For example, the following script executes my code across three MICs inside the same compute node (cnf001):
#!/bin/bash
#SBATCH -J omp_tutor7_mpi-MIC
#SBATCH -p mics
#SBATCH -N 3
#SBATCH -w cnf001-mic[0-2]
#SBATCH -o omp_tutor7_mpi-MIC-%j.out
#SBATCH -e omp_tutor7_mpi-MIC-%j.err

export PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/bin/:$PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/lib/mic/:$LD_LIBRARY_PATH
export I_MPI_FABRICS=shm:tcp
export KMP_PLACE_THREADS=60c,4t
export KMP_AFFINITY=scatter

mpiexec.hydra -n 3 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
In this case MPI distributes the program across the three MICs, and multi-threading within each MIC is then enabled using OpenMP. The problem arises when I try to use MICs located in different nodes (cnf001 and cnf002), for example with the following script:
#!/bin/bash
#SBATCH -J omp_tutor7_mpi-MIC
#SBATCH -p mics
#SBATCH -N 2
#SBATCH -w cnf001-mic0,cnf002-mic0
#SBATCH -o omp_tutor7_mpi-MIC-%j.out
#SBATCH -e omp_tutor7_mpi-MIC-%j.err

export PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/bin/:$PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/lib/mic/:$LD_LIBRARY_PATH
export I_MPI_FABRICS=shm:tcp
export KMP_PLACE_THREADS=60c,4t
export KMP_AFFINITY=scatter

mpiexec.hydra -n 2 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
In this case I obtain no output from the MICs. The workload manager shows the co-processors as running, but the execution never ends and I get no output from my code and no communication errors between the MICs, so I suppose the co-processors are "hung" and are not executing my program. I have tried different values of the I_MPI_DEBUG variable but again obtain no output from the execution. The only "success" I have obtained so far was with the following MPI launch command:
mpiexec.hydra -n 2 -hosts cnf001-mic0,cnf002-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
However, in that case the code is really executed only on the first listed MIC (cnf001-mic0) and the second is simply ignored. Regarding communication, I am able to ssh between the host and all MICs, and between MICs both within the same compute node and across nodes, so it does not seem to be an obvious communication problem. I would kindly ask for any hint about where I should look to solve this problem. I am quite new to computing with MICs and am very lost with this issue. Thanks for your help!
Hi Edgardo,
Could you please provide Hydra debug information for the problematic scenarios (it's enabled by 'export I_MPI_HYDRA_DEBUG=1')?
Regarding this problem:
However, in that case the code really is executed in the first listed MIC (cnf001-mic0) and the second is simply ignored.
Could you please try to run this scenario with 'export I_MPI_PERHOST=1'?
Also try to simplify the scenario to something like this and provide its output:
mpirun -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 hostname
mpirun -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 IMB-MPI1 pingpong
Hi Artem,
For the first case, using I_MPI_HYDRA_DEBUG=1, i.e. the following command in the script:
mpiexec.hydra -n 2 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
I have attached the file I_MPI_HYDRA_DEBUG_1.txt with the output. It is quite large; it seems that both MICs are recognised, but at some point the execution gets stuck when the second MIC is called up.
For the second case (using the -hosts option), with I_MPI_PERHOST=1 I again obtain no output from the MICs and have to cancel the job manually. If I also set I_MPI_HYDRA_DEBUG=1 I obtain the output stored in the attached file I_MPI_PERHOST_1.txt. It looks much the same as the output from the first case.
Finally, when I tried to execute the commands with mpirun I obtained the following error message:
[edoerner@leftraru2 ~]$ export I_MPI_MIC=1
[edoerner@leftraru2 ~]$ mpirun -v -ppn 1 -n 2 -hosts cnf001-mic0,cnf002-mic0 hostname
...
[mpiexec@leftraru2] STDIN will be redirected to 1 fd(s): 9
[proxy:0:0@cnf001-mic0] Start PMI_proxy 0
[proxy:0:0@cnf001-mic0] STDIN will be redirected to 1 fd(s): 9
[proxy:0:0@cnf001-mic0] HYDU_create_process (../../utils/launch/launch.c:588): execvp error on file hostname (No such file or directory)
[proxy:0:1@cnf002-mic0] Start PMI_proxy 1
[proxy:0:1@cnf002-mic0] HYDU_create_process (../../utils/launch/launch.c:588): execvp error on file hostname (No such file or directory)
Thanks for your help!
Hi Edgardo,
Thanks for the information. Could you please make sure that the following SSH paths work and are non-interactive:
cnf001-mic0 -> cnf002-mic0
cnf002-mic0 -> cnf001-mic0
cnf002-mic0 -> cnf001-mic0.nlhpc.cl
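A minimal sketch for checking those paths non-interactively (hostnames are taken from this thread; the `probe_cmd`/`check_paths` helper names are illustrative, not an existing tool). BatchMode=yes makes ssh exit with an error instead of stopping at a password prompt, which is the same failure mode a Hydra launch would silently hit:

```shell
#!/bin/bash
# Build the probe command for a given target host.
# BatchMode=yes: fail instead of prompting; ConnectTimeout bounds hangs.
probe_cmd() {
    echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $1 hostname"
}

# Try each path and report whether it works without interaction.
check_paths() {
    for target in "$@"; do
        if $(probe_cmd "$target") >/dev/null 2>&1; then
            echo "$target: OK"
        else
            echo "$target: FAILED (prompting or unreachable)"
        fi
    done
}

# Run from cnf001-mic0, then repeat from cnf002-mic0, e.g.:
# check_paths cnf002-mic0 cnf001-mic0 cnf001-mic0.nlhpc.cl
```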
Also check the IP forwarding settings, according to the Intel® MPI Library for Linux* OS User's Guide / chapter "Using the Intel® MPI Library with the Intel® Many Integrated Core (Intel® MIC) Architecture":
12.2. Multiple Cards
To use multiple cards for a single job, the Intel® Manycore Platform Software Stack (Intel® MPSS) needs to be configured for peer-to-peer support (see the Intel® MPSS documentation for details) and the host(s) needs to have IP forwarding enabled.
(host)$ sudo sysctl -w net.ipv4.ip_forward=1
Each host/card should be able to ping every other host/card and the launching host should be able to connect to every target, as with a classic cluster.
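As a sketch of that check (the `forwarding_state` helper is illustrative; it reads /proc directly, which works even where the sysctl binary is not in PATH):

```shell
#!/bin/bash
# Map the contents of /proc/sys/net/ipv4/ip_forward ("0" or "1")
# to a human-readable state.
forwarding_state() {
    case "$1" in
        1) echo "enabled" ;;
        *) echo "disabled" ;;
    esac
}

# On each host:
# forwarding_state "$(cat /proc/sys/net/ipv4/ip_forward)"
# To enable (root required, not persistent across reboots):
# sysctl -w net.ipv4.ip_forward=1
```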
Dear Artem,
I have tested the SSH paths with the following script and did not find any issues.
#!/bin/bash
# Test script to ssh into MICs
ssh cnf002-mic0 "hostname && ssh cnf001-mic0.nlhpc.cl hostname"
I also tested the IP forwarding setting and it seems it is not enabled. I obtained the following:
[edoerner@leftraru1 ~]$ sysctl -n net.ipv4.ip_forward
0
I will ask the admin about this setting (I do not have sudo privileges). Thanks for your help.
Well, we changed the IP forwarding settings and the problem persists. So we are still stuck on this...
Hi Edgardo,
Could you please try to run the following test scenario and provide its output:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 IMB-MPI1 pingpong -msglog 0:1
Hi,
I am just curious to see whether the problem comes from the second node. If you swap the roles of cnf001 and cnf002, do you still see the problem? Instead of:
# mpiexec.hydra -n 2 -hosts cnf001-mic0,cnf002-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
Can you test
# mpiexec.hydra -n 2 -hosts cnf002-mic0,cnf001-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b
Hi Edgardo,
Thanks for the information. I didn't see anything suspicious in the provided log file apart from some problems with the TCP connection from cnf002-mic0 to cnf001-mic0.nlhpc.cl (port: 51440). You said that "cnf002-mic0 -> cnf001-mic0.nlhpc.cl" works fine over SSH, so potentially there are some firewall limitations. Could you please check the firewall status on cnf001-mic0/cnf002-mic0 (administrator permissions may be required for this)?
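Assuming bash and coreutils `timeout` are available on the card, a quick sketch to probe a suspect TCP port independently of SSH (the `port_open` helper name is mine; the host/port values in the example come from the log mentioned above). It uses bash's /dev/tcp pseudo-device, so no extra tools like nc are needed:

```shell
#!/bin/bash
# Report whether a TCP port on a host accepts connections.
# A silently dropping firewall produces a hang rather than a refusal,
# so the timeout distinguishes "filtered" from a quick "refused".
port_open() {
    local host=$1 port=$2
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or filtered"
    fi
}

# Example (values from the debug log in this thread):
# port_open cnf001-mic0.nlhpc.cl 51440
```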
Dear Loc,
I tested changing the order of the MICs and I still have issues; I have attached the log file. Reading it, it looks much the same as the original case, but now the roles of the co-processors are interchanged.
@Artem: I will ask the administrator to look at the Firewall settings, thanks for your time!
As stated by the administrator, the MICs do not have iptables installed, and on the nodes containing the MICs all chains default to ACCEPT:
[root@cnf001 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Hi Edgardo,
Could you please double check that:
1. IP forwarding is enabled on both cnf001/cnf002 hosts
2. Firewall is disabled on both cnf001/cnf002 hosts
By cnf001/cnf002 I mean HOST side of the nodes where cnf001-mic0/cnf002-mic0 MIC cards are placed.
The admin says that all the nodes have forwarding enabled and the firewall inactive. For example:
[root@master1 ~]# pdsh -w cnf00[1-4] sysctl net.ipv4.ip_forward
cnf002: net.ipv4.ip_forward = 1
cnf004: net.ipv4.ip_forward = 1
cnf003: net.ipv4.ip_forward = 1
cnf001: net.ipv4.ip_forward = 1
[root@master1 ~]# pdsh -w cnf00[1-4] systemctl status firewalld
cnf002: ● firewalld.service - firewalld - dynamic firewall daemon
cnf002:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf002:    Active: inactive (dead)
pdsh@master1: cnf002: ssh exited with exit code 3
cnf004: ● firewalld.service - firewalld - dynamic firewall daemon
cnf004:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf004:    Active: inactive (dead)
pdsh@master1: cnf004: ssh exited with exit code 3
cnf003: ● firewalld.service - firewalld - dynamic firewall daemon
cnf003:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf003:    Active: inactive (dead)
pdsh@master1: cnf003: ssh exited with exit code 3
cnf001: ● firewalld.service - firewalld - dynamic firewall daemon
cnf001:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf001:    Active: inactive (dead)
Thanks for your time!
Hi Edgardo,
Could you please try the following scenarios and provide the corresponding output?
Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -ppn 1 -n 2 -hosts node2-mic0 IMB-MPI1 pingpong -msglog 0:1
Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
export I_MPI_HYDRA_BOOTSTRAP=slurm
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts <node1_mic0_ip_address>,<node2_mic0_ip_address> IMB-MPI1 pingpong -msglog 0:1
Run on node1:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
Could you please also specify OS and MPSS version for the nodes?
Some additions for the last scenario:
Run on node1:
. <impi_install_path>/intel64/bin/mpivars.sh
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
export I_MPI_MIC=1
export I_MPI_MIC_PREFIX=$I_MPI_ROOT/mic/bin/
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
For all the scenarios check that SSH is password-less before the run.
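The scenarios above share one command line and differ only in the host arguments. A small convenience sketch for composing them consistently (the `scenario_cmd` helper is mine, and the host names remain the placeholders used above):

```shell
#!/bin/bash
# Compose the common benchmark launch line.
# $1: comma-separated -hosts list; remaining args: extra hydra options
# (e.g. "-localhost <ip>") inserted before -hosts.
scenario_cmd() {
    local hosts=$1; shift
    local extra=""
    [ $# -gt 0 ] && extra="$* "
    echo "mpiexec.hydra -v -ppn 1 -n 2 ${extra}-hosts $hosts IMB-MPI1 pingpong -msglog 0:1"
}

# Usage, after setting I_MPI_DEBUG/I_MPI_FABRICS as above:
# eval "$(scenario_cmd node1-mic0,node2-mic0)"
# eval "$(scenario_cmd node1-mic0,node2-mic0 -localhost 10.0.0.1)"
```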
Hi Artem,
I am sorry for the delay. I have run the scenarios on the MICs; here is the list.
I got a little confused about the last two scenarios, though. Must I run them from the CPU host? Thanks for your time!
- s1-MIC.txt:
  Run on node1-mic0:
  export I_MPI_DEBUG=100
  export I_MPI_FABRICS=tcp
  mpiexec.hydra -v -ppn 1 -n 2 -hosts node2-mic0 IMB-MPI1 pingpong -msglog 0:1
- s2-MIC.txt:
  Run on node1-mic0:
  export I_MPI_DEBUG=100
  export I_MPI_FABRICS=tcp
  export I_MPI_HYDRA_BOOTSTRAP=slurm
  mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
- s3-MIC.txt:
  Run on node1-mic0:
  export I_MPI_DEBUG=100
  export I_MPI_FABRICS=tcp
  mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
- s4-MIC.txt:
  Run on node1-mic0:
  export I_MPI_DEBUG=100
  export I_MPI_FABRICS=tcp
  mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts <node1_mic0_ip_address>,<node2_mic0_ip_address> IMB-MPI1 pingpong -msglog 0:1