
Hybrid MPI/OpenMP with MICs - cannot execute across MICs in different nodes

Dear All,

I am working on a cluster with several MICs attached to it. The co-processors are distributed across four HP ProLiant SL250s Gen8 compute nodes, each with 2x Intel Xeon E5-2660 CPUs and 3x Intel Xeon Phi 5110P MICs, for a total of 12 co-processors in the cluster. The workload is controlled with the SLURM Workload Manager, which was also compiled for the MICs, so they can be treated as independent compute nodes.

Well, I have been able to successfully execute a hybrid MPI/OpenMP code both on a single MIC and on a group of MICs within the same node. For example, the following script executes my code across three MICs inside the same compute node (cnf001):

#!/bin/bash
#SBATCH -J omp_tutor7_mpi-MIC
#SBATCH -p mics
#SBATCH -N 3
#SBATCH -w cnf001-mic[0-2] 
#SBATCH -o omp_tutor7_mpi-MIC-%j.out
#SBATCH -e omp_tutor7_mpi-MIC-%j.err

export PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/bin/:$PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/lib/mic/:$LD_LIBRARY_PATH
export I_MPI_FABRICS=shm:tcp

export KMP_PLACE_THREADS=60c,4t
export KMP_AFFINITY=scatter

mpiexec.hydra -n 3 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

 

In this case MPI distributes the program across the three MICs, and OpenMP then enables multithreading within each MIC. The problem arises when I try to use MICs located in different nodes (cnf001 and cnf002), for example with the following script:

#!/bin/bash
#SBATCH -J omp_tutor7_mpi-MIC
#SBATCH -p mics
#SBATCH -N 2
#SBATCH -w cnf001-mic0,cnf002-mic0
#SBATCH -o omp_tutor7_mpi-MIC-%j.out
#SBATCH -e omp_tutor7_mpi-MIC-%j.err

export PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/bin/:$PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/impi/5.1.2.150/mic/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/apps/intel/2016/lib/mic/:$LD_LIBRARY_PATH
export I_MPI_FABRICS=shm:tcp

export KMP_PLACE_THREADS=60c,4t
export KMP_AFFINITY=scatter

mpiexec.hydra -n 2 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

 

In this case I obtain no output from the MICs. The workload manager shows the co-processors as running, but the execution never ends and I get no output at all, neither from my code nor in the form of communication errors between the MICs, so I suppose the co-processors are "hung" and not executing my program. I have tried different values of the I_MPI_DEBUG variable, but again I obtain no output from the execution. The only "success" I have had so far was executing with the following command:

mpiexec.hydra -n 2 -hosts cnf001-mic0,cnf002-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

 

However, in that case the code is really executed only on the first listed MIC (cnf001-mic0); the second is simply ignored. Regarding communication, I can ssh between the host and all MICs, and between MICs both within the same node and across nodes, so it does not seem to be an obvious connectivity problem. I would kindly ask for any hint on where I should look to solve this problem. I am quite new to computing with MICs and I am very lost with this issue. Thanks for your help!
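As a first triage step, it can help to see whether the ranks are launched at all before the real binary runs. A small wrapper script could report each rank's placement (a sketch; the wrapper name is hypothetical, and PMI_RANK is the rank variable Hydra is expected to set):

```shell
#!/bin/bash
# rankinfo.sh (hypothetical helper): print where each MPI rank lands,
# then exec the real program so MPI start-up is otherwise unaffected
echo "rank ${PMI_RANK:-?} on $(hostname), KMP_PLACE_THREADS=${KMP_PLACE_THREADS:-unset}" >&2
exec "$@"
```

It would be launched as `mpiexec.hydra -n 2 ./rankinfo.sh ./omp_tutor7_mpi ...`; if no line appears for the second MIC, the hang occurs before process launch rather than inside the application.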

16 Replies

Employee

Hi Edgardo,

Could you please provide Hydra debug information for the problematic scenarios (it's enabled by 'export I_MPI_HYDRA_DEBUG=1')?

Regarding this problem:

However, in that case the code really is executed in the first listed MIC (cnf001-mic0) and the second is simply ignored.

Could you please try to run this scenario with 'export I_MPI_PERHOST=1'?

Also try to simplify the scenario to something like this and provide its output:

mpirun -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 hostname

mpirun -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 IMB-MPI1 pingpong

 


Hi Artem,

For the first case, using I_MPI_HYDRA_DEBUG=1, i.e. using the following command in the script:

mpiexec.hydra -n 2 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

 

I attached the file I_MPI_HYDRA_DEBUG_1.txt with the output. It is quite large; it seems that both MICs are recognised, but at some point the execution gets stuck when the second MIC is contacted.

For the second case (using the -hosts option), with I_MPI_PERHOST=1 I again obtain no output from the MICs and have to cancel the job manually. If I add the I_MPI_HYDRA_DEBUG=1 environment variable I obtain the output stored in the attached file I_MPI_PERHOST_1.txt. It looks much the same as the output from the first case.

Finally, when I tried to execute the commands with mpirun I obtained the following error message:

[edoerner@leftraru2 ~]$ export I_MPI_MIC=1
[edoerner@leftraru2 ~]$ mpirun -v -ppn 1 -n 2 -hosts cnf001-mic0,cnf002-mic0 hostname

...

[mpiexec@leftraru2] STDIN will be redirected to 1 fd(s): 9 
[proxy:0:0@cnf001-mic0] Start PMI_proxy 0
[proxy:0:0@cnf001-mic0] STDIN will be redirected to 1 fd(s): 9 
[proxy:0:0@cnf001-mic0] HYDU_create_process (../../utils/launch/launch.c:588): execvp error on file hostname (No such file or directory)
[proxy:0:1@cnf002-mic0] Start PMI_proxy 1
[proxy:0:1@cnf002-mic0] HYDU_create_process (../../utils/launch/launch.c:588): execvp error on file hostname (No such file or directory)

Thanks for your help!
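One possible reading of that execvp error (an assumption, not a confirmed diagnosis): `hostname` is not found via the PATH the proxy uses on the MIC side, so retrying with an absolute path may get past this particular failure:

```shell
# if hostname lives in /bin on the MIC's embedded Linux (an assumption),
# an absolute path avoids depending on PATH propagation to the remote proxy
mpirun -v -ppn 1 -n 2 -hosts cnf001-mic0,cnf002-mic0 /bin/hostname
```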

Employee

Hi Edgardo,

Thanks for the information. Could you please make sure that the following SSH paths work correctly and non-interactively:

cnf001-mic0 -> cnf002-mic0
cnf002-mic0 -> cnf001-mic0
cnf002-mic0 -> cnf001-mic0.nlhpc.cl

Also check the IP forwarding settings, as described in the Intel® MPI Library for Linux* OS User's Guide, chapter "Using the Intel® MPI Library with the Intel® Many Integrated Core (Intel® MIC) Architecture":

12.2. Multiple Cards
To use multiple cards for a single job, the Intel® Manycore Platform Software Stack (Intel® MPSS) needs to be configured for peer-to-peer support (see the Intel® MPSS documentation for details) and the host(s) needs to have IP forwarding enabled.
(host)$ sudo sysctl -w net.ipv4.ip_forward=1
Each host/card should be able to ping every other host/card and the launching host should be able to connect to every target, as with a classic cluster.
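For completeness, making that setting persist across reboots is the standard sysctl configuration change on Linux (a generic sketch, not taken from the guide; requires root on each host):

```shell
# persist IPv4 forwarding so it survives a reboot of the host
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p    # reload the file and apply the setting
```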


Dear Artem,

I have tested the SSH paths with the following script and did not find any issues.

#!/bin/bash

# Test script to ssh into MICs
ssh cnf002-mic0 "hostname && ssh cnf001-mic0.nlhpc.cl hostname"
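A fuller check could sweep every source/destination pair non-interactively. A sketch using the host names from this thread (BatchMode makes any password prompt fail immediately instead of hanging):

```shell
#!/bin/bash
# all-pairs SSH reachability between the MICs, by short and fully qualified name
hosts="cnf001-mic0 cnf002-mic0 cnf001-mic0.nlhpc.cl cnf002-mic0.nlhpc.cl"
for src in cnf001-mic0 cnf002-mic0; do
  for dst in $hosts; do
    [ "$src" = "$dst" ] && continue
    if ssh -o BatchMode=yes "$src" "ssh -o BatchMode=yes $dst hostname" >/dev/null 2>&1; then
      echo "OK   $src -> $dst"
    else
      echo "FAIL $src -> $dst"
    fi
  done
done
```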

 

I also tested the IP forwarding setting and it seems it is not enabled; I obtained the following:

[edoerner@leftraru1 ~]$ sysctl -n net.ipv4.ip_forward
0

 

I will ask the admin about this setting (I do not have sudo privileges). Thanks for your help.


Well, we changed the IP forwarding settings and the problem persists, so we are still stuck on this...

Employee

Hi Edgardo,

Could you please try to run the following test scenario and provide its output:

export I_MPI_DEBUG=100

export I_MPI_FABRICS=tcp

mpiexec.hydra -v -ppn 1 -n 2 -hosts node-mic0,node-mic1 IMB-MPI1 pingpong -msglog 0:1


I attached the output of your suggested test scenario. It gets stuck and I finally have to cancel the running job.

Thanks for your help!

Employee

Hi,

I am just curious to see whether the problem comes from the second node. If you change the roles of cnf001 and cnf002, do you still see the problem? Instead of doing:

# mpiexec.hydra -n 2 -hosts cnf001-mic0,cnf002-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

Can you test

# mpiexec.hydra -n 2 -hosts cnf002-mic0,cnf001-mic0 ./omp_tutor7_mpi -i omp_tutor7_mpi -p 521icru -o omp_tutor7_mpi -b

 

Employee

Hi Edgardo,

Thanks for the information. I didn't see anything suspicious in the provided log file, except that, as far as I can see, there are some problems with the TCP connection from cnf002-mic0 to cnf001-mic0.nlhpc.cl (port 51440). You said that "cnf002-mic0 -> cnf001-mic0.nlhpc.cl" works fine over SSH, so potentially there are some firewall limitations. Could you please check the firewall status on cnf001-mic0/cnf002-mic0 (administrator permissions may be required for this)?


Dear Loc,

I tested changing the order of the MICs and I still have issues; I attached the log file. Reading it, it seems much the same as the original case, but with the roles of the co-processors interchanged.

@Artem: I will ask the administrator to look at the Firewall settings, thanks for your time!


According to the administrator, the MICs do not have iptables installed, and on the nodes hosting the MICs all policies are set to ACCEPT:

[root@cnf001 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

 

Employee

Hi Edgardo,

Could you please double check that:
1. IP forwarding is enabled on both cnf001/cnf002 hosts
2. Firewall is disabled on both cnf001/cnf002 hosts

By cnf001/cnf002 I mean the HOST side of the nodes in which the cnf001-mic0/cnf002-mic0 MIC cards are installed.


The admin says that all the nodes have forwarding enabled and the firewall inactive. For example:

[root@master1 ~]# pdsh -w cnf00[1-4] sysctl net.ipv4.ip_forward
cnf002: net.ipv4.ip_forward = 1
cnf004: net.ipv4.ip_forward = 1
cnf003: net.ipv4.ip_forward = 1
cnf001: net.ipv4.ip_forward = 1

[root@master1 ~]# pdsh -w cnf00[1-4] systemctl status firewalld
cnf002: ● firewalld.service - firewalld - dynamic firewall daemon
cnf002:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf002:    Active: inactive (dead)
pdsh@master1: cnf002: ssh exited with exit code 3
cnf004: ● firewalld.service - firewalld - dynamic firewall daemon
cnf004:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf004:    Active: inactive (dead)
pdsh@master1: cnf004: ssh exited with exit code 3
cnf003: ● firewalld.service - firewalld - dynamic firewall daemon
cnf003:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf003:    Active: inactive (dead)
pdsh@master1: cnf003: ssh exited with exit code 3
cnf001: ● firewalld.service - firewalld - dynamic firewall daemon
cnf001:    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
cnf001:    Active: inactive (dead)

Thanks for your time!

Employee

Hi Edgardo,

Could you please try the following scenarios and provide the corresponding output?

Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -ppn 1 -n 2 -hosts node2-mic0 IMB-MPI1 pingpong -msglog 0:1

Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
export I_MPI_HYDRA_BOOTSTRAP=slurm
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1

Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1

Run on node1-mic0:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts <node1_mic0_ip_address>,<node2_mic0_ip_address> IMB-MPI1 pingpong -msglog 0:1

Run on node1:
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1


Could you please also specify OS and MPSS version for the nodes?

 

Employee

Some additions for the last scenario:

Run on node1:
. <impi_install_path>/intel64/bin/mpivars.sh
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
export I_MPI_MIC=1
export I_MPI_MIC_PREFIX=$I_MPI_ROOT/mic/bin/
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1

For all the scenarios check that SSH is password-less before the run.
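Collected into a single host-side script, with one extra sanity check that the MIC-side benchmark exists where the prefix points (a sketch following the steps above; the install path stays a placeholder):

```shell
#!/bin/bash
# host-side launch sketch for the last scenario
. <impi_install_path>/intel64/bin/mpivars.sh     # placeholder install path
export I_MPI_DEBUG=100
export I_MPI_FABRICS=tcp
export I_MPI_MIC=1
export I_MPI_MIC_PREFIX=$I_MPI_ROOT/mic/bin/
# warn early if the MIC-side binary is missing from the prefix location
[ -e "${I_MPI_MIC_PREFIX}IMB-MPI1" ] || echo "warning: ${I_MPI_MIC_PREFIX}IMB-MPI1 not found" >&2
mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1
```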


Hi Artem,

I am sorry for the delay. I have run the scenarios on the MICs; here is the list of attached outputs:

  1. s1-MIC.txt - run on node1-mic0:

    export I_MPI_DEBUG=100
    export I_MPI_FABRICS=tcp
    mpiexec.hydra -v -ppn 1 -n 2 -hosts node2-mic0 IMB-MPI1 pingpong -msglog 0:1

  2. s2-MIC.txt - run on node1-mic0:

    export I_MPI_DEBUG=100
    export I_MPI_FABRICS=tcp
    export I_MPI_HYDRA_BOOTSTRAP=slurm
    mpiexec.hydra -v -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1

  3. s3-MIC.txt - run on node1-mic0:

    export I_MPI_DEBUG=100
    export I_MPI_FABRICS=tcp
    mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts node1-mic0,node2-mic0 IMB-MPI1 pingpong -msglog 0:1

  4. s4-MIC.txt - run on node1-mic0:

    export I_MPI_DEBUG=100
    export I_MPI_FABRICS=tcp
    mpiexec.hydra -v -localhost <node1_mic0_ip_address> -ppn 1 -n 2 -hosts <node1_mic0_ip_address>,<node2_mic0_ip_address> IMB-MPI1 pingpong -msglog 0:1

I got a little confused about the last two scenarios: must I run them from the CPU host? Thanks for your time!