Intel® MPI Library

Intel MPI for Omni Path Express Network Interface

ato_markin
Beginner

When running my application with Intel MPI, how can I make it use a specific network interface? Every node in the cluster has two network interfaces, and I want to force my application to communicate over the ib0 interface.

How can I achieve this behaviour?

 

Currently I am using the script below, but I do not see any load on the network when I check ifstat.

 

 

#!/bin/bash
#SBATCH --job-name=resnet50_cifar100_job
#SBATCH --output=resnet50_cifar100_output_opx_%j.txt
#SBATCH --error=resnet50_cifar100_error_opx_%j.txt
#SBATCH --ntasks=16
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4

# Source the environment setup script
source "$HOME/activate_environment.sh"

# Activate the Python virtual environment
source "$HOME/torch_mpi_env/bin/activate"

# Settings that are currently commented out
#export FI_TCP_IFACE=ib0
#export FI_PROVIDER=psm2
#export I_MPI_FABRICS=ofi
#export I_MPI_FALLBACK=0

# Intel MPI settings: debug output, shared memory within a node and OFI between nodes, psm2 provider
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=psm2

# mpiP profiler options: write the report to ./mpip_results
export MPIP="-f ./mpip_results"

# Ask Slurm for the ib0 network (also passed to srun via --network below)
export SLURM_NETWORK=ib0

# Run the Python script with the mpiP library preloaded
srun --mpi=pmi2 --network=ib0 \
     --export=ALL,LD_PRELOAD="$HOME/mpiP_build/lib/libmpiP.so" \
     python "$HOME/torch_projects/resnet50_cifar100.py" --epochs 200

# Deactivate the virtual environment
deactivate


 

Am I doing the right thing? 

TobiasK
Moderator

@ato_markin 
Intel MPI will always select the fastest NIC by default.
You can set I_MPI_DEBUG=10, which prints pinning information showing which rank uses which NIC. This output is available in the latest release only.

ato_markin
Beginner

@TobiasK 

 

Below is the output when I run with I_MPI_DEBUG=10.

I do not see anything that shows which network interface it is running on. Could you advise on what steps to take next, please? @TobiasK

[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi_psm2.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 1856744 node21 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 1 1815694 node22 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 2 1806705 node23 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 3 1866405 node24 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.10.0
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
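
The startup log above can be condensed to the lines that identify the library build and the libfabric provider in use. The following is a minimal sketch and not part of the original post: the function name `impi_startup_summary` is made up for illustration, and the sample input is copied from the output above.

```shell
#!/bin/bash
# Hypothetical helper (not an Intel MPI tool): keep only the startup lines
# that report the library version, the libfabric version, and the libfabric
# provider. Reads an I_MPI_DEBUG log on stdin, writes matches to stdout.
impi_startup_summary() {
    grep -E 'MPI startup\(\): (Intel\(R\) MPI Library, Version|libfabric (version|provider):)'
}

# Sample input copied from the output above.
impi_startup_summary <<'EOF'
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): threading: mode: direct
EOF
```

In a real job, the file named by the `--output` option in the batch script would be redirected into the function instead of the here-document.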

 

TobiasK
Moderator

@ato_markin 
As noted, this feature is relatively new, so please upgrade to the latest release, 2021.12.
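
Whether a given log comes from a release older than 2021.12 can be read off the version banner. This is a rough sketch, assuming the banner format shown earlier in this thread and GNU `sort -V`; the function names `impi_version_from_log` and `version_at_least` are made up for illustration and are not part of Intel MPI.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Intel MPI.

# Extract the version string (e.g. "2021.10") from an I_MPI_DEBUG log on stdin.
impi_version_from_log() {
    sed -n 's/.*Intel(R) MPI Library, Version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) if version $1 is at least version $2.
# Relies on GNU sort -V so that 2021.9 orders before 2021.12.
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Banner line copied from the debug output earlier in the thread.
banner='[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)'
found=$(printf '%s\n' "$banner" | impi_version_from_log)

if version_at_least "$found" 2021.12; then
    echo "Intel MPI $found: 2021.12 or newer"
else
    echo "Intel MPI $found: older than 2021.12"
fi
```

A version sort is used rather than a plain string comparison so that a release such as 2021.9 is still treated as older than 2021.12.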

ato_markin
Beginner

Thank you very much for the clarification, @TobiasK.
