Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI for Omni Path Express Network Interface

ato_markin
Beginner

When using Intel MPI to run my application, how can I make it use a specific network interface? All nodes in my cluster have two network interfaces, and I want to force my application to use the ib0 interface.

How can I achieve this behaviour?

Currently I am doing the following, but I do not seem to see any load on the network when I check with ifstat.

#!/bin/bash
#SBATCH --job-name=resnet50_cifar100_job
#SBATCH --output=resnet50_cifar100_output_opx_%j.txt
#SBATCH --error=resnet50_cifar100_error_opx_%j.txt
#SBATCH --ntasks=16                
#SBATCH --nodes=4             
#SBATCH --ntasks-per-node=4        

# Source the environment setup script
source $HOME/activate_environment.sh

# Activate the Python virtual environment
source $HOME/torch_mpi_env/bin/activate


# Earlier attempts, left commented out:
#export FI_TCP_IFACE=ib0
#export FI_PROVIDER=psm2
#export I_MPI_FABRICS=ofi
#export I_MPI_FALLBACK=0

# Current settings: shm within a node, libfabric (psm2) between nodes
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=psm2

# mpiP profiler options: write reports to ./mpip_results
export MPIP="-f ./mpip_results"

export SLURM_NETWORK=ib0

# Run the Python script
srun --mpi=pmi2 --network=ib0 \
     --export=ALL,LD_PRELOAD=$HOME/mpiP_build/lib/libmpiP.so \
     python $HOME/torch_projects/resnet50_cifar100.py --epochs 200


# Deactivate the virtual environment
deactivate

Am I doing the right thing?
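For reference, the commented-out lines in the script hint at an alternative I have not fully tested: routing MPI traffic over ib0 with the libfabric tcp provider instead of psm2. A minimal sketch, assuming the tcp provider is available in the bundled libfabric (note that FI_TCP_IFACE only affects the tcp provider, and that psm2 talks to the Omni-Path hardware directly, bypassing the kernel IP stack that ifstat observes):

#!/bin/bash
# Sketch: force MPI traffic onto the IPoIB interface via the tcp provider.
# This trades Omni-Path-native performance for traffic that is visible
# to IP-level tools such as ifstat.
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=tcp   # libfabric tcp provider uses the IP stack
export FI_TCP_IFACE=ib0         # bind the tcp provider to ib0
export I_MPI_DEBUG=10           # print provider details at startup

srun --mpi=pmi2 python $HOME/torch_projects/resnet50_cifar100.py --epochs 200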

TobiasK
Moderator

@ato_markin 
Intel MPI always selects the fastest NIC by default.
You can set I_MPI_DEBUG=10, which prints pinning information showing which NIC each rank uses. This output is available only in the latest release.
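For example, a quick way to see that output outside of a batch script (a minimal sketch; ./my_app stands in for any MPI binary):

# Launch a small job with verbose startup diagnostics; on a recent
# Intel MPI release the banner includes per-rank NIC information.
export I_MPI_DEBUG=10
mpirun -n 4 ./my_app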

ato_markin
Beginner

@TobiasK 

Below is the output when I run with I_MPI_DEBUG=10.

I do not see anything showing which network interface it is running on. Can you advise on what steps to take, please?

[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi_psm2.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 1856744 node21 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 1 1815694 node22 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 2 1806705 node23 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 3 1866405 node24 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.10.0
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
TobiasK
Moderator

@ato_markin 
As noted, this feature is relatively new, so please upgrade to the latest release, 2021.12.
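For instance, after installing the newer package, sourcing its environment script picks up the updated library (a sketch; the 2021.12 directory name is an assumption based on the release mentioned above, and the /opt/intel/oneapi prefix is taken from the log):

# Load the newer Intel MPI into the current shell and confirm the version.
source /opt/intel/oneapi/mpi/2021.12/env/vars.sh
mpirun -V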

ato_markin
Beginner

Thank you very much for clarifying this, @TobiasK 
