When using Intel MPI to run my application, how can I make it use a specific network interface? All nodes in the cluster have two network interfaces, and I want to force my application to use the ib0 interface.
How can I achieve this behaviour?
Currently I am doing the following, but I do not see any load on that interface when I check with ifstat.
#!/bin/bash
#SBATCH --job-name=resnet50_cifar100_job
#SBATCH --output=resnet50_cifar100_output_opx_%j.txt
#SBATCH --error=resnet50_cifar100_error_opx_%j.txt
#SBATCH --ntasks=16
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4

# Source the environment setup script
source $HOME/activate_environment.sh

# Activate the Python virtual environment
source $HOME/torch_mpi_env/bin/activate

#export FI_TCP_IFACE=ib0
#export FI_PROVIDER=psm2
#export I_MPI_FABRICS=ofi
#export I_MPI_FALLBACK=0
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=psm2
export MPIP="-f ./mpip_results"
export SLURM_NETWORK=ib0

# Run the Python script
srun --mpi=pmi2 --network=ib0 \
    --export=ALL,LD_PRELOAD=$HOME/mpiP_build/lib/libmpiP.so \
    python $HOME/torch_projects/resnet50_cifar100.py --epochs 200

# Deactivate the virtual environment
deactivate
Am I doing the right thing?
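One point worth checking: with the psm2 provider, MPI traffic goes over the Omni-Path hardware directly and bypasses the kernel IP stack, so ifstat on ib0 (IPoIB) can show no load even while the fabric is busy. As a diagnostic only, not a production setup, one can force the libfabric tcp provider and bind it to ib0 so the traffic becomes visible to ifstat. A sketch using documented Intel MPI/libfabric variables:

```shell
# Diagnostic sketch: route MPI traffic through the kernel TCP stack on ib0
# (IPoIB) so ifstat can observe it. This is much slower than the native
# psm2 path and is only meant to verify interface selection.
export I_MPI_FABRICS=shm:ofi   # shared memory intra-node, libfabric inter-node
export FI_PROVIDER=tcp         # use the libfabric TCP provider
export FI_TCP_IFACE=ib0        # bind the TCP provider to the ib0 interface
export I_MPI_DEBUG=5           # print the selected provider at startup
```

With these set, the I_MPI_DEBUG output should report the tcp provider at startup, and ifstat on ib0 should then show the MPI traffic.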
@ato_markin
Intel MPI always selects the fastest NIC by default.
You can set I_MPI_DEBUG=10, which prints pinning information showing which rank uses which NIC. Note that this output is available in the latest release only.
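In the job script above, that is a one-line change before the srun launch:

```shell
# Raise the Intel MPI debug level; level 10 includes per-rank startup
# details (and, in recent releases, NIC assignment per rank).
export I_MPI_DEBUG=10
```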
Below is the output when I run with I_MPI_DEBUG=10.
I do not see anything showing the network interface it is running on. Can you advise on what steps to take, please? @TobiasK
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi_psm2.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 1856744 node21 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 1 1815694 node22 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 2 1806705 node23 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): 3 1866405 node24 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.10.0
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
@ato_markin
As noted, this feature is relatively new, so please upgrade to the latest release, 2021.12.
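A quick way to confirm which version is active after upgrading (the oneAPI install path below is an assumption; adjust it for your system):

```shell
# Sketch: load the oneAPI environment and print the active Intel MPI version.
source /opt/intel/oneapi/setvars.sh >/dev/null
mpirun -V   # the banner should report the 2021.12 library after the upgrade
```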