I am researching how to pin processes across sockets using the I_MPI_PIN_DOMAIN and I_MPI_PIN_ORDER environment variables, but none of the combinations I have tried has worked. I used Intel MPI 2021.12.1 and ran the TACC amask tool on two systems: one with two Xeon Platinum 8470 CPUs and the other with two Xeon CPU Max 9470 CPUs.
With the default settings, I ran the following command:
mpiexec -n 4 amask_mpi
and I got the process placement in bunch order as expected:
Each row of matrix is a mask for a Hardware Thread (hwt).
CORE ID = matrix digit + column group # in |...|
A set mask bit (proc-id) = core id + add 104 to each additional row.
rank | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
0000 0---4---8---2---6---0---4---8---2---6---0---4---8---2---6---0---4---8---2---6---0---4---8---2-----------
0001 --2---6---0---4---8---2---6---0---4---8---2---6---0---4---8---2---6---0---4---8---2---6---0---4---------
0002 -1---5---9---3---7---1---5---9---3---7---1---5---9---3---7---1---5---9---3---7---1---5---9---3----------
0003 ---3---7---1---5---9---3---7---1---5---9---3---7---1---5---9---3---7---1---5---9---3---7---1---5--------
The first two processes are bound to the first and second NUMA nodes in the first socket, and the remaining two processes are bound to the first and second NUMA nodes in the second socket.
I can also achieve the process placement in compact order with the following variables:
I_MPI_PIN_DOMAIN=core
I_MPI_PIN_ORDER=compact
and I got the processes placed in the first NUMA node:
Each row of matrix is a mask for a Hardware Thread (hwt).
CORE ID = matrix digit + column group # in |...|
A set mask bit (proc-id) = core id + add 104 to each additional row.
rank | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
0000 0-------------------------------------------------------------------------------------------------------
0001 --------8-----------------------------------------------------------------------------------------------
0002 ----------------6---------------------------------------------------------------------------------------
0003 ------------------------4-------------------------------------------------------------------------------
I have been experimenting with different values of I_MPI_PIN_DOMAIN and I_MPI_PIN_ORDER to achieve scatter-order placement across sockets: for example, the first and third processes bound to the first socket, and the second and fourth processes bound to the second socket. However, I have not found a working combination. Could you please offer any suggestions? Thank you.
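For what it is worth, the amask output above suggests that even-numbered core IDs sit on the first socket and odd-numbered ones on the second, so an explicit processor list should in principle force the placement I want. This is only a workaround sketch (the core IDs are inferred from the output above, not verified against cpuinfo), and I would still prefer an ordering-based solution:

```shell
# Workaround sketch: pin ranks explicitly instead of relying on I_MPI_PIN_ORDER.
# On this topology even core IDs appear to be on socket 0 and odd core IDs on
# socket 1 (inferred from the amask output above; verify with cpuinfo or lscpu).
export I_MPI_PIN_PROCESSOR_LIST=0,1,2,3   # ranks 0,2 -> socket 0; ranks 1,3 -> socket 1
mpiexec -n 4 amask_mpi
```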
Sure. The following is the output with I_MPI_DEBUG=10, I_MPI_PIN_DOMAIN=core and I_MPI_PIN_ORDER=scatter:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0/etc/tuning_spr_shm-ofi_mlx_100.dat" not found
[0] MPI startup(): Load tuning file: "/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0/etc/tuning_spr_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 776163 c1002.ten.osc.edu {0}
[0] MPI startup(): 1 776164 c1002.ten.osc.edu {8}
[0] MPI startup(): 2 776165 c1002.ten.osc.edu {16}
[0] MPI startup(): 3 776166 c1002.ten.osc.edu {24}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_FC=ifort
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=--external-launcher
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
[0] MPI startup(): I_MPI_HYDRA_BRANCH_COUNT=-1
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=slurm
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_PIN_DOMAIN=core
[0] MPI startup(): I_MPI_PIN_ORDER=scatter
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
Each row of matrix is a mask for a Hardware Thread (hwt).
CORE ID = matrix digit + column group # in |...|
A set mask bit (proc-id) = core id + add 104 to each additional row.
rank | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
0000 0-------------------------------------------------------------------------------------------------------
0001 --------8-----------------------------------------------------------------------------------------------
0002 ----------------6---------------------------------------------------------------------------------------
0003 ------------------------4-------------------------------------------------------------------------------
It seems you are using Slurm. If so, please use srun and manage pinning through Slurm.
To use mpiexec/mpirun and ignore the Slurm settings, please follow the guidance here:
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-13/job-schedulers-support.html
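For reference, the relevant part of that page amounts to something like the following (a sketch only; adjust to your site's setup):

```shell
# Sketch, per the linked guide: detach Hydra from the Slurm launcher so that
# the I_MPI_PIN_* variables control binding instead of Slurm's cpu-bind.
unset I_MPI_PMI_LIBRARY                      # do not use Slurm's PMI library
export I_MPI_HYDRA_BOOTSTRAP=ssh             # bootstrap without the Slurm launcher
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 # ignore the Slurm-provided placement
mpiexec -n 4 amask_mpi
```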
Yes, I am using Slurm, but I am sure I used the Intel MPI Hydra process manager. When I run under Slurm, I normally set the following:
I_MPI_HYDRA_BOOTSTRAP=slurm
I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
To use the Hydra process manager when I obtained the previous output, I unset those variables:
export -n I_MPI_HYDRA_BOOTSTRAP I_MPI_PMI_LIBRARY
I am confident Hydra was in charge because Slurm's CPU binding control was not in effect.
I also tried using SSH as the Hydra bootstrap, but I got the same result.
$ export -n I_MPI_HYDRA_BOOTSTRAP I_MPI_PMI_LIBRARY
$ export I_MPI_HYDRA_BOOTSTRAP=ssh
$ mpiexec -n 4 bin/amask_mpi
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0/etc/tuning_spr_shm-ofi_mlx_100.dat" not found
[0] MPI startup(): Load tuning file: "/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0/etc/tuning_spr_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 783094 c1002.ten.osc.edu {0}
[0] MPI startup(): 1 783095 c1002.ten.osc.edu {8}
[0] MPI startup(): 2 783096 c1002.ten.osc.edu {16}
[0] MPI startup(): 3 783097 c1002.ten.osc.edu {24}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_FC=ifort
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/apps/spack/0.21/cardinal/linux-rhel9-sapphirerapids/intel-oneapi-mpi/intel/2021.10.0/2021.10.0-a2ei2t4/mpi/2021.10.0
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=--external-launcher
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
[0] MPI startup(): I_MPI_HYDRA_BRANCH_COUNT=-1
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=ssh
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_PIN_DOMAIN=core
[0] MPI startup(): I_MPI_PIN_ORDER=scatter
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
Each row of matrix is a mask for a Hardware Thread (hwt).
CORE ID = matrix digit + column group # in |...|
A set mask bit (proc-id) = core id + add 104 to each additional row.
rank | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
0000 0-------------------------------------------------------------------------------------------------------
0001 --------8-----------------------------------------------------------------------------------------------
0002 ----------------6---------------------------------------------------------------------------------------
0003 ------------------------4-------------------------------------------------------------------------------
Can you please try the pinning simulator:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library-pinning-simulator.html
Also please set
I_MPI_PIN_RESPECT_HCA=0
I_MPI_PIN_RESPECT_CPUSET=0
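For example, combined with your earlier pinning variables, the invocation would look like this (one possible way to set it up):

```shell
# Illustrative invocation combining the suggested variables with the
# earlier pinning settings from this thread.
export I_MPI_PIN_RESPECT_HCA=0
export I_MPI_PIN_RESPECT_CPUSET=0
export I_MPI_PIN_DOMAIN=core
export I_MPI_PIN_ORDER=scatter
I_MPI_DEBUG=10 mpiexec -n 4 amask_mpi
```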
Best
Hello,
I have tried the simulator, and it suggests the following command line to achieve scatter pinning across sockets:
I_MPI_PIN_DOMAIN=core I_MPI_PIN_ORDER=scatter I_MPI_PIN_CELL=unit mpiexec -n 4
I ran with those variables together with the two you suggested previously, but I still got the same compact-order result I reported above.