I have a workstation with Ubuntu 22.04 running the 6.5.0-14-generic kernel and an Intel Core i9-13900K processor. This processor has 8 P-cores and 16 E-cores.
I have a CFD application, and in my experience running on the 8 P-cores is about as fast as running on all 24 cores together.
With Intel MPI 2021.7.1 I set the environment variable I_MPI_PIN_PROCESSOR_LIST="0,2,4,6,8,10,12,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31". This way, when I ran on only 8 ranks, the 8 physical cores corresponding to the P-cores were used, and when I ran on 24 ranks, all the physical cores were used.
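The list follows from the i9-13900K topology: the 8 P-cores are hyperthreaded, so picking one SMT sibling per P-core gives the even logical CPUs 0-14, while the 16 single-threaded E-cores occupy logical CPUs 16-31. A small sketch of how the list is derived (the topology constants are assumptions matching my machine, not something Intel MPI provides):

```python
# Derive the pin list for a hybrid i9-13900K: one logical CPU per
# physical core, P-cores first. Topology constants are assumptions
# for this machine: 8 hyperthreaded P-cores, 16 single-thread E-cores.
P_CORES = 8    # each P-core exposes 2 logical CPUs (SMT siblings)
E_CORES = 16   # each E-core exposes 1 logical CPU

# First SMT sibling of each P-core: logical CPUs 0, 2, 4, ..., 14.
p_core_cpus = [2 * i for i in range(P_CORES)]
# E-cores follow contiguously after all P-core siblings: 16..31.
e_core_cpus = [2 * P_CORES + i for i in range(E_CORES)]

pin_list = ",".join(str(c) for c in p_core_cpus + e_core_cpus)
print(pin_list)
```

With this ordering, the first 8 ranks land on distinct P-cores and ranks 9-24 spill over onto the E-cores.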
With Intel MPI 2021.7.1 and I_MPI_DEBUG=5, the terminal output confirmed that the pinning was indeed working:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): File "/home/hakostra/KMT/mglet-8.6.2+40-g87906f9/IntelMPI/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_10.dat" not found
[0] MPI startup(): Load tuning file: "/home/hakostra/KMT/mglet-8.6.2+40-g87906f9/IntelMPI/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 731216 kmt-trd2 0
[0] MPI startup(): 1 731217 kmt-trd2 2
[0] MPI startup(): 2 731218 kmt-trd2 4
[0] MPI startup(): 3 731219 kmt-trd2 6
[0] MPI startup(): 4 731220 kmt-trd2 8
[0] MPI startup(): 5 731221 kmt-trd2 10
[0] MPI startup(): 6 731222 kmt-trd2 12
[0] MPI startup(): 7 731223 kmt-trd2 14
[0] MPI startup(): I_MPI_ROOT=/home/hakostra/KMT/mglet-8.6.2+40-g87906f9/IntelMPI
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0,2,4,6,8,10,12,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
However, now trying Intel MPI 2021.11 (part of the oneAPI 2024.0 toolkit), setting the same variable gives:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.11 Build 20231005 (id: 74c4a23)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/home/hakostra/KMT/el-8/IntelMPI/opt/mpi/etc/tuning_skx_shm.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 738139 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 1 738140 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 2 738141 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 3 738142 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 4 738143 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 5 738144 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 6 738145 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): 7 738146 kmt-trd2 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): I_MPI_ROOT=/home/hakostra/KMT/el-8/IntelMPI
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=-1
[0] MPI startup(): I_MPI_PIN=1
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0,2,4,6,8,10,12,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
[0] MPI startup(): I_MPI_PIN_RESPECT_CPUSET=0
[0] MPI startup(): I_MPI_PIN_RESPECT_HCA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_DEBUG=5
In other words, the pinning does not seem to be applied at all; every rank is bound to all 32 logical CPUs. I have tried a number of additional variables (I_MPI_PIN_RESPECT_CPUSET=0, I_MPI_PIN_RESPECT_HCA=0), and none of them seems to affect the pinning. Does anyone have advice on or insight into this issue?
Thanks in advance.
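In case it is useful to others: the binding can also be checked independently of I_MPI_DEBUG by having each rank print the CPU affinity the kernel actually applied. On Linux this needs only the standard library; the PMI_RANK variable is what Intel MPI's Hydra launcher exports in my runs, so treat that part as an assumption if you launch differently:

```python
# affinity_check.py - print the CPU affinity the kernel actually
# applied to this process, independent of what I_MPI_DEBUG reports.
import os

# Hydra (Intel MPI's launcher) exports PMI_RANK for each process;
# fall back to rank 0 when run outside mpirun.
rank = int(os.environ.get("PMI_RANK", "0"))

# Linux-only: the set of logical CPUs this process may run on.
cpus = sorted(os.sched_getaffinity(0))
print(f"rank {rank}: pinned to CPUs {cpus}")
```

Run as e.g. `mpirun -np 8 python3 affinity_check.py`; with working pinning each rank should report a single CPU from the list, not all 32.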
Does not appear to be working:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.11 Build 20231005 (id: 74c4a23)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.18.1-impi
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/home/hakostra/KMT/el-8/IntelMPI/opt/mpi/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): ===== Nic pinning on kmt-trd2 =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0 enp9s0f0
[0] MPI startup(): 1 enp9s0f0
[0] MPI startup(): 2 enp9s0f0
[0] MPI startup(): 3 enp9s0f0
[0] MPI startup(): 4 enp9s0f0
[0] MPI startup(): 5 enp9s0f0
[0] MPI startup(): 6 enp9s0f0
[0] MPI startup(): 7 enp9s0f0
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 1164811 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 1 1164812 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 2 1164813 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 3 1164814 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 4 1164815 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 5 1164816 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 6 1164817 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 7 1164818 kmt-trd2 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): I_MPI_ROOT=/home/hakostra/KMT/el-8/IntelMPI
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=-1
[0] MPI startup(): I_MPI_PIN=1
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_PIN_DOMAIN=1
[0] MPI startup(): I_MPI_PIN_ORDER=compact
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
The only apparent difference is that there are now additional curly brackets { } around the pinned CPU list; every rank is still bound to all 32 logical CPUs.
And here is the output of the cpuinfo tool:
cpuinfo
Intel(R) processor family information utility, Version 2021.11 Build 20231005 (id: 74c4a23)
Copyright (C) 2005-2023 Intel Corporation. All rights reserved.
===== Processor composition =====
Processor name : 13th Gen Intel(R) Core(TM) i9-13900K
Packages(sockets) : 1
Cores : 24
Processors(CPUs) : 32
Cores per package : 24
Threads per core : 1
===== Processor identification =====
Processor Thread Id. Core Id. Package Id.
0 0 0 0
1 1 0 0
2 0 4 0
3 1 4 0
4 0 8 0
5 1 8 0
6 0 12 0
7 1 12 0
8 0 16 0
9 1 16 0
10 0 20 0
11 1 20 0
12 0 24 0
13 1 24 0
14 0 28 0
15 1 28 0
16 0 32 0
17 0 33 0
18 0 34 0
19 0 35 0
20 0 36 0
21 0 37 0
22 0 38 0
23 0 39 0
24 0 40 0
25 0 41 0
26 0 42 0
27 0 43 0
28 0 44 0
29 0 45 0
30 0 46 0
31 0 47 0
===== Placement on packages =====
Package Id. Core Id. Processors
0 0,4,8,12,16,20,24,28,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47 (0,1)(2,3)(4,5)(6,7)(8,9)(10,11)(12,13)(14,15)16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
===== Cache sharing =====
Cache Size Processors
L1 48 KB (0,1)(2,3)(4,5)(6,7)(8,9)(10,11)(12,13)(14,15)
L2 2 MB (0,1)(2,3)(4,5)(6,7)(8,9)(10,11)(12,13)(14,15)(16,17,18,19)(20,21,22,23)(24,25,26,27)(28,29,30,31)
L3 36 MB (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31)
Hello hakostra1,
I could reproduce the behaviour in one of our machines. I'll let you know once I have a position from our engineers.
Cheers,
Rafael
That's good to hear; reproducing these issues is often difficult. Pinning is a key feature of any MPI implementation, and I hope you figure out what's going on here. Thanks for the feedback so far!
Hi hakostra1,
We disabled the pinning of MPI ranks for the scenario with mixed P- and E-cores in Intel MPI 2021.11. We recommend reverting to 2021.9 for the time being. We are discussing how to properly address the issue in future releases of Intel MPI.
Cheers,
Rafael
