Intel® MPI Library

64-core simulation on quad-socket Intel Xeon under Windows Server 2022 does not start properly

Frank_R_1
New Contributor I

Dear support,

We have a customer with the following computer and configuration:

4 x Xeon Gold 6254 with 18 cores each (quad socket, 72 cores in total)
24 x 8GB RAM
Windows Server 2022
Windows Task Manager shows 4 NUMA domains
Hyper-Threading is disabled

We start our product with Intel(R) MPI Library, Version 2021.7 Build 20220909:
mpiexec.exe -delegate -genvall -print-all-exitcodes -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_DEBUG 500 -genv I_MPI_HYDRA_BSTRAP_KEEP_ALIVE 1 -genv I_MPI_CBWR 2 -genv I_MPI_ADJUST_GATHERV 3 -envall -localroot -n 64 #programpath

The problem is that the 64 processes start on 64 cores spread over all 4 NUMA domains, but are immediately redistributed to only 2 NUMA domains, which oversubscribes those cores.

Some output from MPI:
[proxy:0:0@detorsrv007] Warning - oversubscription detected: 64 processes will be placed on 54 cores

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 15736 detorsrv007 0
[0] MPI startup(): 1 14860 detorsrv007 1
[0] MPI startup(): 2 11560 detorsrv007 2
[0] MPI startup(): 3 9696 detorsrv007 3
[0] MPI startup(): 4 5788 detorsrv007 4
[0] MPI startup(): 5 12560 detorsrv007 5
[0] MPI startup(): 6 13628 detorsrv007 6
[0] MPI startup(): 7 16100 detorsrv007 7
[0] MPI startup(): 8 15540 detorsrv007 8
[0] MPI startup(): 9 15464 detorsrv007 9
[0] MPI startup(): 10 13716 detorsrv007 10
[0] MPI startup(): 11 12504 detorsrv007 11
[0] MPI startup(): 12 8796 detorsrv007 12
[0] MPI startup(): 13 8924 detorsrv007 13
[0] MPI startup(): 14 1168 detorsrv007 14
[0] MPI startup(): 15 13316 detorsrv007 15
[0] MPI startup(): 16 16212 detorsrv007 16
[0] MPI startup(): 17 14516 detorsrv007 17
[0] MPI startup(): 18 13844 detorsrv007 18
[0] MPI startup(): 19 12268 detorsrv007 19
[0] MPI startup(): 20 9208 detorsrv007 20
[0] MPI startup(): 21 14912 detorsrv007 21
[0] MPI startup(): 22 12760 detorsrv007 22
[0] MPI startup(): 23 12312 detorsrv007 23
[0] MPI startup(): 24 3856 detorsrv007 24
[0] MPI startup(): 25 2924 detorsrv007 25
[0] MPI startup(): 26 15036 detorsrv007 26
[0] MPI startup(): 27 13348 detorsrv007 27
[0] MPI startup(): 28 12316 detorsrv007 28
[0] MPI startup(): 29 15028 detorsrv007 29
[0] MPI startup(): 30 9316 detorsrv007 30
[0] MPI startup(): 31 2000 detorsrv007 31
[0] MPI startup(): 32 11196 detorsrv007 32
[0] MPI startup(): 33 11596 detorsrv007 33
[0] MPI startup(): 34 9640 detorsrv007 34
[0] MPI startup(): 35 14072 detorsrv007 35
[0] MPI startup(): 36 14504 detorsrv007 0
[0] MPI startup(): 37 13492 detorsrv007 1
[0] MPI startup(): 38 13084 detorsrv007 2
[0] MPI startup(): 39 9140 detorsrv007 3
[0] MPI startup(): 40 9084 detorsrv007 4
[0] MPI startup(): 41 12776 detorsrv007 5
[0] MPI startup(): 42 3908 detorsrv007 6
[0] MPI startup(): 43 4180 detorsrv007 7
[0] MPI startup(): 44 12232 detorsrv007 8
[0] MPI startup(): 45 15528 detorsrv007 9
[0] MPI startup(): 46 11816 detorsrv007 10
[0] MPI startup(): 47 14224 detorsrv007 11
[0] MPI startup(): 48 15864 detorsrv007 12
[0] MPI startup(): 49 13064 detorsrv007 13
[0] MPI startup(): 50 13456 detorsrv007 14
[0] MPI startup(): 51 12496 detorsrv007 15
[0] MPI startup(): 52 11672 detorsrv007 16
[0] MPI startup(): 53 9300 detorsrv007 17
[0] MPI startup(): 54 14868 detorsrv007 18
[0] MPI startup(): 55 1044 detorsrv007 19
[0] MPI startup(): 56 16340 detorsrv007 20
[0] MPI startup(): 57 1160 detorsrv007 21
[0] MPI startup(): 58 9432 detorsrv007 22
[0] MPI startup(): 59 7764 detorsrv007 23
[0] MPI startup(): 60 13524 detorsrv007 24
[0] MPI startup(): 61 15128 detorsrv007 25
[0] MPI startup(): 62 14988 detorsrv007 26
[0] MPI startup(): 63 11896 detorsrv007 27
[0] MPI startup(): I_MPI_HYDRA_DEBUG=1
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BSTRAP_KEEP_ALIVE=1
[0] MPI startup(): I_MPI_ADJUST_GATHERV=3
[0] MPI startup(): I_MPI_CBWR=2
[0] MPI startup(): I_MPI_DEBUG=500
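The Pin cpu column above wraps at 36: ranks 0 to 35 get CPUs 0 to 35, and ranks 36 to 63 get CPUs 0 to 27 again, so 28 CPUs carry two ranks each. Note that the wrap happens at 36, not at the 54 cores named in the oversubscription warning. The Python sketch below reproduces the column under the assumption (not confirmed by the log) that ranks are handed out round-robin over a list of only 36 visible logical CPUs; `round_robin_pins` is a hypothetical helper, not an Intel MPI function.

```python
# Sketch: reproduce the "Pin cpu" column of the log above.
# ASSUMPTION (not confirmed by the log): ranks are assigned round-robin
# over a list of only 36 visible logical CPUs (0-35).
from collections import Counter


def round_robin_pins(n_ranks, visible_cpus):
    """Hypothetical helper: CPU assigned to each rank under round-robin."""
    return [visible_cpus[rank % len(visible_cpus)] for rank in range(n_ranks)]


pins = round_robin_pins(64, list(range(36)))
load = Counter(pins)  # number of ranks per CPU
doubled = sorted(cpu for cpu, n in load.items() if n > 1)

print(pins[36], pins[63])  # 0 27, as for ranks 36 and 63 in the log
print(len(doubled), doubled[0], doubled[-1])  # 28 0 27
```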

The full MPI debug output is attached.

Using -genv I_MPI_FABRICS shm does not help either: the job still does not run on 64 cores across 4 NUMA domains.

What can we do to run the simulation properly? It has to be 64 cores.

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


We are working on it and will get back to you soon.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


Could you please try the latest version of Intel MPI Library 2021.10 and let us know if you face a similar issue?


Could you also please provide us with the output of the cpuinfo command?


Thanks & Regards

Shivani


Frank_R_1
New Contributor I

Hi,

 

We tried Intel MPI 2021.10 and see the same behavior: only two NUMA domains are used.

"Warning - oversubscription detected: 64 processes will be placed on 54 cores"

 

Some of the cpuinfo output is below:

===== Processor composition =====
Processor name : Intel(R) Xeon(R) Gold 6254
Packages(sockets) : 4
Cores : 54    <--- this is what makes me wonder

Processors(CPUs) : 72
Cores per package : 13
Threads per core : 1

 

The full cpuinfo output:

############################################################################

Intel(R) processor family information utility, Version 2021.10 Build 20230619
Copyright (C) 2005-2023 Intel Corporation. All rights reserved.

===== Processor composition =====
Processor name : Intel(R) Xeon(R) Gold 6254
Packages(sockets) : 4
Cores : 54
Processors(CPUs) : 72
Cores per package : 13
Threads per core : 1

===== Processor identification =====
Processor Thread Id. Core Id. Package Id.
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 8 0
6 0 9 0
7 0 10 0
8 0 11 0
9 0 16 0
10 0 17 0
11 0 18 0
12 0 19 0
13 0 20 0
14 0 24 0
15 0 25 0
16 0 26 0
17 0 27 0
18 0 0 2
19 0 1 2
20 0 2 2
21 0 3 2
22 0 4 2
23 0 8 2
24 0 9 2
25 0 10 2
26 0 11 2
27 0 16 2
28 0 17 2
29 0 18 2
30 0 19 2
31 0 20 2
32 0 24 2
33 0 25 2
34 0 26 2
35 0 27 2
36 0 0 1
37 0 1 1
38 0 2 1
39 0 3 1
40 0 4 1
41 0 8 1
42 0 9 1
43 0 10 1
44 0 11 1
45 0 16 1
46 0 17 1
47 0 18 1
48 0 19 1
49 0 20 1
50 0 24 1
51 0 25 1
52 0 26 1
53 0 27 1
54 0 0 3
55 0 1 3
56 0 2 3
57 0 3 3
58 0 4 3
59 0 8 3
60 0 9 3
61 0 10 3
62 0 11 3
63 0 16 3
64 0 17 3
65 0 18 3
66 0 19 3
67 0 20 3
68 0 24 3
69 0 25 3
70 0 26 3
71 0 27 3
===== Placement on packages =====
Package Id. Core Id. Processors
0 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
1 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53
3 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71

===== Cache sharing =====
Cache Size Processors
L1 32 KB no sharing
L2 1 MB no sharing
L3 24 MB (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)(18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35)(36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53)(54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71)

############################################################################
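The identification table itself is consistent with 4 x 18 = 72 cores: every package lists the same 18 core IDs, and a physical core is identified by its (package, core) pair. The Python sketch below rebuilds the table from the pattern printed under "Placement on packages" and recounts it. It is only a consistency check on the printed data, not a description of how cpuinfo computes its summary; `table_rows` and `summarize` are hypothetical helpers.

```python
# Sketch: recount the cpuinfo "Processor identification" table above.
# A physical core is identified by its (package id, core id) pair.
from collections import defaultdict

# Pattern taken from "Placement on packages": every package lists these
# 18 core ids, and processors are numbered in blocks of 18 per package.
CORE_IDS = [0, 1, 2, 3, 4, 8, 9, 10, 11, 16, 17, 18, 19, 20, 24, 25, 26, 27]
PACKAGE_ORDER = [0, 2, 1, 3]  # packages of processors 0-17, 18-35, 36-53, 54-71


def table_rows():
    """Rebuild the (processor, thread, core, package) rows printed by cpuinfo."""
    rows = []
    for block, package in enumerate(PACKAGE_ORDER):
        for i, core in enumerate(CORE_IDS):
            rows.append((block * len(CORE_IDS) + i, 0, core, package))
    return rows


def summarize(rows):
    """Count packages, physical cores, and logical processors."""
    per_package = defaultdict(set)
    for _processor, _thread, core, package in rows:
        per_package[package].add(core)
    return {
        "packages": len(per_package),
        "cores": sum(len(cores) for cores in per_package.values()),
        "processors": len(rows),
        "cores_per_package": sorted({len(c) for c in per_package.values()}),
    }


print(summarize(table_rows()))
# {'packages': 4, 'cores': 72, 'processors': 72, 'cores_per_package': [18]}
```

So the raw table implies 72 cores and 18 cores per package, not the 54 and 13 reported in the summary lines.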

 

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Could you please try either disabling pinning via I_MPI_PIN=off or setting an explicit pinning list, I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69, and send us the output?
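The explicit list takes the first 16 logical CPUs of each 18-CPU package, giving 16 x 4 = 64 CPUs. A small Python sketch that derives such a list from the package layout, assuming the contiguous numbering shown by cpuinfo above (18 logical CPUs per package, numbered in blocks); `per_package_pin_list` and `compress` are hypothetical helpers, not part of Intel MPI.

```python
# Sketch: build a pin list that takes the first N logical CPUs of each
# package, ASSUMING packages are numbered in contiguous blocks of CPUs.


def compress(cpus):
    """Turn a sorted, non-empty list of ints into 'a-b,c' range notation."""
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:] + [None]:  # None flushes the last range
        if cpu is not None and cpu == prev + 1:
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        if cpu is not None:
            start = prev = cpu
    return ",".join(parts)


def per_package_pin_list(packages, cpus_per_package, ranks_per_package):
    """Hypothetical helper: first `ranks_per_package` CPUs of every package."""
    cpus = [p * cpus_per_package + i
            for p in range(packages)
            for i in range(ranks_per_package)]
    return cpus, compress(cpus)


cpus, pin_list = per_package_pin_list(4, 18, 16)
print(len(cpus), pin_list)  # 64 0-15,18-33,36-51,54-69
```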


Thanks & Regards

Shivani


Frank_R_1
New Contributor I

Hi,

 

Here is the output:

[0] MPI startup(): Run 'pmi_process_mapping' nodemap algorithm
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (122 MB per rank) * (64 local ranks) = 7830 MB total
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): selected platform: unknown
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/tuning_skx_ofi.dat"
[0] MPI startup(): File "/tuning_skx_ofi.dat" not found
[0] MPI startup(): Looking for tuning file: "/tuning_generic_ofi_.dat"
[0] MPI startup(): Looking for tuning file: "/tuning_generic_ofi.dat"
[0] MPI startup(): File "/tuning_skx_ofi.dat" not found
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for ch4 level
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for net level
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for shm level
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: num_pools: 1
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 0
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 0
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 0 (Maximal number of rank: 0)
[0] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 10064 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 1 13204 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 2 11604 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 3 9552 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 4 14540 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 5 16120 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 6 12360 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 7 15520 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 8 10816 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 9 8004 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 10 15768 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 11 13884 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 12 2944 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 13 13896 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 14 9856 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 15 15616 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 16 12592 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 17 14816 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 18 2752 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 19 13148 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 20 3964 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 21 10180 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 22 14576 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 23 9412 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 24 12292 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 25 10052 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 26 3004 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 27 12232 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 28 15328 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 29 8940 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 30 12924 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 31 10172 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 32 2688 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 33 16372 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 34 14892 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 35 12928 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 36 16248 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 37 8164 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 38 12040 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 39 13760 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 40 10700 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 41 10208 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 42 10800 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 43 11600 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 44 3020 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 45 11836 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 46 13084 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 47 14908 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 48 10224 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 49 14680 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 50 13120 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 51 3224 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 52 16212 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 53 44 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 54 12444 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 55 9244 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 56 15352 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 57 3068 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 58 12972 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 59 14396 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 60 5108 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 61 14984 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 62 8944 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 63 12884 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): I_MPI_HYDRA_DEBUG=500
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BSTRAP_KEEP_ALIVE=1
[0] MPI startup(): I_MPI_PIN=off
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_ADJUST_GATHERV=3
[0] MPI startup(): I_MPI_CBWR=2
[0] MPI startup(): I_MPI_DEBUG=500

 

It only runs on processor group 1, which contains sockets 1 and 3.

Processor group 0 is not used at all.

 

How is it possible that the core count of 54 (3 x 18) differs from the processor count of 72 (4 x 18)?

And "Cores per package : 13" also seems strange:

===== Processor composition =====
Processor name : Intel(R) Xeon(R) Gold 6254
Packages(sockets) : 4
Cores : 54

Processors(CPUs) : 72
Cores per package : 13
Threads per core : 1

 

We really need a workaround to get it running, or at least an explanation of what went wrong.

Windows Server 2022 is up to date, and Task Manager shows 72 cores on 4 NUMA domains (Hyper-Threading off).
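Windows divides machines with more than 64 logical processors into processor groups of at most 64 logical processors each, so 72 logical processors need at least two groups. The Python sketch below works through that arithmetic under stated assumptions (each NUMA node stays inside one group, and nodes are spread evenly over the groups), which gives two groups of 36 and matches the observation that group 1 holds sockets 1 and 3. If the MPI library only enumerates the CPUs of one group, which would be consistent with the affinity 0-35 reported for every rank in the I_MPI_PIN=off output above, then the members 36-51 and 54-69 of the suggested pin list would fall outside its range. This is a hypothesis, not a confirmed diagnosis; `group_layout` and `members_outside` are hypothetical helpers.

```python
# Sketch of the processor-group arithmetic for this machine.
# ASSUMPTIONS: a group holds at most 64 logical processors, each NUMA node
# stays inside one group, and nodes are spread evenly across the groups.
import math

MAX_LPS_PER_GROUP = 64


def group_layout(numa_nodes, lps_per_node):
    """Return (number of groups, logical processors in the fullest group)."""
    if lps_per_node > MAX_LPS_PER_GROUP:
        raise ValueError("a single node cannot fit in one group")
    groups = math.ceil(numa_nodes * lps_per_node / MAX_LPS_PER_GROUP)
    # Add groups until the evenly spread nodes fit in each one.
    while math.ceil(numa_nodes / groups) * lps_per_node > MAX_LPS_PER_GROUP:
        groups += 1
    return groups, math.ceil(numa_nodes / groups) * lps_per_node


def members_outside(pin_list, visible):
    """Pin-list members that reference a CPU number >= `visible`."""
    bad = []
    for member in pin_list.split(","):
        low, _, high = member.partition("-")
        if int(high or low) >= visible:
            bad.append(member)
    return bad


groups, lps_per_group = group_layout(numa_nodes=4, lps_per_node=18)
print(groups, lps_per_group)  # 2 36
print(members_outside("0-15,18-33,36-51,54-69", lps_per_group))
# ['36-51', '54-69']
```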

 

Best regards

Frank

Frank_R_1
New Contributor I

Hi,

 

Attached you'll find the complete output.

 

Best regards

Frank

Frank_R_1
New Contributor I

Hi,

 

Is the cpuinfo problem related to the following thread?

https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/MPI-Hydra-and-pinning-issue-on-dual-socket-AMD/m-p/1511153

 

Another customer has the same problem on Windows 10 Enterprise with a dual-socket Intel Xeon system (2 x 36 cores): only 32 cores are occupied.

 

Somehow cpuinfo and the MPI library do not recognize the number of cores correctly (it differs from the number of processors).

 

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Could you please try either disabling pinning via I_MPI_PIN=off or setting an explicit pinning list, I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69, and send us the output?


Please set only one of the two options, not both at once.


Thanks & Regards

Shivani




Frank_R_1
New Contributor I

Hi,

 

The full outputs are attached.

 

Here is the short output with I_MPI_PIN=off:

[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 14508 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 1 10000 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 2 8044 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 3 15116 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 4 1400 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 5 10232 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 6 10124 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 7 14472 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 8 10012 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 9 15724 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 10 15748 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 11 9528 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 12 12608 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 13 15664 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 14 4784 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 15 3824 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 16 14208 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 17 8536 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 18 14376 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 19 8852 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 20 16332 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 21 15612 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 22 11404 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 23 8012 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 24 10208 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 25 44 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 26 9260 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 27 13548 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 28 14820 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 29 14112 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 30 12016 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 31 14796 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 32 10148 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 33 9092 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 34 11416 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 35 15832 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 36 12684 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 37 14556 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 38 14480 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 39 15920 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 40 11236 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 41 8640 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 42 16108 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 43 16000 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 44 12632 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 45 12732 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 46 12868 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 47 3996 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 48 9976 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 49 11364 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 50 12728 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 51 10076 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 52 11200 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 53 12028 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 54 14036 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 55 13256 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 56 9852 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 57 16092 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 58 15328 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 59 12384 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 60 8200 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 61 9712 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 62 14792 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 63 13336 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): I_MPI_HYDRA_DEBUG=500
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BSTRAP_KEEP_ALIVE=1
[0] MPI startup(): I_MPI_PIN=off
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_ADJUST_GATHERV=3
[0] MPI startup(): I_MPI_CBWR=2
[0] MPI startup(): I_MPI_DEBUG=500

 

Here is the short output with I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69:

IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[27] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[55] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[34] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[8] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[62] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[54] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[36] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[61] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[10] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[33] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[39] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[31] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[35] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[45] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[26] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[23] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[60] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[20] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[18] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[49] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[47] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[0] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[14] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[51] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[22] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[40] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[50] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[7] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[44] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[4] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[59] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[12] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[19] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[5] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[58] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[29] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[32] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[37] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[30] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[48] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[53] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[41] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[43] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[16] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[42] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[3] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[46] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[52] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[2] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[13] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[9] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[21] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[56] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[11] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[15] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[63] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[6] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[24] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[57] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[25] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
[17] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 10028 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 1 15580 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 2 9996 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 3 2292 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 4 44 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 5 16204 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 6 12140 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 7 13752 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 8 9880 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 9 13784 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 10 9204 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 11 10232 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 12 13580 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 13 9300 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 14 15044 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 15 13456 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 16 15040 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 17 8924 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 18 9848 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 19 2664 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 20 9316 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 21 1420 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 22 6736 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 23 4252 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 24 12384 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 25 11388 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 26 11276 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 27 12588 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 28 3776 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 29 1412 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 30 9092 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 31 14400 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 32 6760 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 33 8004 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 34 14792 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 35 13760 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 36 2496 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 37 8664 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 38 2804 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 39 3868 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 40 15064 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 41 3976 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 42 11872 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 43 14544 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 44 8860 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 45 13600 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 46 14852 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 47 13008 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 48 13576 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 49 12684 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 50 13720 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 51 14356 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 52 15216 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 53 8048 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 54 12912 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 55 15176 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 56 12224 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 57 15832 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 58 5152 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 59 14888 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 60 11404 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 61 9176 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 62 4932 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): 63 12844 detorsrv007 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
[0] MPI startup(): I_MPI_HYDRA_DEBUG=500
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BSTRAP_KEEP_ALIVE=1
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69
[0] MPI startup(): I_MPI_PIN_RESPECT_CPUSET=off
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_ADJUST_GATHERV=3
[0] MPI startup(): I_MPI_CBWR=2
[0] MPI startup(): I_MPI_DEBUG=500

 

What does this mean?

IPL WARN> ipl_pin_list_direct syntax error, 36-51 list member should be -1, single CPU number, or CPU number range
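The warning lists the accepted forms of a pin-list member: -1, a single CPU number, or a CPU number range. The minimal Python sketch below expands the `I_MPI_PIN_PROCESSOR_LIST` value from the log and flags members that reference CPUs a process cannot see. It assumes an inclusive `a-b` range syntax. The value `visible_cpus=36` is also an assumption, taken from the observed affinity (every rank ended up on CPUs 0-35), and is not confirmed in this thread. The function names are hypothetical.

```python
def parse_member(member: str) -> list[int]:
    """Expand one pin-list member: '-1', a single CPU number, or an inclusive range 'a-b'."""
    member = member.strip()
    if member == "-1":
        return [-1]  # kept as a sentinel; its meaning is not interpreted here
    if "-" in member:
        lo, hi = (int(x) for x in member.split("-", 1))
        if lo > hi:
            raise ValueError(f"descending range: {member}")
        return list(range(lo, hi + 1))
    return [int(member)]


def invalid_members(spec: str, visible_cpus: int) -> list[str]:
    """Return the members that reference a CPU index >= visible_cpus (assumed limit)."""
    bad = []
    for member in spec.split(","):
        if any(cpu >= visible_cpus for cpu in parse_member(member)):
            bad.append(member.strip())
    return bad


spec = "0-15,18-33,36-51,54-69"  # I_MPI_PIN_PROCESSOR_LIST from the log
all_cpus = [cpu for m in spec.split(",") for cpu in parse_member(m)]
print(len(all_cpus))                           # 64, one CPU per rank
print(invalid_members(spec, visible_cpus=36))  # ['36-51', '54-69']
```

Under that assumption, the sketch flags 36-51 (the member named in the warning) and also 54-69.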

 

Best regards

Frank

Frank_R_1
New Contributor I

Hi,

 

Attached is the complete CPU-Z output for the machine, in case it helps with the investigation.

 

Best regards

Frank

Frank_R_1
New Contributor I

Hi,

 

Please have a look at the output of the cpuinfo.exe tool from Intel MPI 2018.3:

 

Intel(R) processor family information utility, Version 2018 Update 3 Build 20180411
Copyright (C) 2005-2018 Intel Corporation. All rights reserved.

===== Processor composition =====
Processor name : Intel(R) Xeon(R) Gold 6254
Packages(sockets) : 4
Cores : 72
Processors(CPUs) : 72
Cores per package : 18
Threads per core : 1

===== Processor identification =====
Processor Thread Id. Core Id. Package Id.
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 8 0
6 0 9 0
7 0 10 0
8 0 11 0
9 0 16 0
10 0 17 0
11 0 18 0
12 0 19 0
13 0 20 0
14 0 24 0
15 0 25 0
16 0 26 0
17 0 27 0
18 0 0 2
19 0 1 2
20 0 2 2
21 0 3 2
22 0 4 2
23 0 8 2
24 0 9 2
25 0 10 2
26 0 11 2
27 0 16 2
28 0 17 2
29 0 18 2
30 0 19 2
31 0 20 2
32 0 24 2
33 0 25 2
34 0 26 2
35 0 27 2
36 0 0 1
37 0 1 1
38 0 2 1
39 0 3 1
40 0 4 1
41 0 8 1
42 0 9 1
43 0 10 1
44 0 11 1
45 0 16 1
46 0 17 1
47 0 18 1
48 0 19 1
49 0 20 1
50 0 24 1
51 0 25 1
52 0 26 1
53 0 27 1
54 0 0 3
55 0 1 3
56 0 2 3
57 0 3 3
58 0 4 3
59 0 8 3
60 0 9 3
61 0 10 3
62 0 11 3
63 0 16 3
64 0 17 3
65 0 18 3
66 0 19 3
67 0 20 3
68 0 24 3
69 0 25 3
70 0 26 3
71 0 27 3
===== Placement on packages =====
Package Id. Core Id. Processors
0 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
1 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53
3 0,1,2,3,4,8,9,10,11,16,17,18,19,20,24,25,26,27 54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71

===== Cache sharing =====
Cache Size Processors
L1 32 KB no sharing
L2 1 MB no sharing
L3 24 MB (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)(18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35)(36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53)(54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71)

 

This looks as expected!
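The cpuinfo placement table can be cross-checked against the pinning in the MPI log. The Python sketch below assumes that each package owns a contiguous block of 18 logical processors exactly as listed above. Note that packages 1 and 2 are swapped relative to the processor numbering. Treating each package as one NUMA domain is also an assumption, based on Task Manager showing four NUMA domains. Under these assumptions, the requested pin list spreads 16 ranks over each of the four packages, while the affinity set 0-35 that every rank actually received covers only packages 0 and 2.

```python
from collections import Counter

# Block of 18 logical processors -> package id, from the
# "Placement on packages" table above (0-17, 18-35, 36-53, 54-71).
PACKAGE_OF_BLOCK = {0: 0, 1: 2, 2: 1, 3: 3}
PROCS_PER_PACKAGE = 18


def package_of(cpu: int) -> int:
    """Map a logical processor number to its package id."""
    return PACKAGE_OF_BLOCK[cpu // PROCS_PER_PACKAGE]


def packages_covered(cpus) -> set[int]:
    """Packages touched by a set of logical processors."""
    return {package_of(cpu) for cpu in cpus}


observed_affinity = range(0, 36)  # "Pin cpu" set reported for every rank
print(sorted(packages_covered(observed_affinity)))  # [0, 2]

# Expansion of I_MPI_PIN_PROCESSOR_LIST=0-15,18-33,36-51,54-69
requested = [cpu for lo, hi in [(0, 15), (18, 33), (36, 51), (54, 69)]
             for cpu in range(lo, hi + 1)]
per_package = dict(Counter(package_of(cpu) for cpu in requested))
print(per_package)  # {0: 16, 2: 16, 1: 16, 3: 16}
```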

 

Do you now have any insight into what could have gone wrong, such that MPI only occupies 32 cores from one processor group?

Could it be related to BIOS options concerning NUMA, etc.?

 

At the moment we have run out of ideas for what to try next...

 

Best regards

Frank

Frank_R_1
New Contributor I

Hi,

 

One of our customers has a dual-socket Xeon Platinum 8360Y machine (2 x 36 cores) running Windows 10 for Workstations and also sees strange behavior with the Intel MPI 2021.7 library.

The upper two images show NUMA on (expected cpuinfo output) and, in Task Manager, the 64 processes being placed onto only 32 cores (oversubscription).

The lower two images show NUMA off (strange cpuinfo output) and, in Task Manager, the 64 processes correctly placed onto 64 cores.

Cpu-info-numa on.PNG

Numa-on.PNG

Cpu-info-numa off.PNG

Numa-off.PNG

Can you explain what's going on here?

Is this somehow related to the quad-socket problem?

Have any other users of the Intel MPI 2021.x library reported such problems on Windows?

    

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Thanks for sharing the details.


We are working on it and will get back to you soon.


Thanks & Regards

Shivani


Frank_R_1
New Contributor I

Hi,

 

Have you gained any new insights into this problem?

Is this a problem that requires an MPI library update, or is there a workaround?

 

Thanks in advance and best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Sorry for the delayed response. We are working on this issue internally and will get back to you when there is an update.


Thanks & Regards

Shivani



ShivaniK_Intel
Moderator

Hi,


Currently, there is no workaround for this issue; we are working on it.

I suggest disabling Intel MPI process pinning (I_MPI_PIN=off) and letting the system distribute the processes.
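If the launch is scripted, the suggestion amounts to passing one extra `-genv` pair to mpiexec. The Python sketch below only builds the argument list, and it assumes a scripted launch. `app.exe` is a hypothetical stand-in for the real program path. `I_MPI_DEBUG 500` is kept from the original command so that the pinning table can still be checked in the output.

```python
def build_mpiexec_argv(program: str, nranks: int, pinning: bool) -> list[str]:
    """Build an mpiexec.exe argument list; I_MPI_PIN is passed via -genv."""
    argv = ["mpiexec.exe", "-genv", "I_MPI_DEBUG", "500"]
    if not pinning:
        # Disable Intel MPI process pinning and leave placement to the OS.
        argv += ["-genv", "I_MPI_PIN", "off"]
    argv += ["-n", str(nranks), program]
    return argv


argv = build_mpiexec_argv("app.exe", 64, pinning=False)  # app.exe is hypothetical
print(" ".join(argv))
# mpiexec.exe -genv I_MPI_DEBUG 500 -genv I_MPI_PIN off -n 64 app.exe
```

On the target machine, the list can be handed to `subprocess.run(argv, check=True)`.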


Could you please help us by providing an XML file generated with the hwloc lstopo utility?


Here are the steps to do that:


  1. Download the hwloc release: https://download.open-mpi.org/release/hwloc/v2.9/hwloc-win64-build-2.9.3.zip
  2. Unzip it anywhere.
  3. Open a terminal and go to the Bin folder.
  4. Run "lstopo.exe my_topo.xml".
  5. Send the my_topo.xml file back. We can use it to test that the new logic works correctly on your machine.
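Before my_topo.xml is sent, it can be summarized as a quick sanity check. The minimal sketch below uses only the Python standard library. It assumes the hwloc 2.x XML layout, in which every topology object is an `<object>` element with a `type` attribute (Machine, Package, NUMANode, Core, PU, ...). For the quad-socket machine, one would expect the Package and PU counts to match the cpuinfo output above (4 and 72). The embedded sample is a hand-written miniature, not real lstopo output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_objects(root: ET.Element) -> Counter:
    """Count hwloc topology objects by their 'type' attribute."""
    return Counter(obj.get("type") for obj in root.iter("object"))


def count_objects_in_file(path: str) -> Counter:
    """Parse an lstopo XML export, e.g. 'my_topo.xml', and count its objects."""
    return count_objects(ET.parse(path).getroot())


# Hand-written miniature in the hwloc 2.x style (real files carry many more attributes).
SAMPLE = """<topology version="2.0">
  <object type="Machine" os_index="0">
    <object type="Package" os_index="0">
      <object type="NUMANode" os_index="0"/>
      <object type="Core" os_index="0"><object type="PU" os_index="0"/></object>
    </object>
    <object type="Package" os_index="1">
      <object type="NUMANode" os_index="1"/>
      <object type="Core" os_index="0"><object type="PU" os_index="1"/></object>
    </object>
  </object>
</topology>"""

counts = count_objects(ET.fromstring(SAMPLE))
print(counts["Package"], counts["NUMANode"], counts["PU"])  # 2 2 2
```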


Thanks & Regards

Shivani


Frank_R_1
New Contributor I

Hi,

 

We already tried I_MPI_PIN=off and it has no effect: the 64 processes are still pinned to 32 cores.

Please find attached the output file my_topo.xml.

 

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,


Thanks for reporting this issue.


The issue will be fixed in one of the next Intel MPI releases.


Thanks & Regards

Shivani


Frank_R_1
New Contributor I

Hi,

 

Is there any chance that this issue will be fixed in the upcoming oneAPI release?

This is very important for us, because we need to solve these problems for our customers to get our software running on high-core-count machines.

As there is no workaround, I am curious what exactly went wrong (Windows processor groups, pinning, etc.).

 

Best regards

Frank

ShivaniK_Intel
Moderator

Hi,

 

Sorry for the inconvenience; we are currently unable to provide a more specific timeline for this issue. We will post updates as they become available.

 

Thanks & Regards

Shivani
