Intel® MPI Library

MPI Hydra and pinning issue on dual socket AMD

Kj8LmPqZ
Beginner

Hi,

I have a Windows 11 workstation with two AMD EPYC 7763 CPUs. I installed the latest Intel MPI version (2021.10) via oneAPI, and I am testing the library with the following command:

  • mpiexec.exe -n [number_of_cores] hostname

I noticed that it works only for certain numbers of cores:

  • 1-7: OK
  • 8, 16, 32, 64, 128: fails
  • 15: OK

I have not tested every value, but the pattern looks somewhat random. When it fails, the output is:

PS C:\Program Files (x86)\Intel\oneAPI\mpi\2021.10.0\bin> .\mpiexec.exe -n 8 hostname 
[mpiexec@L2O2] check_downstream_work_complition (mpiexec.c:1303): downstream from host machine exited abnormally
[mpiexec@L2O2] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@L2O2] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@L2O2] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@L2O2] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)

I noticed that changing the pinning settings lets MPI run, but performance suffers. In particular:

  • I_MPI_PIN_DOMAIN: values of 1 and 4 work, but numa does not.
  • I_MPI_PIN_PROCESSOR_LIST=allcores:map=bunch works, but performance is poor.
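
Since I_MPI_PIN_DOMAIN=numa fails here while explicit settings run, one workaround worth trying is to describe the domains by hand. The Python sketch below is an illustration, not a verified fix: it builds the "[m1,...,mn]" hexadecimal mask-list form of I_MPI_PIN_DOMAIN from lists of logical processor ids. The helper names (mask, domain_setting, PACKAGES, L3_GROUPS) are hypothetical. The layouts assume the topology that cpuinfo reports further down in this post (package 0 = logical CPUs 0-31 and 64-95, package 1 = 32-63 and 96-127, one L3 cache per 8 consecutive logical CPUs) and one NUMA node per socket. Whether this Intel MPI build on Windows accepts masks wider than 64 bits is also an assumption to check against the reference for your version.

```python
"""Build explicit I_MPI_PIN_DOMAIN mask lists (hypothetical helpers, a sketch).

Assumed topology, taken from the cpuinfo output in this thread:
package 0 = logical CPUs 0-31 and 64-95, package 1 = 32-63 and 96-127,
and one shared L3 cache per 8 consecutive logical CPUs.
"""


def mask(cpus):
    """Set bit i for each logical CPU id i; return the mask as bare hex digits."""
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    return format(bits, "x")


def domain_setting(domains):
    """Join one mask per domain into the '[m1,...,mn]' mask-list form."""
    return "[" + ",".join(mask(d) for d in domains) + "]"


# One domain per package: what I_MPI_PIN_DOMAIN=numa should give if each
# socket is a single NUMA node (an assumption about the BIOS NPS setting).
PACKAGES = [
    list(range(0, 32)) + list(range(64, 96)),
    list(range(32, 64)) + list(range(96, 128)),
]

# One domain per L3 cache: 16 groups of 8 consecutive logical CPUs.
L3_GROUPS = [list(range(start, start + 8)) for start in range(0, 128, 8)]

if __name__ == "__main__":
    print("I_MPI_PIN_DOMAIN=" + domain_setting(PACKAGES))
    print("I_MPI_PIN_DOMAIN=" + domain_setting(L3_GROUPS))
```

A printed line can be applied with set (cmd) or $env: (PowerShell) before calling mpiexec; with I_MPI_DEBUG set to 4 or higher, Intel MPI should print the resulting pinning table at startup, which makes it easy to compare against the intended domains.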

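Finally, running cpuinfo gives surprising results: it reports 48 cores per package and 96 cores in total, whereas two 64-core EPYC 7763 CPUs should give 64 cores per package and 128 cores in total:
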
=====  Processor composition  =====
Processor name    : AMD EPYC 7763 64-Core Processor                  
Packages(sockets) : 2
Cores             : 96
Processors(CPUs)  : 128
Cores per package : 48
Threads per core  : 1

=====  Processor identification  =====
Processor	Thread Id.	Core Id.	Package Id.
0       	0   		0   		0   
1       	0   		1   		0   
2       	0   		2   		0   
3       	0   		3   		0   
4       	0   		4   		0   
5       	0   		5   		0   
6       	0   		6   		0   
7       	0   		7   		0   
8       	0   		16  		0   
9       	0   		17  		0   
10      	0   		18  		0   
11      	0   		19  		0   
12      	0   		20  		0   
13      	0   		21  		0   
14      	0   		22  		0   
15      	0   		23  		0   
16      	0   		32  		0   
17      	0   		33  		0   
18      	0   		34  		0   
19      	0   		35  		0   
20      	0   		36  		0   
21      	0   		37  		0   
22      	0   		38  		0   
23      	0   		39  		0   
24      	0   		48  		0   
25      	0   		49  		0   
26      	0   		50  		0   
27      	0   		51  		0   
28      	0   		52  		0   
29      	0   		53  		0   
30      	0   		54  		0   
31      	0   		55  		0   
32      	0   		0   		1   
33      	0   		1   		1   
34      	0   		2   		1   
35      	0   		3   		1   
36      	0   		4   		1   
37      	0   		5   		1   
38      	0   		6   		1   
39      	0   		7   		1   
40      	0   		16  		1   
41      	0   		17  		1   
42      	0   		18  		1   
43      	0   		19  		1   
44      	0   		20  		1   
45      	0   		21  		1   
46      	0   		22  		1   
47      	0   		23  		1   
48      	0   		32  		1   
49      	0   		33  		1   
50      	0   		34  		1   
51      	0   		35  		1   
52      	0   		36  		1   
53      	0   		37  		1   
54      	0   		38  		1   
55      	0   		39  		1   
56      	0   		48  		1   
57      	0   		49  		1   
58      	0   		50  		1   
59      	0   		51  		1   
60      	0   		52  		1   
61      	0   		53  		1   
62      	0   		54  		1   
63      	0   		55  		1   
64      	0   		8   		0   
65      	0   		9   		0   
66      	0   		10  		0   
67      	0   		11  		0   
68      	0   		12  		0   
69      	0   		13  		0   
70      	0   		14  		0   
71      	0   		15  		0   
72      	0   		24  		0   
73      	0   		25  		0   
74      	0   		26  		0   
75      	0   		27  		0   
76      	0   		28  		0   
77      	0   		29  		0   
78      	0   		30  		0   
79      	0   		31  		0   
80      	0   		40  		0   
81      	0   		41  		0   
82      	0   		42  		0   
83      	0   		43  		0   
84      	0   		44  		0   
85      	0   		45  		0   
86      	0   		46  		0   
87      	0   		47  		0   
88      	0   		56  		0   
89      	0   		57  		0   
90      	0   		58  		0   
91      	0   		59  		0   
92      	0   		60  		0   
93      	0   		61  		0   
94      	0   		62  		0   
95      	0   		63  		0   
96      	0   		8   		1   
97      	0   		9   		1   
98      	0   		10  		1   
99      	0   		11  		1   
100     	0   		12  		1   
101     	0   		13  		1   
102     	0   		14  		1   
103     	0   		15  		1   
104     	0   		24  		1   
105     	0   		25  		1   
106     	0   		26  		1   
107     	0   		27  		1   
108     	0   		28  		1   
109     	0   		29  		1   
110     	0   		30  		1   
111     	0   		31  		1   
112     	0   		40  		1   
113     	0   		41  		1   
114     	0   		42  		1   
115     	0   		43  		1   
116     	0   		44  		1   
117     	0   		45  		1   
118     	0   		46  		1   
119     	0   		47  		1   
120     	0   		56  		1   
121     	0   		57  		1   
122     	0   		58  		1   
123     	0   		59  		1   
124     	0   		60  		1   
125     	0   		61  		1   
126     	0   		62  		1   
127     	0   		63  		1   
=====  Placement on packages  =====
Package Id.	Core Id.	Processors
0   		0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,32,33,34,35,36,37,38,39,48,49,50,51,52,53,54,55,8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31,40,41,42,43,44,45,46,47,56,57,58,59,60,61,62,63		0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
1   		0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,32,33,34,35,36,37,38,39,48,49,50,51,52,53,54,55,8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31,40,41,42,43,44,45,46,47,56,57,58,59,60,61,62,63		32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127

=====  Cache sharing  =====
Cache	Size		Processors
L1	32  KB		no sharing
L2	512 KB		no sharing
L3	32  MB		(0,1,2,3,4,5,6,7)(8,9,10,11,12,13,14,15)(16,17,18,19,20,21,22,23)(24,25,26,27,28,29,30,31)(32,33,34,35,36,37,38,39)(40,41,42,43,44,45,46,47)(48,49,50,51,52,53,54,55)(56,57,58,59,60,61,62,63)(64,65,66,67,68,69,70,71)(72,73,74,75,76,77,78,79)(80,81,82,83,84,85,86,87)(88,89,90,91,92,93,94,95)(96,97,98,99,100,101,102,103)(104,105,106,107,108,109,110,111)(112,113,114,115,116,117,118,119)(120,121,122,123,124,125,126,127)
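
The 96-core / 48-cores-per-package summary does not match the per-processor table above it. One way to double-check is to count distinct (package id, core id) pairs in the "Processor identification" rows, since core ids restart at 0 on each package. The Python sketch below does that; parse_rows, summarize, and ranges are hypothetical helper names, and it assumes the four-column row layout shown in this post. On the table above it should report 64 cores on each package, with package 0 owning logical CPUs 0-31,64-95 and package 1 owning 32-63,96-127, which agrees with the "Placement on packages" section rather than with the summary lines.

```python
"""Recount physical cores from cpuinfo's "Processor identification" table.

Assumption: data rows have four integer columns
(processor, thread id, core id, package id), as in the output in this thread.
Core ids restart at 0 on every package, so a physical core is identified
by the pair (package id, core id).
"""
from collections import defaultdict


def parse_rows(text):
    """Return (processor, thread, core, package) tuples for every data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Headers and summary lines never consist of exactly four integers.
        if len(fields) == 4 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows


def summarize(rows):
    """Map package id -> (number of distinct cores, sorted logical CPU ids)."""
    cores = defaultdict(set)
    cpus = defaultdict(list)
    for proc, _thread, core, pkg in rows:
        cores[pkg].add(core)
        cpus[pkg].append(proc)
    return {pkg: (len(cores[pkg]), sorted(cpus[pkg])) for pkg in sorted(cores)}


def ranges(ids):
    """Compress a non-empty sorted id list, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    out, start, prev = [], ids[0], ids[0]
    for i in ids[1:] + [None]:
        if i is not None and i == prev + 1:
            prev = i
            continue
        out.append(f"{start}-{prev}" if start != prev else str(start))
        if i is not None:
            start = prev = i
    return ",".join(out)
```

For example, after saving the cpuinfo output to a file (the name cpuinfo.txt is only an example), summarize(parse_rows(open("cpuinfo.txt").read())) gives the per-package core counts, and ranges() turns each package's logical CPU list into a compact string.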

 

I am not the only one encountering issues with Intel MPI on this type of CPU. I have seen other posts, but none of them mentioned a solution.

 

AishwaryaCV_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


We are looking into your issue and will get back to you soon.


Thanks and regards,

Aishwarya




AishwaryaCV_Intel
Moderator

Hi,

Could you please make sure that you initialize the environment correctly with the setup script? I advise you to run the environment script and then call "mpirunexe", rather than going to the installation directory and executing ".\mpirunexe.hydra" directly. This helps ensure that the environment and the program work properly.

For example, you can initialize it as follows:

"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

You can refer to the following link: https://www.intel.com/content/www/us/en/docs/oneapi/user-guide-vs-code/2023-0/set-the-oneapi-environment-by-manually-running.html

Thanks and regards,

Aishwarya

 

Kj8LmPqZ
Beginner

Hi,

Thank you for your answer. I tried the procedure you mentioned and obtained exactly the same results.

I managed to obtain good performance in the BIBAND benchmark by fine-tuning the pinning on 64 cores, but it does not scale well. Additionally, the resulting pinning does not seem consistent with what your pinning simulator predicts; it looks rather random.

 

Best regards,

 

TobiasK
Moderator

@Kj8LmPqZ 
If this is still relevant for you, could you please test 2021.12.1?
We have refactored our pinning infrastructure on Windows.
