Hi,
I have a Windows 11 workstation with two AMD EPYC 7763 CPUs. I have installed the latest Intel MPI version via oneAPI (2021.10). I am testing the library by running the command:
- mpiexec.exe -n [number_of_cores] hostname
I noticed that it works only for certain numbers of cores:
- 1-7: OK
- 8, 16, 32, 64, 128: fails
- 15: OK
I have not tested every possibility, but it seems a bit random. When it does not work, the message is:
PS C:\Program Files (x86)\Intel\oneAPI\mpi\2021.10.0\bin> .\mpiexec.exe -n 8 hostname
[mpiexec@L2O2] check_downstream_work_complition (mpiexec.c:1303): downstream from host machine exited abnormally
[mpiexec@L2O2] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@L2O2] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@L2O2] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@L2O2] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
I noticed that changing the pinning allows MPI to work, but the performance is not there. In particular:
- I_MPI_PIN_DOMAIN: values of 1 and 4 work, but numa does not.
- I_MPI_PIN_PROCESSOR_LIST=allcores:map=bunch works, but with bad performance.
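For context on what the working values do: I_MPI_PIN_DOMAIN=<n> partitions the logical processors into domains of n CPUs and pins one rank per domain. A minimal sketch of that partitioning, assuming consecutive CPU numbering and rank-i-to-domain-i assignment (an illustration of the concept, not Intel's implementation):

```python
def pin_domains(n_logical: int, domain_size: int) -> list[list[int]]:
    """Split logical processors 0..n_logical-1 into consecutive pin
    domains of `domain_size` CPUs; rank i is pinned to domain i."""
    return [list(range(start, min(start + domain_size, n_logical)))
            for start in range(0, n_logical, domain_size)]

# With I_MPI_PIN_DOMAIN=4 on this 128-CPU box, rank 0 would own CPUs 0-3:
# pin_domains(128, 4)[0] -> [0, 1, 2, 3]
```

With domain size 1, each rank is confined to a single logical CPU, which matches the observation that I_MPI_PIN_DOMAIN=1 runs but leaves no room for placement across CCXs or sockets.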
Finally, running cpuinfo gives surprising results (cores per package = 48? cores=96?):
===== Processor composition =====
Processor name : AMD EPYC 7763 64-Core Processor
Packages(sockets) : 2
Cores : 96
Processors(CPUs) : 128
Cores per package : 48
Threads per core : 1
===== Processor identification =====
Processor Thread Id. Core Id. Package Id.
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 5 0
6 0 6 0
7 0 7 0
8 0 16 0
9 0 17 0
10 0 18 0
11 0 19 0
12 0 20 0
13 0 21 0
14 0 22 0
15 0 23 0
16 0 32 0
17 0 33 0
18 0 34 0
19 0 35 0
20 0 36 0
21 0 37 0
22 0 38 0
23 0 39 0
24 0 48 0
25 0 49 0
26 0 50 0
27 0 51 0
28 0 52 0
29 0 53 0
30 0 54 0
31 0 55 0
32 0 0 1
33 0 1 1
34 0 2 1
35 0 3 1
36 0 4 1
37 0 5 1
38 0 6 1
39 0 7 1
40 0 16 1
41 0 17 1
42 0 18 1
43 0 19 1
44 0 20 1
45 0 21 1
46 0 22 1
47 0 23 1
48 0 32 1
49 0 33 1
50 0 34 1
51 0 35 1
52 0 36 1
53 0 37 1
54 0 38 1
55 0 39 1
56 0 48 1
57 0 49 1
58 0 50 1
59 0 51 1
60 0 52 1
61 0 53 1
62 0 54 1
63 0 55 1
64 0 8 0
65 0 9 0
66 0 10 0
67 0 11 0
68 0 12 0
69 0 13 0
70 0 14 0
71 0 15 0
72 0 24 0
73 0 25 0
74 0 26 0
75 0 27 0
76 0 28 0
77 0 29 0
78 0 30 0
79 0 31 0
80 0 40 0
81 0 41 0
82 0 42 0
83 0 43 0
84 0 44 0
85 0 45 0
86 0 46 0
87 0 47 0
88 0 56 0
89 0 57 0
90 0 58 0
91 0 59 0
92 0 60 0
93 0 61 0
94 0 62 0
95 0 63 0
96 0 8 1
97 0 9 1
98 0 10 1
99 0 11 1
100 0 12 1
101 0 13 1
102 0 14 1
103 0 15 1
104 0 24 1
105 0 25 1
106 0 26 1
107 0 27 1
108 0 28 1
109 0 29 1
110 0 30 1
111 0 31 1
112 0 40 1
113 0 41 1
114 0 42 1
115 0 43 1
116 0 44 1
117 0 45 1
118 0 46 1
119 0 47 1
120 0 56 1
121 0 57 1
122 0 58 1
123 0 59 1
124 0 60 1
125 0 61 1
126 0 62 1
127 0 63 1
===== Placement on packages =====
Package Id. Core Id. Processors
0 0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,32,33,34,35,36,37,38,39,48,49,50,51,52,53,54,55,8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31,40,41,42,43,44,45,46,47,56,57,58,59,60,61,62,63 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
1 0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,32,33,34,35,36,37,38,39,48,49,50,51,52,53,54,55,8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31,40,41,42,43,44,45,46,47,56,57,58,59,60,61,62,63 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
===== Cache sharing =====
Cache Size Processors
L1 32 KB no sharing
L2 512 KB no sharing
L3 32 MB (0,1,2,3,4,5,6,7)(8,9,10,11,12,13,14,15)(16,17,18,19,20,21,22,23)(24,25,26,27,28,29,30,31)(32,33,34,35,36,37,38,39)(40,41,42,43,44,45,46,47)(48,49,50,51,52,53,54,55)(56,57,58,59,60,61,62,63)(64,65,66,67,68,69,70,71)(72,73,74,75,76,77,78,79)(80,81,82,83,84,85,86,87)(88,89,90,91,92,93,94,95)(96,97,98,99,100,101,102,103)(104,105,106,107,108,109,110,111)(112,113,114,115,116,117,118,119)(120,121,122,123,124,125,126,127)
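The numbering in the dump above is less random than it looks: within each package, cpuinfo first enumerates cores 0-7 of every 16-core stride, then cores 8-15, and each consecutive block of 8 OS processors shares one 32 MB L3 (a Zen 3 CCX). Processors 0-63 and 64-127 plausibly correspond to the two 64-CPU Windows processor groups, though the thread does not confirm this. The sketch below reconstructs that map from the posted output and also shows the summary is internally inconsistent: the identification table contains 64 distinct cores per package, not the 48 reported.

```python
# Reconstruct the OS-processor -> (package, core) map printed by cpuinfo
# above for the 2 x EPYC 7763 box (values taken from the posted dump).

N_PACKAGES = 2
CORES_PER_PACKAGE = 64   # physical cores per socket
CCX = 8                  # cores per shared 32 MB L3 slice (Zen 3 CCX)

def cpuinfo_map():
    """Enumerate as in the dump: for each half (cores 0-7 vs 8-15 of
    every 16-core stride), for each package, walk the strides in order."""
    mapping = []  # mapping[os_proc] = (package_id, core_id)
    for half in (0, CCX):
        for pkg in range(N_PACKAGES):
            for base in range(0, CORES_PER_PACKAGE, 2 * CCX):
                mapping.extend((pkg, base + half + i) for i in range(CCX))
    return mapping

m = cpuinfo_map()
# Spot checks against the posted identification table:
assert m[8] == (0, 16) and m[64] == (0, 8) and m[96] == (1, 8)
# The summary's "Cores per package : 48" disagrees with the table itself:
for pkg in range(N_PACKAGES):
    assert len({core for p, core in m if p == pkg}) == 64
# Each consecutive block of 8 OS processors is one CCX of one package,
# matching the L3 sharing groups (0-7)(8-15)... printed above.
for b in range(0, len(m), 8):
    block = m[b:b + 8]
    assert len({p for p, _ in block}) == 1
    assert len({c // CCX for _, c in block}) == 1
```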
I am not the only one encountering issues with Intel MPI and this type of CPU. I have seen other posts, but a solution was never mentioned.
Hi,
Thank you for posting in the Intel Communities.
We are looking into your issue and will get back to you soon.
Thanks and regards,
Aishwarya
Hi,
Could you please ensure that you are initializing the environment with the provided script? I advise you to run the environment setup script and then invoke mpiexec from the resulting environment, rather than executing the binary directly from the installation directory. This will help ensure the proper functioning of the environment and the program.
For example you can set it as below:
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
You can refer to the following link: https://www.intel.com/content/www/us/en/docs/oneapi/user-guide-vs-code/2023-0/set-the-oneapi-environment-by-manually-running.html
Thanks and regards,
Aishwarya
Hi,
Thank you for your answer. I tried the mentioned procedure and obtained exactly the same results.
I managed to obtain good performance in the BIBAND benchmark by fine-tuning the pinning on 64 cores, but it does not scale well. Additionally, the pinning does not seem to be consistent with the simulator that you provide; it is rather all over the place.
Best regards,
@Kj8LmPqZ
If this is still relevant for you, could you please test 2021.12.1?
We refactored our pinning infrastructure on Windows in that release.