User-specified pinning works on the first two hosts but does not appear to be applied correctly on a third host. The problem may be that two of the hosts have 18 cores each while the third host has 8 cores.
RAM = 256GB LRDIMM 2400
Linux = Ubuntu 22
Hosts handel1 and elgar1 use:
Processor name : Intel(R) Xeon(R) E5-2697 v4
Host mirella1 uses:
Processor name : Intel(R) Xeon(R) E5-2667 v4
Linux mirella 5.15.0-57-generic #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Each system uses an X550-T2 NIC with the Linux "bonding" driver in Adaptive Load Balancing mode.
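As a quick check of the active bonding mode, the driver's report under /proc can be inspected; a minimal sketch (the interface name bond0 is an assumption, and the /proc report is simulated here with a sample file so the snippet is self-contained):

```shell
#!/bin/bash
# Simulate /proc/net/bonding/bond0 so the parsing step is self-contained;
# on a real host you would read the actual file instead.
cat > /tmp/bond0_sample <<'EOF'
Ethernet Channel Bonding Driver: v5.15.0
Bonding Mode: adaptive load balancing
Primary Slave: None
EOF

# Extract the active mode line (the same grep works on /proc/net/bonding/bond0).
grep '^Bonding Mode:' /tmp/bond0_sample
```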
The December oneAPI update is installed on mirella:
art@mirella:~$ ifort --version
ifort (IFORT) 2021.8.0 20221119
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
art@mirella:~$ mpiexec --version
Intel(R) MPI Library for Linux* OS, Version 2021.8 Build 20221129 (id: 339ec755a1)
Copyright 2003-2022, Intel Corporation.
Handel1 and elgar1 have only the oneAPI MPI installed.
Thanks for posting in the Intel forums.
As you have mentioned, you are using Ubuntu 22, which is not a supported OS version for Intel MPI. For more details regarding the system requirements, please refer to the link below.
Could you please let us know whether you face any issues using the supported version of the OS?
Thanks & Regards
Installed Ubuntu 20.04.5 on the 4 host systems here: faure, handel, elgar, and mirella, then installed Intel's December updates (Base and HPC toolkits). Ran tests using 1, 2, 3, and 4 hosts. The problem doesn't appear with just 1 or 2 hosts (both 18-core CPUs), but 3- and 4-host runs both display bad pinning, starting with host 3 (8-core Xeon) and also host 4 (6-core Xeon). The test problem runs to completion with very good wall-clock timings.
The zip includes the test runs for two different-sized versions of the same source code, the NPB FT benchmark. The makefile produces an executable for a specific number of MPI ranks, with the problem size divided equally over all ranks. The number of ranks is constrained to a power of 2.
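The power-of-2 rank constraint can be checked before launching; a minimal sketch (the rank counts tested here are illustrative):

```shell
#!/bin/bash
# Return success if $1 is a power of two (NPB FT requires this of the rank count).
is_pow2() {
  n=$1
  [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

for ranks in 16 32 24; do
  if is_pow2 "$ranks"; then
    echo "$ranks ranks: OK"
  else
    echo "$ranks ranks: not a power of 2"
  fi
done
```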
Size CLASS C:
Size       : 512x512x512
Iterations : 20
Size CLASS D:
Size       : 2048x1024x1024
Iterations : 25
My hosts elgar and handel each have a single Xeon (Broadwell) with 18 cores each. Hyperthreading is disabled.
Host mirella is a single Xeon (Broadwell) with 8 cores.
Host mirella is a single current Xeon with 6 cores.
Tested system configurations:
1 host, 16 ranks: host handel
2 hosts, 32 ranks: hosts handel and elgar, 16 ranks each
3 hosts, 32 ranks: hosts handel and elgar with 12 ranks each, host mirella with 8 ranks
4 hosts, 32 ranks: hosts handel and elgar with 9 ranks each, host mirella with 8 ranks, host faure with 6 ranks
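The 4-host split corresponds to an mpirun colon-syntax launch; a minimal sketch that only assembles and prints the command line (the hostnames are the ones from this thread, and the ./ft executable name is an assumption; adjust to your own run script):

```shell
#!/bin/bash
# Build the heterogeneous 4-host launch line (9 + 9 + 8 + 6 = 32 ranks).
exe=./ft
cmd="mpirun"
cmd+=" -host handel1 -n 9 $exe :"
cmd+=" -host elgar1 -n 9 $exe :"
cmd+=" -host mirella1 -n 8 $exe :"
cmd+=" -host faure1 -n 6 $exe"
echo "$cmd"
```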
Directory of C:\cygwin64\home\art\ubuntu_20.4.5_runs
01/17/2023 08:56 PM 4,529 ft_16_1_host_class_C.txt
01/17/2023 08:56 PM 4,825 ft_16_1_host_class_D.txt
01/17/2023 08:56 PM 5,237 ft_32_2_host_class_C.txt
01/17/2023 08:56 PM 5,582 ft_32_2_host_class_D.txt
01/17/2023 08:56 PM 5,209 ft_32_3_host_class_C.txt
01/17/2023 08:56 PM 5,554 ft_32_3_host_class_D.txt
01/17/2023 08:56 PM 5,058 ft_32_4_host_class_C.txt
01/17/2023 08:56 PM 5,403 ft_32_4_host_class_D.txt
01/18/2023 02:10 PM 1,358 mirella_config.txt
9 File(s) 42,755 bytes
The file mirella_config.txt includes the Ubuntu and Intel version numbers and a copy of the run script. Only the specific -host entries are adjusted to configure each test run; the runs otherwise use the same MPI command arguments.
I did make a non-MPI change to my bonding configuration, from Adaptive Load Balancing to Round-Robin, after a test revealed better throughput, along with Round-Robin's ability to use both of its slave NICs when only two hosts are being tested. Adaptive Load Balancing only kicks in on a 3-host system.
Here are the clock times for each test.
ft_16_1_host_class_C.txt: Time in seconds = 20.18
ft_16_1_host_class_D.txt: Time in seconds = 490.83
ft_32_2_host_class_C.txt: Time in seconds = 16.93
ft_32_2_host_class_D.txt: Time in seconds = 321.46
ft_32_3_host_class_C.txt: Time in seconds = 13.07
ft_32_3_host_class_D.txt: Time in seconds = 279.54
ft_32_4_host_class_C.txt: Time in seconds = 14.64
ft_32_4_host_class_D.txt: Time in seconds = 260.59
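From the class D timings above, the speedup relative to the 1-host run can be computed directly (a minimal sketch; the numbers are copied from the list above):

```shell
#!/bin/bash
# Class D wall-clock times (seconds): 1-host baseline vs 2-, 3-, and 4-host runs.
base=490.83
for t in 321.46 279.54 260.59; do
  awk -v b="$base" -v t="$t" 'BEGIN { printf "%.2fx speedup (%.2f s)\n", b / t, t }'
done
```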
I'll give ubuntu 20.04 a try and let you know.
I used to experience (and reported) pinning problems like this on CentOS 7, starting with Intel's 2019 MPI.
FYI: even with the pinning problem, this test runs on 4 hosts (32 ranks) in about 302 sec. The Windows version doesn't have this pinning problem, but its best run time on the same hardware and network is 370 sec.
Could you please confirm the correct CPU configuration for the host named mirella? You have mentioned two different CPU configurations.
Could you please provide us with the output from the cpuinfo command for each host involved?
Could you please compile and run the sample Hello World program using the command line below and provide us with the pinning details for all 4 hosts?
mpiifort $I_MPI_ROOT/test/test.f90 -o hello
Thanks & Regards
As we didn't hear back from you, could you please provide the details requested in my previous post so that we can investigate your issue further?
Thanks & Regards
Mirella, handel and elgar are all E5-26xx v4 CPUs. Mirella has 8 cores. Handel and elgar have 18 cores and each has 256GB LRDIMM. Faure is a 6 core i5-12500 with 64GB DIMM. Hyperthreading is disabled and OMP_NUM_THREADS=1.
BTW each of these systems has a Windows 10 drive, and both of my examples run OK using Windows 10.
I ran "hello" using two different pinning selections.

Case 1:
-host handel1 -n 12 ./hello : \
-host elgar1 -n 12 ./hello : \
-host mirella1 -n 4 ./hello : \
-host faure1 -n 4 ./hello

Case 2:
-host handel1 -n 9 ./hello : \
-host elgar1 -n 9 ./hello : \
-host mirella1 -n 8 ./hello : \
-host faure1 -n 6 ./hello
The pinning problem appears in both cases, and as with my own test code, hello ran OK.
Sorry for the delay... I've been doing some work converting my windows 10 compiler/MPI environment to the January update.
Could you please remove I_MPI_FABRICS=shm:tcp, as this is no longer a valid option?
For more details on how the interconnect layer is currently configured and controlled, please refer to the below link.
This is not relevant to the issue at hand, but it is an easy fix to remove a warning.
Could you please try each of the following scenarios and provide us with the output?
mpirun -genv I_MPI_DEBUG 5 -host mirella1 -n 8 ./hello
mpirun -genv I_MPI_DEBUG 5 -host faure1 -n 6 ./hello
mpirun -genv I_MPI_DEBUG 5 -host handel1 -n 12 ./hello : -host elgar1 -n 12 ./hello
mpirun -genv I_MPI_DEBUG 5 -host faure1 -n 6 ./hello : -host mirella1 -n 6 ./hello : -host elgar1 -n 6 ./hello : -host handel1 -n 6 ./hello
mpirun -genv I_MPI_DEBUG 5 -host faure1 -n 6 ./hello : -host mirella1 -n 8 ./hello : -host elgar1 -n 18 ./hello : -host handel1 -n 18 ./hello
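When checking the resulting pinning, the relevant lines can be filtered out of the debug output; a minimal sketch, assuming the usual I_MPI_DEBUG=5 pinning-table layout (Rank / Pid / Node name / Pin cpu) and using a canned sample here so the filtering step is self-contained (the real input would come from mpirun):

```shell
#!/bin/bash
# Sample of the kind of pinning table I_MPI_DEBUG=5 prints at startup
# (illustrative only; the exact format may vary between MPI versions).
cat > /tmp/impi_debug_sample.txt <<'EOF'
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       4242     mirella1   0
[0] MPI startup(): 1       4243     mirella1   1
EOF

# Keep only the pinning table: the header line plus the per-rank rows.
grep -E 'Rank[[:space:]]+Pid|MPI startup\(\): [0-9]+' /tmp/impi_debug_sample.txt
```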
Thanks & Regards
Thanks for your patience.
Could you please try setting I_MPI_PLATFORM=auto and let us know if you face similar issues?
If your issue still persists, please provide us with the output log.
Thanks & Regards