Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Linux Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1) pinning problem

ALaza1
Novice

User-specified pinning works on the first two hosts but doesn't appear to be correct on a third host. The problem might be that two of the hosts have 18 cores each while the third host has only 8 cores.
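One plausible failure mode with mixed core counts is a single global pin list applied to every host. A minimal hypothetical sketch (the actual run script is attached in a later post; the value below is only an assumption):

export I_MPI_PIN_PROCESSOR_LIST=0-17   # fits the 18-core hosts
# On the 8-core host only entries 0-7 have matching cores; the runtime
# may then fall back to a default placement there, which would show up
# as incorrect pinning on that host.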

RAM= 256GB LRDIMM 2400

Linux = Ubuntu 22

Hosts handel1 and elgar1 are using

Processor name : Intel(R) Xeon(R) E5-2697 v4

Host mirella1 uses

Processor name : Intel(R) Xeon(R) E5-2667 v4

Linux mirella 5.15.0-57-generic #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Each system uses an Intel X550-T2 NIC with the Linux "bonding" option set to Adaptive Load Balancing.

The December oneAPI update is installed on mirella:

art@mirella:~$ ifort --version
ifort (IFORT) 2021.8.0 20221119
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

art@mirella:~$ mpiexec --version
Intel(R) MPI Library for Linux* OS, Version 2021.8 Build 20221129 (id: 339ec755a1)
Copyright 2003-2022, Intel Corporation.
Handel1 and elgar1 have only the oneAPI MPI installed.

Installed on mirella1:

l_HPCKit_p_2023.0.0.25400_offline.sh

l_BaseKit_p_2023.0.0.25537_offline.sh

 

ShivaniK_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


As you have mentioned, you are using Ubuntu 22, which is not a supported OS version for the Intel MPI Library. For more details regarding the system requirements, please refer to the link below.


https://www.intel.com/content/www/us/en/developer/articles/system-requirements/mpi-library-system-re...


Could you please let us know whether you face any issues when using a supported version of the OS?


Thanks & Regards

Shivani



ALaza1
Novice

Installed Ubuntu 20.04.5 on the 4 host systems here: faure, handel, elgar, and mirella. Installed Intel's December updates (Base and HPC kits). Ran tests using 1, 2, 3, and 4 hosts. The problem doesn't appear with just 1 or 2 hosts (both 18-core CPUs), but 3- and 4-host runs both display bad pinning, starting with host 3 (8-core Xeon) and also on host 4 (6-core Xeon). The test problem runs to completion with very good wall-clock timings.

The zip includes the test runs for two different-sized versions of the same source code, the NPB FT. The makefile produces an executable for a specific number of MPI ranks, with the problem size divided equally over all the ranks. The number of ranks is constrained to a power of 2.
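For reference, assuming the standard NPB 3.x MPI layout, building for a given class and rank count looks something like:

make ft CLASS=D NPROCS=32
# produces bin/ft.D.32 (the name encodes class and rank count)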

Size CLASS C:

Size : 512x 512x 512
Iterations : 20

 

Size CLASS D:

Size : 2048x1024x1024
Iterations : 25

My hosts elgar and handel each have a single 18-core Xeon (Broadwell). Hyperthreading is disabled.

Host mirella is a single Xeon (Broadwell) with 8 cores.

Host mirella is a single current Xeon with 6 cores.

Tested system configurations:

1 host, 16 ranks: host handel.

2 hosts, 32 ranks: hosts handel and elgar, 16 ranks each.

3 hosts, 32 ranks: hosts handel and elgar with 12 ranks each, and host mirella with 8 ranks.

4 hosts, 32 ranks: hosts handel and elgar with 9 ranks each, host mirella with 8 ranks, and host faure with 6 ranks.
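As a sketch, the 3-host configuration above presumably corresponds to an mpiexec line of this shape (the executable name is assumed; the colon-separated groups match the per-host rank counts):

mpiexec -host handel1 -n 12 ./ft.D.32 : \
        -host elgar1 -n 12 ./ft.D.32 : \
        -host mirella1 -n 8 ./ft.D.32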

 

Directory of C:\cygwin64\home\art\ubuntu_20.4.5_runs


01/17/2023 08:56 PM 4,529 ft_16_1_host_class_C.txt
01/17/2023 08:56 PM 4,825 ft_16_1_host_class_D.txt
01/17/2023 08:56 PM 5,237 ft_32_2_host_class_C.txt
01/17/2023 08:56 PM 5,582 ft_32_2_host_class_D.txt
01/17/2023 08:56 PM 5,209 ft_32_3_host_class_C.txt
01/17/2023 08:56 PM 5,554 ft_32_3_host_class_D.txt
01/17/2023 08:56 PM 5,058 ft_32_4_host_class_C.txt
01/17/2023 08:56 PM 5,403 ft_32_4_host_class_D.txt
01/18/2023 02:10 PM 1,358 mirella_config.txt
9 File(s) 42,755 bytes

The file mirella_config.txt includes Ubuntu and Intel version numbers and a copy of the run script. Only the specific -host entries are adjusted to configure each test run. The runs use the same MPI command arguments.

I did make a non-MPI change to my bonding configuration, from Adaptive Load Balancing to Round-Robin, after a test revealed better throughput, along with round-robin's ability to use both of its slave NICs when only two hosts are being tested. Adaptive Load Balancing kicks in on a 3-host system.
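On Ubuntu this is typically a one-line mode change in the netplan bond definition; a sketch with hypothetical interface names:

# /etc/netplan/01-bond.yaml (sketch; enp1s0f0/enp1s0f1 are placeholders)
network:
  version: 2
  ethernets:
    enp1s0f0: {}
    enp1s0f1: {}
  bonds:
    bond0:
      interfaces: [enp1s0f0, enp1s0f1]
      parameters:
        mode: balance-rr   # was balance-alb (Adaptive Load Balancing)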

Here are the clock times for each test.

ft_16_1_host_class_C.txt: Time in seconds = 20.18
ft_16_1_host_class_D.txt: Time in seconds = 490.83
ft_32_2_host_class_C.txt: Time in seconds = 16.93
ft_32_2_host_class_D.txt: Time in seconds = 321.46
ft_32_3_host_class_C.txt: Time in seconds = 13.07
ft_32_3_host_class_D.txt: Time in seconds = 279.54
ft_32_4_host_class_C.txt: Time in seconds = 14.64
ft_32_4_host_class_D.txt: Time in seconds = 260.59

regards,
Art

 

ALaza1
Novice

I'll give Ubuntu 20.04 a try and let you know.

I used to experience (and reported) pinning problems like this on CentOS 7, starting with Intel's 2019 MPI.

FYI, even with the pinning problem, this test runs on 4 hosts (32 ranks) in about 302 sec. The Windows version doesn't have this pinning problem, but its best run time on the same hardware and network is 370 sec.

 

regards,

Art

ShivaniK_Intel
Moderator

Hi,


Could you please confirm the correct CPU configuration for the host named mirella? You have mentioned two different CPU configurations for it.


Could you please provide us with the output from the cpuinfo command for each host involved?


Could you please compile and run the sample hello world program using the below command line and provide us with the pinning details of all 4 hosts?


mpiifort $I_MPI_ROOT/test/test.f90 -o hello
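Then run the resulting binary across the hosts. As a sketch (host layout borrowed from your earlier posts), setting I_MPI_DEBUG to 4 or higher should make the library print its rank-to-core pinning map:

I_MPI_DEBUG=5 mpiexec -host handel1 -n 9 ./hello : \
    -host elgar1 -n 9 ./hello : \
    -host mirella1 -n 8 ./hello : \
    -host faure1 -n 6 ./hello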


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator


Hi,


As we didn't hear back from you, could you please provide the details requested in my previous post so that we can investigate your issue further?


Thanks & Regards

Shivani


ALaza1
Novice

Mirella, handel, and elgar are all E5-26xx v4 CPUs. Mirella has 8 cores; handel and elgar have 18 cores each, and each has 256GB of LRDIMM. Faure is a 6-core i5-12500 with 64GB of DIMM. Hyperthreading is disabled and OMP_NUM_THREADS=1.

BTW, each of these systems has a Windows 10 drive, and both of my examples run OK under Windows 10.

I ran "hello" using two different pinning selections:
1)
-host handel1 -n 12 ./hello : \
-host elgar1 -n 12 ./hello : \
-host mirella1 -n 4 ./hello : \
-host faure1 -n 4 ./hello

2)
-host handel1 -n 9 ./hello : \
-host elgar1 -n 9 ./hello : \
-host mirella1 -n 8 ./hello : \
-host faure1 -n 6 ./hello
The pinning problem appears in both cases, and, as with my own test code, hello ran OK.

Sorry for the delay... I've been doing some work converting my Windows 10 compiler/MPI environment to the January update.

Regards,

Art
