Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI competition between Parallel Studio and ONEAPI

ALaza1
Novice

After installing parallel_studio_xe_2020_update4_professional_edition_setup.exe and w_mpi_oneapi_p_2021.1.1.88_offline.exe, mpirt's hydra MPI server was installed, and I need to do a separate install for oneAPI's version.

questions:

1) my guess is that only one of the two should be installed.

2) should both mpirt and oneAPI\mpi be in the PATH? If so, maybe oneAPI\bin needs to appear before the mpirt entry. I believe a PATH search starts with the first directory on the list and proceeds to the end of the list.

C:\Users\art>echo %PATH%
C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64_win\mpirt;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64_win\compiler;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Sysinternals;C:\Program Files (x86)\Intel\oneAPI\mpi\2021.1.1\bin

3) even when using a login that belongs to the Administrators group, one has to use an elevated (admin) command prompt to avoid security blocks.

sincerely,

 

art

PrasanthD_intel
Moderator

Hi Art,


Sorry for the delay in response.


1) my guess is that only one of the two should be installed.

==> Yes. Since Parallel Studio and oneAPI have similar components, you can use either one of them. The oneAPI HPC Toolkit comes with the latest MPI version.


2) should both mpirt and oneAPI\mpi be on the same path?

==> Intel components come with a script/batch file that initializes the environment they require. You can have multiple versions of MPI installed and use any of them by running the script file that comes with that version.

So we recommend not adding them to the PATH manually, as the script file will take care of it.

For MPI, run mpivars.bat (in the bin folder) on Windows and mpivars.sh on Linux.
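A minimal sketch of what these per-version scripts do: each one exports its version's root and prepends its own directories to the environment, which is why sourcing the right script selects the right MPI. The example below simulates this with a mock vars.sh in /tmp (the real file lives under the install tree, e.g. .../mpi/&lt;version&gt;/env/vars.sh on Linux; the path and variable here are for illustration only).

```shell
# Create a mock of an Intel MPI environment script to show the mechanism.
mkdir -p /tmp/fake_impi/env /tmp/fake_impi/bin
cat > /tmp/fake_impi/env/vars.sh <<'EOF'
# What the real script does, in miniature: export the install root
# and put this version's bin directory first on the PATH.
export I_MPI_ROOT=/tmp/fake_impi
export PATH="$I_MPI_ROOT/bin:$PATH"
EOF

# Sourcing the script selects this version for the current shell only.
. /tmp/fake_impi/env/vars.sh
echo "$I_MPI_ROOT"        # prints /tmp/fake_impi
```

Because the change is confined to the current shell session, two terminals can use two different MPI versions side by side without editing the system PATH.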


3) even using a login that is in group administrators, one has to use an admin command prompt to successfully avoid security blocks.

==> You need not be an administrator to run on your local machine, but for running across nodes make sure that you have the required privileges.


Regards

Prasanth


ALaza1
Novice

Hi Prasanth,

 

Intel's Fortran installer adds mpirt to the user's PATH, probably to support the language's co-array feature. I've seen installs add the MPI path at the end of %PATH%, after mpirt's entry. As far as I know, the loader will use the first matching library that it finds on the path.
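The first-entry-wins behavior described above is easy to demonstrate with two stub executables of the same name (the directory names below are placeholders, standing in for the mpirt and oneAPI bin directories):

```shell
# Two directories, each providing a stub "mpiexec" that reports its origin.
mkdir -p /tmp/mpirt /tmp/oneapi
printf '#!/bin/sh\necho mpirt\n'  > /tmp/mpirt/mpiexec
printf '#!/bin/sh\necho oneapi\n' > /tmp/oneapi/mpiexec
chmod +x /tmp/mpirt/mpiexec /tmp/oneapi/mpiexec

# The PATH search stops at the first match, so the oneapi stub wins here.
PATH="/tmp/oneapi:/tmp/mpirt:$PATH" mpiexec   # prints "oneapi"
```

Swapping the two entries in PATH makes the mpirt stub win instead, which is exactly the hazard when both mpirt and oneAPI\mpi ship a binary of the same name.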

Using Intel's MPI install script may disable a previously installed MPI library server. We probably don't want two different servers listening on the same port. It seems to me that earlier MPI servers don't support newer MPI library features and will even refuse to run an app linked against a newer library.

I routinely make use of Intel's scripts via my own script that selects the compiler and MPI version I want to use. Because the scripts don't work well when I run them in a Cygwin bash shell environment, I use desktop shortcuts to select the version (v18, v19, v20) and include a command to start a Cygwin terminal, which is where I actually do my work. This makes it somewhat convenient for me to move back and forth to Linux systems.

for example

@echo off
rem use CALL so control returns from each invoked batch file

IF /i "%1"=="v20" (
call "\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020\windows\bin\compilervars.bat" intel64
call "%I_MPI_ONEAPI_ROOT%\env\vars.bat"
"C:\cygwin64\bin\mintty.exe" -i /Cygwin-Terminal.ico -
)

IF /i "%1"=="v19" (
call "\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.7.216\windows\bin\compilervars.bat" intel64
call "\Program Files (x86)\IntelSWTools\mpi\2019.7.216\intel64\bin\mpivars.bat"
"C:\cygwin64\bin\mintty.exe" -i /Cygwin-Terminal.ico -
)

IF /i "%1"=="v18" (
rem call "\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.5.287\windows\bin\compilervars.bat" intel64
call "\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020\windows\bin\compilervars.bat" intel64
call "\Program Files (x86)\IntelSWTools\mpi\2018.5.287\intel64\bin\mpivars.bat"
"C:\cygwin64\bin\mintty.exe" -i /Cygwin-Terminal.ico -
)

echo on

Sorry, I've had a disappointing experience with the 2019 MPI library, largely due to its ignoring the user pinning instructions I use to allocate a specific number of ranks to specified hosts. Performance is poor compared to the 2018 library. I still haven't decided whether v20 performs any better.
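For reference, the kind of rank placement being discussed can be requested and checked from the launch environment. The sketch below shows one way to do it with Intel MPI's documented controls; the hostnames, rank counts, core list, and binary name are placeholders, not taken from this thread, and this requires a working cluster to run:

```shell
# Sketch only: 4 ranks per host on two hosts, ranks restricted to cores 0-3.
export I_MPI_DEBUG=4                   # startup output includes the pinning map
export I_MPI_PIN_PROCESSOR_LIST=0-3    # candidate cores for ranks on each host
mpiexec -n 8 -ppn 4 -hosts host1,host2 ./a.out
```

Raising I_MPI_DEBUG is the usual way to confirm whether the library actually honored the pinning request, since the pin map it prints at startup can be compared against what was asked for.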

 

art

ALaza1
Novice

I did notice an improvement in handling user pinning directives, but there were a couple of cases that didn't look right. Sorry, I got diverted by this past month's Exchange server issues. I also noticed a performance issue when using 4 hosts.

My MPI configuration is likely smaller than many other users'... just 4 hosts, using an Intel-supplied adaptive load balancing feature to team a pair of 10gig-e NICs on each host. This approach to teaming only adds bandwidth when 3 or more hosts are in use.

The example I'm using for this test is the NPB3.3/MPI ft.D case. With 32 ranks on 2 hosts, wall clock time is about 500 sec. The same number of ranks on 4 Linux hosts produces about 310 sec. At the moment my best Windows 10 time on the same hosts is about 360 sec, but more usually around 400 sec. I've been spending some time checking my connections and switches. Bear in mind that my Linux times use the same 10gig-e NICs, also with adaptive load balancing, and the same switches. These NICs are configured with their very own private subnet (no gateway). I've also been disabling various Windows Defender actions on this private subnet.

I'm going to re-test using the latest release.

regards,

art

ALaza1
Novice

A pinning bug persists in this version, oneAPI\mpi\2021.1.1. I've attached two runs where the pinning bug shows up when the first host has only 4 cores and the second host has a larger number. In this example I'm pinning 4 ranks/host. When the first host has more than 4 cores, there's no problem. The only difference between these two runs is the order of appearance of the 2 hosts.

I've included more data on the properties (cpuinfo, ipconfig) and the specific versions. The compiler is installed on only one system, bubbles. I'm using parallel_studio_xe_2020_update4_professional_edition_setup and w_mpi_oneapi_p_2021.1.1.88_offline.

All systems have hyperthreading disabled. Systems ives, blaze, and bubbles have 256GB each, and system brahms has 64GB. The current version of Windows 10 Pro is running on each, and I do my runs from a Cygwin bash shell.

My 10gig-e setup uses Intel X550-T2 NICs in an adaptive load balance configuration. There's a single IP address for each team on its host, connecting to a private subnet. Jumbo frames are enabled. Also, based on an Intel recommendation, flow control is disabled and interrupt moderation is off.

Please see the attached good and bad runs and the specific host details.

Thanks,

Art

PrasanthD_intel
Moderator

Hi Art,


As you said, you can load the script for a specific MPI version and then run that version.

Since you have updated to the latest version, are you still facing the pinning issue you reported previously?


Regards

Prasanth


PrasanthD_intel
Moderator

Hi Art,


Thanks for providing the debug logs along with your system info.

We are working on it and will get back to you.

Apart from showing process IDs as zero for the processes launched on those nodes, did the work that was supposed to be done by those specific processes/ranks also stall?



Regards

Prasanth


ALaza1
Novice

The processes ran to completion, including producing correct results. This is the CLASS=C problem size. The incorrect pinning produces noticeably longer run times for the next larger, CLASS=D, problem size, which also yields correct results. I can provide examples of the pinning problem on the CLASS=D problem if needed.

The problem's memory is equally divided among the assigned number of ranks.
CLASS=C: array size = 512 x 512 x 512 double-precision complex, problem size ≈ 7.2GB.
CLASS=D: array size = 2048 x 1024 x 1024 double-precision complex, problem size ≈ 116GB.
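As a quick arithmetic check of the sizes above: at 16 bytes per double-precision complex element, a single array at each class comes to about 2 GiB (C) and 32 GiB (D). The quoted totals are several times a single array's footprint, consistent with the FT benchmark keeping multiple working arrays of the stated dimensions:

```shell
# Bytes for one array of each class (16 bytes per double complex element).
echo $(( 512*512*512*16 ))        # prints 2147483648  (~2 GiB, CLASS=C)
echo $(( 2048*1024*1024*16 ))     # prints 34359738368 (~32 GiB, CLASS=D)
```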

I forgot to report that paging is disabled on these hosts.

art

 
