Intel® MPI Library

Bad Termination in Hello World MPI

SP_Van
Beginner

Hello,

I am using a small (6-node) cluster based on Intel Xeon Gold 6426Y processors. The nodes run Windows Server 2022 Standard, and each node has 2 physical CPUs with 16 cores per CPU.

I have installed Visual Studio 2022 and the latest (June 2024) Intel oneAPI Base and HPC Toolkits.

I am using 2 nodes - node1 and node2 - to test whether Intel MPI is working properly.

I have successfully compiled the Fortran Hello World MPI program in VS2022 and it works fine when launched on a single node. Here is an example output:

D:\MPI_Shared>mpiexec -n 4 mpitest
Hello World from process: 0 of 4
Hello World from process: 1 of 4
Hello World from process: 2 of 4
Hello World from process: 3 of 4

If I copy the mpitest.exe file to another node in the cluster and launch it just on that node, it also works just fine. 
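
For reference, the test program is essentially the textbook MPI Hello World. A minimal sketch of that kind of program (equivalent in structure to what I compiled, though not my exact source) looks like this:

program mpitest
    use mpi
    implicit none
    integer :: ierr, rank, nprocs

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    print '(A,I0,A,I0)', 'Hello World from process: ', rank, ' of ', nprocs
    call MPI_FINALIZE(ierr)
end program mpitest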

However, I am running into a (very frustrating and time-consuming) BAD TERMINATION error when trying to run the test across node1 and node2 using the following command on node1:

D:\MPI_Shared>mpiexec -n 4 -ppn 2 -hosts localhost,node2 mpitest

========================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 12752 RUNNING AT node2
= EXIT STATUS: -1073741515 (c0000135)
========================================================================

========================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 896 RUNNING AT node2
= EXIT STATUS: -1073741515 (c0000135)
========================================================================

The strange thing is that the process on node1 (localhost) does not echo Hello World to the screen, even though the bad termination seems to occur only on node2.

I also tried the -v option to see if something obvious popped up (the actual names for node1 and node2 are DCCAN100APPB4 and DCCAN100APPB9):

D:\MPI_Shared>mpiexec -v -n 2 -ppn 1 -hosts localhost,dccan100appb9 mpitest
[mpiexec@DCCAN100APPB4] Launch arguments: D:\Intel\oneAPI\mpi\2021.13\bin\hydra_bstrap_proxy.exe --upstream-host DCCAN100APPB4 --upstream-port 65386 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\Intel\oneAPI\mpi\2021.13\bin --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 564 D:\Intel\oneAPI\mpi\2021.13\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[mpiexec@DCCAN100APPB4] Launch arguments: C:\Windows\System32\WindowsPowerShell\v1.0\\powershell.exe Invoke-Command -ComputerName dccan100appb9 -ScriptBlock {& D:\Intel\oneAPI\mpi\2021.13\bin\hydra_bstrap_proxy.exe --upstream-host DCCAN100APPB4 --upstream-port 65386 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\Intel\oneAPI\mpi\2021.13\bin --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 1 --node-id 1 --subtree-size 1 D:\Intel\oneAPI\mpi\2021.13\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9 }
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@DCCAN100APPB4] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=get_maxes
[proxy:0:0@DCCAN100APPB4] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=get_appnum
[proxy:0:0@DCCAN100APPB4] PMI response: cmd=appnum appnum=0
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=get_my_kvsname
[proxy:0:0@DCCAN100APPB4] PMI response: cmd=my_kvsname kvsname=kvs_23020_0
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=get kvsname=kvs_23020_0 key=PMI_process_mapping
[proxy:0:0@DCCAN100APPB4] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0@DCCAN100APPB4] pmi cmd from fd 476: cmd=barrier_in

========================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 10072 RUNNING AT dccan100appb9
= EXIT STATUS: -1073741515 (c0000135)
========================================================================

========================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 10032 RUNNING AT localhost
= EXIT STATUS: -1 (ffffffff)
========================================================================

Connectivity and authentication between the nodes are not an issue, as confirmed by the successful test below (hostname runs successfully on both the local and the remote host):

D:\MPI_Shared>mpiexec -n 4 -ppn 2 -hosts localhost,node2 hostname
NODE1
NODE1
NODE2
NODE2

I can also run Intel's benchmarks, and inter-node traffic is confirmed when I look at Resource Monitor. Here is an example output:

D:\>mpiexec -n 2 -ppn 1 -hosts localhost,node2 IMB-MPI1 pingpong
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.8, MPI-1 part
#----------------------------------------------------------------
# Date : Thu Jul 4 19:57:56 2024
# Machine : Intel(R) 64 Family 6 Model 143 Stepping 8, GenuineIntel
# Release : 6.2.9200
# Version :
# MPI Version : 3.1
# MPI Thread Environment:


# Calling sequence was:

# IMB-MPI1 pingpong

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 18.10 0.00
1 1000 18.21 0.05
2 1000 18.23 0.11
4 1000 18.23 0.22
8 1000 18.13 0.44
16 1000 17.63 0.91
32 1000 18.14 1.76
64 1000 18.23 3.51
128 1000 18.53 6.91
256 1000 63.26 4.05
512 1000 63.51 8.06
1024 1000 64.05 15.99
2048 1000 66.62 30.74
4096 1000 69.87 58.63
8192 1000 67.22 121.86
16384 1000 69.59 235.42
32768 1000 88.20 371.53
65536 640 128.09 511.64
131072 320 214.06 612.32
262144 160 368.86 710.68
524288 80 690.47 759.33
1048576 40 1318.48 795.29
2097152 20 2562.83 818.30
4194304 10 5038.48 832.45

# All processes entering MPI_Finalize

The benchmark also runs fine if I do it in reverse, i.e., run it on node2 with the hosts set up as localhost,node1.

The above would indicate that the hardware and software on the nodes play nicely with Intel MPI, but the test Fortran MPI code that I compiled with VS2022 does not run across multiple nodes, even though it executes without a problem when confined to the processors of a single node.

I would really appreciate some pointers here as I have spent many hours trying to figure out the problem without any resolution.

Thanks!

TobiasK
Moderator

@SP_Van 

Could you please try running with the full path of mpitest?

Also, I would highly recommend not using 'localhost' in the list of hostnames, even though it seems to work in your case.

SP_Van
Beginner

Hi Tobias - here is the result (no change):

D:\MPI_Shared>mpiexec -n 4 -ppn 2 -hosts dccan100appb4,dccan100appb9 hostname
DCCAN100APPB4
DCCAN100APPB4
DCCAN100APPB9
DCCAN100APPB9

D:\MPI_Shared>mpiexec -n 4 -ppn 2 -hosts dccan100appb4,dccan100appb9 d:\mpi_shared\mpitest

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 14484 RUNNING AT dccan100appb9
= EXIT STATUS: -1073741515 (c0000135)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 12884 RUNNING AT dccan100appb9
= EXIT STATUS: -1073741515 (c0000135)
===================================================================================

 

The directory structure on both nodes is the same; both have a folder D:\MPI_Shared in which the executable is sitting, and the folders are shared with full access for all users on the cluster network. Just as a sanity check, I tried running the exe on the remote machine by invoking it directly...

D:\MPI_Shared>\\dccan100appb9\mpi_shared\mpitest2
Hello World from process: 0 of 1

I also stripped all the MPI code out of the test program and left just three lines of plain Fortran, which I recompiled and tried to run. Same bad termination error, so it has nothing to do with a problematic library or .mod file, etc. I suspect that mpiexec is unable to launch the .exe on the remote node, but I am stumped as to why.
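
The stripped-down version was roughly along these lines (a sketch rather than the exact three lines; the program name is just a placeholder):

program hello_nompi
    implicit none
    ! plain Fortran only - all MPI calls removed
    print *, 'Hello World (no MPI)'
end program hello_nompi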

I also tried launching the process on the remote node in PowerShell without any issue...

PS D:\> Invoke-Command -ComputerName dccan100appb9 -ScriptBlock {d:\mpi_shared\mpitest}
Hello World from process: 0 of 1

The only thing I am unsure of is which environment variables need to be active when running mpiexec on multiple hosts. Intel's installers do not play nicely with Windows Server (at least in my case), and the typical execution of setvars.bat when a oneAPI command prompt is opened does not give the echo I normally get on my standalone Windows 10 machine. On the cluster I have manually set the paths that I figured out were needed, and I set up the project in Visual Studio based on Intel's documentation. I assumed that since the .exe is generated correctly and runs on one node, and since mpiexec runs hostname and IMB-MPI1 across multiple hosts without issue, no other configuration would be necessary.
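
If it would help narrow things down, I could compile and launch something like the following through mpiexec on both hosts to see what PATH each launched process actually inherits (just a sketch using the standard GET_ENVIRONMENT_VARIABLE intrinsic; I have not run this yet):

program showpath
    implicit none
    character(len=8191) :: val
    integer :: length, status

    ! Print the PATH seen by the process that mpiexec launches on each node;
    ! a missing runtime directory here would be one possible explanation.
    call get_environment_variable('PATH', val, length, status)
    if (status == 0) then
        print *, 'PATH = ', trim(val)
    else
        print *, 'Could not read PATH (status = ', status, ')'
    end if
end program showpath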

Thanks for your feedback.

SP_Van
Beginner

@TobiasK Hi Tobias, is there anything else I can check to try and solve this problem?

Thanks!
