Intel® MPI Library

MPI error on Windows cluster

ako2
Beginner

Hello everyone,

I installed the ifx compiler and MPI libraries using the package intel-fortran-essentials-2025.2.1.6_offline_20250825_115647.
I am trying to run tests on a Windows cluster. The program runs fine on a single host,
but hangs at MPI_BCAST when run on two hosts.

Test program:

PROGRAM winct
  use mpi
  implicit none
  integer :: ival, ierr, my_id, num_proc, len1
  character(len=MPI_MAX_PROCESSOR_NAME) :: procName

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, num_proc, ierr)
  call MPI_GET_PROCESSOR_NAME(procName, len1, ierr)

  ! Rank 0 sets a nonzero value; every other rank should receive it via the broadcast.
  ival = 0
  if (my_id == 0) then
    ival = 10
  endif

  write(*,*) 'Before BCAST: PE,procName:', my_id, procName, ival
  ! call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_BCAST(ival, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  write(*,*) 'After BCAST: PE,procName:', my_id, procName, ival

  call MPI_FINALIZE(ierr)
END PROGRAM winct
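
As a side note (not part of the original test), printing the runtime library version on each rank can help confirm that every host loads the same Intel MPI build. MPI_GET_LIBRARY_VERSION is a standard MPI-3 routine, so a minimal sketch, added before MPI_FINALIZE with the declarations moved to the top, might look like:

  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: libver
  integer :: verlen
  ! Report which MPI runtime this rank actually loaded (useful when more than one version is installed).
  call MPI_GET_LIBRARY_VERSION(libver, verlen, ierr)
  write(*,*) 'PE', my_id, ' MPI library: ', libver(1:verlen)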

Compilation:

"C:\Program Files (x86)\Intel\oneAPI\compiler\2025.2\bin\ifx" /nologo /O3 /I"C:\Program Files (x86)\Intel\oneAPI\mpi\2021.16\include\mpi" /traceback /libs:dll /threads /c /Qlocation,link,"C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.43.34808\bin\HostX64\x64" /Qm64 .\a.f90

Linking:

"C:\Program Files (x86)\Intel\oneAPI\compiler\2025.2\bin\ifx" /exe:"winct.exe" ... /Qoption,link,/LIBPATH:"C:\Program Files (x86)\Intel\oneAPI\mpi\2021.16\lib" /Qoption,link,/LIBPATH:"C:\Program Files (x86)\Intel\oneAPI\compiler\2022.1.0\windows\compiler\lib\intel64_win" ... /Qoption,link,/SUBSYSTEM:CONSOLE /IMPLIB: impi.lib /Qm64 "a.obj"

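For reference, if your Intel MPI installation ships the mpiifx compiler wrapper for Windows (the wrapper name is an assumption here and may differ between releases), the compile and link steps can usually be collapsed into one command that pulls in the MPI include path and impi.lib automatically:

mpiifx /nologo /O3 /traceback a.f90 /exe:winct.exe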

running (on HOST1)
mpiexec -n 4 -ppn 2 -hosts HOST1 -genv I_MPI_DEBUG=+2 winct.exe
is OK.

running (on HOST1)
mpiexec -n 4 -ppn 2 -hosts HOST2 -genv I_MPI_DEBUG=+2 winct.exe
is also OK.

running (on HOST1)
mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_DEBUG=+2 winct.exe
produces the following output and hangs:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 2.1.0-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): File "/tuning_skx_shm-ofi_tcp-ofi-rxm.dat" not found
[0] MPI startup(): Load tuning file: "/tuning_skx_shm-ofi.dat"
[0] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for ch4 level
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for net level
[0] MPI startup(): File "" not found
[0] MPI startup(): Unable to read tuning file for shm level
[3#53840:8176@HOST2] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[2#46416:23368@HOST2] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0#48236:24688@HOST1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[1#44180:20120@HOST1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
Before BCAST: PE,procName: 3 HOST2 0
Before BCAST: PE,procName: 0 HOST1 10
Before BCAST: PE,procName: 1 HOST1 0
Before BCAST: PE,procName: 2 HOST2 0

A similar problem is encountered with other collective operations such as MPI_BARRIER, MPI_ALLGATHER, etc.

Any suggestion would be greatly appreciated.
Thanks in advance.

ako2
Beginner
Thank you for your response.
Initially the executable was not run from a shared location, but to try your suggestion I
placed the executable in a shared folder that is accessible from both hosts (HOST1 and HOST2).
However, this did not change anything. After creating the "mpi_profile" session on HOST2,

running (on HOST1)
mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_DEBUG=+2 -genv I_MPI_HYDRA_BOOTSTRAP_POWERSHELL_PSCNAME=mpi_profile -genv I_MPI_AUTH_METHOD=delegate \\shared_folder_on_another_machine\winct.exe

produces the following error:
[mpiexec@HOST1] HYD_sock_connect (..\windows\src\hydra_sock.c:240): Retrying connection, retry_count=1, retries=0
[mpiexec@HOST1] HYD_connect_to_service (bstrap\service\service_launch.c:76): assert (!closed) failed
[mpiexec@HOST1] HYDI_bstrap_service_launch (bstrap\service\service_launch.c:319): unable to connect to hydra service (HOST2:8680)
[mpiexec@HOST1] remote_launch (bstrap\src\intel\i_hydra_bstrap.c:609): error launching bstrap proxy
[mpiexec@HOST1] single_launch (bstrap\src\intel\i_hydra_bstrap.c:667): remote launch error
[mpiexec@HOST1] launch_bstrap_proxies (bstrap\src\intel\i_hydra_bstrap.c:851): single launch error
[mpiexec@HOST1] HYD_bstrap_setup (bstrap\src\intel\i_hydra_bstrap.c:1045): unable to launch bstrap proxy
[mpiexec@HOST1] Error setting up the bootstrap proxies
[mpiexec@HOST1] Possible reasons:
[mpiexec@HOST1] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@HOST1] 2. Cannot launch hydra_bstrap_proxy.exe or it crashed on one of the hosts.
[mpiexec@HOST1] Make sure hydra_bstrap_proxy.exe is available on all hosts and it has right permissions.
[mpiexec@HOST1] 3. Firewall refused connection.
[mpiexec@HOST1] Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@HOST1] 4. service bootstrap cannot launch processes on remote host.
[mpiexec@HOST1] You may try using -bootstrap option to select alternative launcher.

 

running WITHOUT "-genv I_MPI_AUTH_METHOD=delegate", the program starts, but once again hangs at MPI_BCAST:



[0#8920:46648@HOST1] MPI startup(): Intel(R) MPI Library, Version 2021.16 Build 20250722
[0#8920:46648@HOST1] MPI startup(): Copyright (C) 2003-2025 Intel Corporation. All rights reserved.
[0#8920:46648@HOST1] MPI startup(): library kind: release
[0#8920:46648@HOST1] MPI startup(): libfabric version: 2.1.0-impi
[0#8920:46648@HOST1] MPI startup(): libfabric provider: tcp
[0#8920:46648@HOST1] MPI startup(): File "/tuning_skx_shm-ofi_tcp.dat" not found
[0#8920:46648@HOST1] MPI startup(): Load tuning file: "/tuning_skx_shm-ofi.dat"
[0#8920:46648@HOST1] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0#8920:46648@HOST1] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0#8920:46648@HOST1] MPI startup(): File "/skx_shm-ofi.json" not found
[0#8920:46648@HOST1] MPI startup(): Unable to read tuning file for ch4 level
[0#8920:46648@HOST1] MPI startup(): File "/skx_shm-ofi_network.json" not found
[0#8920:46648@HOST1] MPI startup(): Unable to read tuning file for net level
[0#8920:46648@HOST1] MPI startup(): File "/skx_shm-ofi_node.json" not found
[0#8920:46648@HOST1] MPI startup(): Unable to read tuning file for shm level
Before BCAST: PE,procName: 3 HOST2 0
Before BCAST: PE,procName: 2 HOST2 0
Before BCAST: PE,procName: 1 HOST1 0
Before BCAST: PE,procName: 0 HOST1 10

 

@jimdempseyatthecove recommended experimenting with the fabric selection.
When I use libfabric.dll from an older version of Intel MPI, MPI_BCAST works as expected on two hosts, but this time MPI_FINALIZE fails:

 

running (on HOST1)
mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_DEBUG=+2 -genv I_MPI_HYDRA_BOOTSTRAP_POWERSHELL_PSCNAME=mpi_profile \\shared_folder_on_another_machine\winct.exe

 

[0#39100:19156@HOST1] MPI startup(): Intel(R) MPI Library, Version 2021.16 Build 20250722
[0#39100:19156@HOST1] MPI startup(): Copyright (C) 2003-2025 Intel Corporation. All rights reserved.
[0#39100:19156@HOST1] MPI startup(): library kind: release
[0#39100:19156@HOST1] MPI startup(): libfabric version: 1.11.1a1-impi
[0#39100:19156@HOST1] MPI startup(): libfabric provider: tcp;ofi_rxm
[0#39100:19156@HOST1] MPI startup(): File "/tuning_skx_shm-ofi_tcp-ofi-rxm.dat" not found
[0#39100:19156@HOST1] MPI startup(): Load tuning file: "/tuning_skx_shm-ofi.dat"
[0#39100:19156@HOST1] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0#39100:19156@HOST1] MPI startup(): File "/tuning_skx_shm-ofi.dat" not found
[0#39100:19156@HOST1] MPI startup(): File "/skx_shm-ofi.json" not found
[0#39100:19156@HOST1] MPI startup(): Unable to read tuning file for ch4 level
[0#39100:19156@HOST1] MPI startup(): File "/skx_shm-ofi_network.json" not found
[0#39100:19156@HOST1] MPI startup(): Unable to read tuning file for net level
[0#39100:19156@HOST1] MPI startup(): File "/skx_shm-ofi_node.json" not found
[0#39100:19156@HOST1] MPI startup(): Unable to read tuning file for shm level
Before BCAST: PE,procName: 1 HOST1 0
Before BCAST: PE,procName: 3 HOST2 0
Before BCAST: PE,procName: 0 HOST1 10
Before BCAST: PE,procName: 2 HOST2 0
After BCAST: PE,procName: 0 HOST1 10
After BCAST: PE,procName: 2 HOST2 10
After BCAST: PE,procName: 1 HOST1 10
After BCAST: PE,procName: 3 HOST2 10
Abort(810649615) on node 2 (rank 2 in comm 0): Fatal error in internal_Finalize: Other MPI error, error stack:
internal_Finalize(39706).........: MPI_Finalize failed
MPII_Finalize(436)...............:
MPID_Finalize(1927)..............:
MPIDI_OFI_mpi_finalize_hook(1999):
MPIR_Reduce_intra_binomial(152)..:
MPIC_Send(129)...................:
MPID_Send(817)...................:
MPIDI_send_unsafe(109)...........:
MPIDI_OFI_send_normal(261).......:
MPIDI_OFI_send_handler_vni(502)..: OFI tagged send failed (ofi\ofi_impl.h:502:MPIDI_OFI_send_handler_vni:Unknown error)
[mpiexec@HOST1] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@HOST1] control_cb (mpiexec.c:1436): unable to send confirmation code
[mpiexec@HOST1] HYD_dmx_wait_for_event (..\windows\src\hydra_demux.c:216): callback returned error
[mpiexec@HOST1] wmain (mpiexec.c:1968): error waiting for event
This configuration also fails when, for instance, MPI_ALLGATHER is called in the same test program.
Please note that fabric selection does not matter when running on a single host; everything works fine there (i.e. mpiexec -n 4 -ppn 2 -hosts HOST1 \\shared_folder_on_another_machine\winct.exe).
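
For completeness, provider selection can also be steered through environment variables instead of swapping libfabric.dll between versions; the variables below are standard Intel MPI / libfabric ones, and the values are only an example that forces the plain tcp provider:

mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_FABRICS=shm:ofi -genv FI_PROVIDER=tcp -genv I_MPI_DEBUG=+2 \\shared_folder_on_another_machine\winct.exe
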
TobiasK
Moderator

@ako2 please don't mix -delegate with PS remoting. You need to create the mpi_profile session on every host that is involved. Please also don't mix DLLs.
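
A minimal sketch of that registration, assuming the session name and account used elsewhere in this thread, run from an elevated PowerShell prompt on each host:

PS> Enable-PSRemoting -Force
PS> Register-PSSessionConfiguration -Name mpi_profile -RunAsCredential MYDOMAIN\myusername -Force
PS> Get-PSSessionConfiguration mpi_profile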

ako2
Beginner

@TobiasK thank you for your response.

I only mentioned that the mpi_profile session was created on HOST2 because the requirement seems to be that it should be created on the remote host only.

The mpi_profile session was actually created on both hosts in my previous reply.

On HOST1 and HOST2

PS>Get-PSSessionConfiguration $sessionName

returns

Name : mpi_profile
PSVersion : 5.1
StartupScript :
RunAsUser : myusername
Permission : MYDOMAIN\myusername AccessAllowed

The program is launched on HOST1 using:

mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_DEBUG=+2 -genv I_MPI_HYDRA_BOOTSTRAP_POWERSHELL_PSCNAME=mpi_profile \\shared_folder_on_another_machine\winct.exe

and hangs at MPI_BCAST. I am using the latest MPI libraries in this call; mixing libfabric.dll versions was simply a side note, mentioned in the hope that it might help resolve the issue.
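
In case it helps narrow down where the collective stalls, I can also repeat the run with more verbose logging; I_MPI_DEBUG=10 and the libfabric FI_LOG_LEVEL variable are standard knobs, and the rest of the command is unchanged:

mpiexec -n 4 -ppn 2 -hosts HOST1,HOST2 -genv I_MPI_DEBUG=10 -genv FI_LOG_LEVEL=warn -genv I_MPI_HYDRA_BOOTSTRAP_POWERSHELL_PSCNAME=mpi_profile \\shared_folder_on_another_machine\winct.exe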

Regards,
