Intel® MPI Library

Unable to run a remote cluster on Windows: mpiexec hangs

JeffreyFaust
Beginner

Hello,

I am attempting to upgrade from 2018.2.185 to the latest release. Everything works as expected for a local run. Whenever more than one host is involved, the mpiexec command hangs: it never launches the executable and never exits. It has to be killed with Ctrl-C or similar.

OS: Windows 10 and Windows 11 (reproduced on both)

Hardware: VM and workstation

MPI version: latest (2021.10 Build 20230619)

Here is output with some debugging flags:

C:\Users\jfaust>mpiexec -genv I_MPI_HYDRA_DEBUG=on -genv I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts osg-mpi-02,osg-mpi-01 hostname
[mpiexec@osg-mpi-02] Launch arguments: C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin\hydra_bstrap_proxy.exe --upstream-host osg-mpi-02 --upstream-port 64075 --pgid 0 --launcher powershell --launcher-number 0 --base-path C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 536 C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[mpiexec@osg-mpi-02] Launch arguments: C:\WINDOWS\System32\WindowsPowerShell\v1.0\\powershell.exe Invoke-Command -ComputerName osg-mpi-01 -ScriptBlock { C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin\hydra_bstrap_proxy.exe --upstream-host osg-mpi-02 --upstream-port 64075 --pgid 0 --launcher powershell --launcher-number 0 --base-path C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 1 --node-id 1 --subtree-size 1 C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9 }
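
The two "Launch arguments" lines above record how mpiexec starts its proxies: both carry `--launcher powershell`, and the remote one is wrapped in a PowerShell Invoke-Command call. As a minimal sketch (not part of Intel MPI), the Python below pulls a few of those options out of an excerpt of the first log line. The helper name `hydra_options` is hypothetical; the option names are taken from the log itself.

```python
import re

# Excerpt of the first "Launch arguments" line from the debug output above.
LINE = (
    r"[mpiexec@osg-mpi-02] Launch arguments: "
    r"C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\..\bin\hydra_bstrap_proxy.exe "
    r"--upstream-host osg-mpi-02 --upstream-port 64075 --pgid 0 "
    r"--launcher powershell --launcher-number 0"
)


def hydra_options(line, names=("upstream-host", "upstream-port", "launcher")):
    """Return the values of the selected '--name value' options found in a log line."""
    found = {}
    for name in names:
        # Require whitespace after the name so '--launcher' does not match '--launcher-number'.
        match = re.search(r"--" + re.escape(name) + r"\s+(\S+)", line)
        if match:
            found[name] = match.group(1)
    return found


print(hydra_options(LINE))
# {'upstream-host': 'osg-mpi-02', 'upstream-port': '64075', 'launcher': 'powershell'}
```

Because the launcher is PowerShell, the remote launch depends on Invoke-Command working between the hosts.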

 

I don't know if this is related, but I previously used mpiexec -register to store the username and password in the registry for cluster use. mpiexec -help still lists this option, but when I run it (on four different machines so far), I get the following:

C:\Windows\System32>mpiexec -register
[mpiexec@osg-mpi-01] match_arg (arg\hydra_arg.c:91): unrecognized argument register
[mpiexec@osg-mpi-01] Similar arguments:
[mpiexec@osg-mpi-01]     rr
[mpiexec@osg-mpi-01]     r
[mpiexec@osg-mpi-01] HYD_arg_parse_array (arg\hydra_arg.c:128): argument matching returned error
[mpiexec@osg-mpi-01] mpiexec_get_parameters (mpiexec_params.c:1359): error parsing input array
[mpiexec@osg-mpi-01] wmain (mpiexec.c:1893): error parsing parameters
ShivaniK_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


We are working on it and will get back to you soon.


Thanks & Regards

Shivani


JeffreyFaust
Beginner

Hi Shivani,

Any update on this? This has become a serious issue for our software, and we want to understand whether there will be a way forward.

Thank you,

-Jeff

ShivaniK_Intel
Moderator

Hi,


Could you please try these commands in Windows PowerShell and share the results?

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "<System-Name>"

For example:

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "karangux-mobl"

Invoke-Command -ComputerName <System-Name> -ScriptBlock {hostname}

For example:

Invoke-Command -ComputerName karangux-mobl -ScriptBlock {hostname}


Thanks & Regards,

Shivani



JeffreyFaust
Beginner

Here you go:

PS C:\> hostname
osg-mpi-01
PS C:\> Set-Item WSMan:\localhost\Client\TrustedHosts -Value "osg-mpi-02"

WinRM Security Configuration.
This command modifies the TrustedHosts list for the WinRM client. The computers in the TrustedHosts list might not be authenticated. The client might send
credential information to these computers. Are you sure that you want to modify this list?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"):
PS C:\> Invoke-Command -ComputerName osg-mpi-02 -ScriptBlock {hostname}
osg-mpi-02

ShivaniK_Intel
Moderator

Hi,

Could you please follow these steps:

1. If you are using Intel MPI 2021.10, we advise you to upgrade to Intel MPI 2021.11, because the older version has a problem with paths that contain spaces.

2. Install and enable the WinRM service if you haven't already done so. You can check its status with "Get-Service winrm" in PowerShell.

3. Add all relevant nodes to the list of TrustedHosts, e.g. in Windows PowerShell with

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "<comma-separated list of hosts>"

4. Test the visibility of the hosts in PowerShell with

Invoke-Command -ComputerName <comma-separated list of hosts> -ScriptBlock {hostname}

If the command above fails, the error message states what is missing. Follow the instructions it gives.

5. Once the command above works, proceed to the next step.

Run setvars.bat from Intel MPI in cmd.exe, not in PowerShell!

Test it from cmd.exe with the following command:

mpiexec -ppn 1 -hosts <comma-separated list of hosts> hostname
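
Steps 3 and 4 and the mpiexec test above all take the same comma-separated host list. The following is a minimal sketch, assuming Python is available on the launch machine, that builds the three command strings from a single list so the host names stay consistent. It only prints the commands and does not run anything. The function names are hypothetical, and the host names are the ones from the original post.

```python
def join_hosts(hosts):
    """Join host names into the comma-separated form the commands expect."""
    return ",".join(host.strip() for host in hosts)


def trusted_hosts_command(hosts):
    """PowerShell command from step 3 (sets the WinRM TrustedHosts list)."""
    return 'Set-Item WSMan:\\localhost\\Client\\TrustedHosts -Value "{}"'.format(join_hosts(hosts))


def invoke_command(hosts):
    """PowerShell command from step 4 (runs hostname on every host)."""
    return "Invoke-Command -ComputerName {} -ScriptBlock {{hostname}}".format(join_hosts(hosts))


def mpiexec_command(hosts):
    """cmd.exe command for the final test (one process per host)."""
    return "mpiexec -ppn 1 -hosts {} hostname".format(join_hosts(hosts))


hosts = ["osg-mpi-01", "osg-mpi-02"]
for command in (trusted_hosts_command(hosts), invoke_command(hosts), mpiexec_command(hosts)):
    print(command)
```

The first two printed commands are meant for PowerShell and the last one for cmd.exe after setvars.bat has been run.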

 

Here is a screenshot for your reference: ShivaniK_Intel_0-1701755507089.png

Thanks & Regards

Shivani

ShivaniK_Intel
Moderator

Hi,


As we did not hear back from you, could you please respond to my previous post?


Thanks & Regards

Shivani


JeffreyFaust
Beginner

Hi Shivani,

I was traveling for work last week, but my colleague continued to investigate. It looks like this is going to work for us, but I was waiting for our regression tests to run before calling this done. That should have happened yesterday, but something was wrong in our script, so I'm hoping to see results today.

-Jeff

JeffreyFaust
Beginner

Upgrading to the latest version fixed this for us. We have some other issues (performance and network folder access), but we will create new threads to address those as needed. Please consider this issue resolved.

Thank you,

-Jeff

ShivaniK_Intel
Moderator

Hi,


Glad to know that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards

Shivani

