Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI - mpiexec doesn't run with -hosts

Avi7
Beginner

I installed Intel MPI Library (2021.10.0.49373) on my head node (hostname=ATS) and the two compute nodes (ATS1 and ATS2). I installed and started the hydra_service on all of them using "hydra_service -install".

Before running the actual program I want to run in parallel, I tried to test the setup with a simple program.

I started by running a simple command on the head node in PowerShell:

mpiexec -n 2 hostname

which gave the expected output:

ATS

ATS

The problem started after this, when I actually tried to test the compute nodes using:

mpiexec -n 2 -ppn 1 -hosts ATS1,ATS2 hostname

This command produced neither an error nor any output.

I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but they also did not produce any output.

I even tried running it on the head node alone with the -hosts option (mpiexec -n 1 -hosts ATS hostname), but that also produced no output.

I have no idea what I'm doing wrong. Anytime I try to run it with -hosts or -f hostfile, I just get a blank screen in cmd/powershell. I have to kill the process using Ctrl+C which produces the output:

[mpiexec@ATS] Sending Ctrl-C to processes as requested

[mpiexec@ATS] Press Ctrl-C again to force abort

I have checked in services on all the nodes that "impi_hydra_2021_10_0" is running.

I would appreciate your help in figuring out what is going wrong.

RabiyaSK_Intel
Employee

Hi,


Thanks for posting in Intel communities.


Could you please provide your operating system, CPU and hardware details to reproduce your issue at our end?


Thanks & Regards,

Shaik Rabiya


Avi7
Beginner

Thanks for replying. Here's the information you asked for:

Head Node (ATS) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 60GB RAM

Compute Node 1 (ATS1) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 72GB RAM

Compute Node 2 (ATS2) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 72GB RAM

 

Please let me know if any further info is required.

RabiyaSK_Intel
Employee

Hi,


We have informed the concerned development team. We will get back to you soon.


Thanks & Regards,

Shaik Rabiya


RabiyaSK_Intel
Employee

Hi,


Thank you for your patience.


I'm assuming that you have gone through the following link for troubleshooting MPI applications. If not, could you please try the steps described there:

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-windows/2021-10/troubleshooting.html


>>>I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but they also did not produce any output.

Windows PowerShell may be case sensitive. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?


Are you able to access both ATS1 and ATS2 (compute nodes) via SSH from ATS (head node), or is there a firewall in between? Could you also please confirm whether passwordless SSH is enabled, and whether all required services other than impi_hydra_2021_10_0 are running?


Thanks & Regards,

Shaik Rabiya


Avi7
Beginner

Sorry for the delay in response.

 

I'm not able to access ATS1 and ATS2 from ATS via SSH, as all the nodes have only the OpenSSH client installed and not the OpenSSH server. As the nodes are air-gapped, I'll need some time to have the SSH server installed on the compute nodes. I'll update once I'm able to do that.

 

>>> The windows powershell may be case sensitive. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?

 

This is the output when running with the variable names in all caps:

 

[mpiexec@ATS] Launch arguments: D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 64609 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 472 D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
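
A note on reading the log above: in the HYD_spawn line, the text after "unable to create process" appears to be the command line the proxy tried to start, so its first token is what was treated as the program. Here that token is I_MPI_DEBUG=6, which suggests (an inference from the log, not something confirmed in this thread) that the variable was not preceded by its own -genv and was therefore taken as the executable name. A minimal Python sketch for pulling that token out of such a line, assuming the log format shown in this thread; the function name attempted_program is hypothetical:

```python
import re

# Matches the HYD_spawn failure line as it appears in this thread, e.g.
# "... unable to create process <command line> (error code 2)".
SPAWN_ERROR = re.compile(
    r"unable to create process (?P<cmd>.+) \(error code (?P<code>\d+)\)"
)


def attempted_program(log_line):
    """Return (program, error_code) from a HYD_spawn failure line, or None."""
    match = SPAWN_ERROR.search(log_line)
    if match is None:
        return None
    # The first whitespace-separated token is what the proxy tried to start.
    program = match.group("cmd").split()[0]
    return program, int(match.group("code"))


line = (
    r"[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): "
    "unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 "
    "-hosts ATS1,ATS2 hostname (error code 2)"
)
print(attempted_program(line))  # ('I_MPI_DEBUG=6', 2)
```

If the reported program is not the executable you meant to launch (hostname in this case), the options in front of it are worth rechecking before looking at the network or the services.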

RabiyaSK_Intel
Employee

Hi,


We have not heard back from you. Could you please provide the details that we requested in our previous reply?


Thanks & Regards,

Shaik Rabiya


RabiyaSK_Intel
Employee

Hi,


We are still working on your case. We will get back to you soon.


Thanks & Regards,

Shaik Rabiya


RabiyaSK_Intel
Employee

Hi,

 

Thank you for your patience. Could you please try installing Intel MPI in a path that contains no spaces? For example, use C:\tmp\Intel\oneAPI rather than C:\Program Files (x86)\..., which has a space in the directory path.

 

Could you please try this and let us know whether you still face the same issue?

 

Thanks & Regards,


Shaik Rabiya

Avi7
Beginner

Hi,

 

I tried your suggested solution of installing in a path without any spaces but it produced the same output.

 

[mpiexec@ATS] Launch arguments: D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 56896 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\MPI\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 500 D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)

 

So changing the installation path had no effect.

 

Looking for suggestions on how to proceed.

JeffreyFaust
Beginner

Hello, 

 

I'm jumping in on this topic because I'm having the same exact problem, and am very eager to learn of a solution. Locally, hostname as well as my MPI application work fine. With MPI 2018 version, everything was fine local or remote. With the latest MPI release, a remote run with hostname fails. I have gone through the troubleshooting guide. I installed at a path without spaces.

 

edit: I've decided to start a new topic.

 

RabiyaSK_Intel
Employee

Hi @JeffreyFaust 

 

Could you please provide your CPU, OS and hardware details so that we could use that information to inspect your problem as well?

 

Thanks & Regards,

Shaik Rabiya

 

RabiyaSK_Intel
Employee

Hi,


We have not heard back from you. Could you please try the workaround and share your findings?


Thanks & Regards,

Shaik Rabiya


RabiyaSK_Intel
Employee

Hi,

 

Could you please follow these steps:

 

1. If you are using Intel MPI 2010, we advise you to use Intel MPI 2021.11, because the older version has a problem with paths containing spaces.

 

2. Please install and enable the WinRM service if you have not already done so (you can check whether it is running with "get-service winrm" in PowerShell).

 

3. Add all relevant nodes to the list of TrustedHosts, e.g. in Windows PowerShell with:

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "<comma-separated list of hosts>"

 

4. Test the visibility of the hosts in PowerShell with
          

Invoke-Command -ComputerName <comma-separated list of hosts> -ScriptBlock {hostname}


If the command above fails, the error message will state what is missing. Follow the instructions it gives.

5. Once the command above works, you can proceed to the next item.

Call setvars.bat from Intel MPI from cmd.exe, not from PowerShell!


Test it from cmd.exe with the following command:

 

mpiexec -ppn 1 -hosts <comma-separated list of hosts> hostname

 

 

 

Here is a screenshot for your reference.

RabiyaSK_Intel_0-1701424554142.png
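
Steps 3 to 5 all take the same comma-separated host list, so it can help to generate the three command lines from a single list. The following is a small Python sketch that only builds the command strings and does not run them. The helper name winrm_check_commands and the example host names are illustrative; the commands themselves are the ones given in the steps above.

```python
def winrm_check_commands(hosts):
    """Return the command lines for steps 3 to 5 for one host list."""
    # Join without spaces so the list stays a single argument everywhere.
    host_list = ",".join(h.strip() for h in hosts)
    return [
        # Step 3 (PowerShell): add the nodes to TrustedHosts.
        f'Set-Item WSMan:\\localhost\\Client\\TrustedHosts -Value "{host_list}"',
        # Step 4 (PowerShell): check that every host answers.
        f"Invoke-Command -ComputerName {host_list} -ScriptBlock {{hostname}}",
        # Step 5 (cmd.exe, after setvars.bat): the mpiexec test itself.
        f"mpiexec -ppn 1 -hosts {host_list} hostname",
    ]


for command in winrm_check_commands(["ATS", "ATS1", "ATS2"]):
    print(command)
```

Generating all three lines from one list keeps the host names identical across the TrustedHosts entry, the Invoke-Command check, and the mpiexec test.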

 

Thanks & Regards,

Shaik Rabiya

 

Avi7
Beginner

Hi @RabiyaSK_Intel 

 

Is the proposed solution meant for me or @JeffreyFaust ? Because I'm using the latest version of Intel MPI.

 

Regards

Avi7

Avi7
Beginner

Hello @RabiyaSK_Intel 

 

I followed your instructions and completed steps 1-3 without any issues. However, when I tried to run step 4, I was met with the following error:

Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -ScriptBlock {hostname}

Output:

 

[ATS1] Connecting to remote server ATS1 failed with the following error message: WinRM client cannot process the request. The following error occurred while using Kerberos authentication: Cannot find the computer ATS1. Verify that the computer exists on the network and that the name provided is spelled correctly.
[ATS2] Connecting to remote server ATS2 failed with the following error message: WinRM client cannot process the request. The following error occurred while using Kerberos authentication: Cannot find the computer ATS2. Verify that the computer exists on the network and that the name provided is spelled correctly.
ATS

 

Same error message again for ATS2 as well.

As the host was able to successfully ping ATS1 and ATS2, I knew there was no error in the computer/host name or the connectivity.

So, after spending a lot of time googling, I was able to solve the issue.

The command had to be modified as below:

Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -Credential Domain\Username -ScriptBlock {hostname}

This brings up a pop-up window like this, where the password is to be entered:

Avi7_2-1702271092665.png

and we get the successful output as

 

ATS
ATS1
ATS2

 

So we need to provide the domain and username explicitly with the command line instructions.

 

I then followed your remaining instructions: I called setvars.bat from cmd.exe and executed mpiexec -ppn 1 -hosts ATS, ATS1, ATS2 hostname, but ended up with the same error as before.

 

[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
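
One possible reading of this log, not confirmed in this thread: the proxy reports ATS1,ATS2 as the process it tried to create, which would fit a host list that was split at a space, so that only its first part was taken as the -hosts value. A hedged Python sketch that builds the argument list with the hosts joined by commas without spaces and one -genv pair per variable (-genv <name> <value> is the form given in the Intel MPI reference); build_mpiexec_args is a hypothetical helper name:

```python
def build_mpiexec_args(hosts, program, env=None, ppn=1):
    """Build an mpiexec argument list with a space-free host list."""
    args = ["mpiexec"]
    for name, value in (env or {}).items():
        # Each variable gets its own -genv flag, followed by name and value.
        args += ["-genv", name, str(value)]
    # Strip stray whitespace so "ATS, ATS1" cannot split into two arguments.
    cleaned = [h.strip() for h in hosts if h.strip()]
    args += ["-ppn", str(ppn), "-hosts", ",".join(cleaned)]
    # Everything after the options is the program and its own arguments.
    args += list(program)
    return args


args = build_mpiexec_args(
    ["ATS", " ATS1", " ATS2"],
    ["hostname"],
    env={"I_MPI_HYDRA_DEBUG": "on", "I_MPI_DEBUG": 6},
)
print(" ".join(args))
```

The printed line can be pasted into cmd.exe after calling setvars.bat. Whether this changes the outcome on these machines is untested here.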

 

So, back to the original error.

But based on the experience with Invoke-Command, we probably need to provide the domain and username along with the password to the mpiexec command. In earlier versions this was very easy to do with mpiexec -register, but that option now produces the following error:

 

[mpiexec@ATS] match_arg (arg\hydra_arg.c:91): unrecognized argument register
[mpiexec@ATS] Similar arguments:
[mpiexec@ATS]     rr
[mpiexec@ATS]     r
[mpiexec@ATS] HYD_arg_parse_array (arg\hydra_arg.c:128): argument matching returned error
[mpiexec@ATS] mpiexec_get_parameters (mpiexec_params.c:1359): error parsing input array
[mpiexec@ATS] wmain (mpiexec.c:1893): error parsing parameters

 

The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.

Avi7
Beginner

PS:

@RabiyaSK_Intel Searching for this error, I found multiple other posts where users are facing the same issue. I hope you can escalate it to the concerned team quickly.

RabiyaSK_Intel
Employee

Hi @Avi7 

 

We apologize for the confusion. The proposed solution is meant for everyone facing this problem and addresses it in general terms. Please check and follow the other steps.

 

Thanks & Regards,

Shaik Rabiya

 

RabiyaSK_Intel
Employee

Hi,


Could you please try running Windows PowerShell as administrator and check whether you still need the "-Credential Domain\Username" option in Invoke-Command?

Please let us know whether you still receive errors after that.


>>>The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.

Thanks for providing this information, we will provide it to the concerned team.


Thanks & Regards,

Shaik Rabiya


Avi7
Beginner

>>>Could you please try running Windows Powershell as administrator and check if you still need "Domain/ Username" option in Invoke Command?


I had run this command in an administrator PowerShell session as well. The result was the same: it worked only with the -Credential Domain\Username option added.

 

Regards

Avi7
