Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.
2274 Discussions

Intel MPI - mpiexec doesn't run with -hosts

Avi7
Beginner
13,931 Views

I installed Intel MPI Library (2021.10.0.49373) on my head node (hostname=ATS) and the two compute nodes (ATS1 and ATS2). I installed and started the hydra_service on all of them using "hydra_service -install".

Before running the actual program that I want to run in parallel, I tried to test the set-up using a simple program.

I started by running this simple command on the head node in PowerShell:

mpiexec -n 2 hostname

which gave the expected output:

ATS

ATS

The problem started after this, when I actually tried to test the compute nodes using:

mpiexec -n 2 -ppn 1 -hosts ATS1,ATS2 hostname

This command produced neither an error nor any output.

I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but they also did not produce any output.

I even tried running it only on the head node with the -hosts option, but that did not produce any output either (mpiexec -n 1 -hosts ATS hostname).

I have no idea what I'm doing wrong. Anytime I try to run it with -hosts or -f hostfile, I just get a blank screen in cmd/powershell. I have to kill the process using Ctrl+C which produces the output:

[mpiexec@ATS] Sending Ctrl-C to processes as requested

[mpiexec@ATS] Press Ctrl-C again to force abort

I have checked in services on all the nodes that "impi_hydra_2021_10_0" is running.

Request your help in figuring out what is going wrong.

 

 

0 Points
24 Replies
RabiyaSK_Intel
12,699 Views

Hi,


Thanks for posting in Intel communities.


Could you please provide your operating system, CPU and hardware details to reproduce your issue at our end?


Thanks & Regards,

Shaik Rabiya


0 Points
Avi7
Beginner
12,685 Views

Thanks for replying. Here's the information you asked for:

Head Node (ATS) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 60GB RAM

Compute Node 1 (ATS1) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 72GB RAM

Compute Node 2 (ATS2) - Windows 10 Pro - 64Bit OS , Intel Xeon X5670 Processor, 72GB RAM

 

Please let me know if any further info is required.

0 Points
RabiyaSK_Intel
12,642 Views

Hi,


We have informed the concerned development team. We will get back to you soon.


Thanks & Regards,

Shaik Rabiya


0 Points
RabiyaSK_Intel
12,589 Views

Hi,


Thank you for your patience.


I'm assuming that you have already gone through the following link for troubleshooting MPI applications. If not, could you please try the steps it describes:

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-windows/2021-10/troubleshooting.html


>>>I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but they also did not produce any output.

Windows PowerShell may be case sensitive here. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?


Are you able to access both compute nodes (ATS1 and ATS2) via ssh from the head node (ATS), or is there a firewall in between? Could you also please confirm whether passwordless ssh is enabled and whether all required services other than impi_hydra_2021_10_0 are running?


Thanks & Regards,

Shaik Rabiya


0 Points
Avi7
Beginner
12,481 Views

Sorry for the delay in response.

 

I'm not able to access ATS1 and ATS2 from ATS via ssh as all the nodes have only OpenSSH Client installed and not OpenSSH server. As the nodes are airgapped I'll need some time to have the SSH server installed on the compute nodes. I'll update once I'm able to do that.

 

>>> Windows PowerShell may be case sensitive here. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?

 

This is the output when running with all caps flags:

 

[mpiexec@ATS] Launch arguments: D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 64609 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 472 D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)

0 Points
RabiyaSK_Intel
12,520 Views

Hi,


We have not heard back from you. Could you please provide the details that we requested in the past reply?


Thanks & Regards,

Shaik Rabiya


0 Points
RabiyaSK_Intel
12,298 Views

Hi,


We are still working on your case. We will get back to you soon.


Thanks & Regards,

Shaik Rabiya


0 Points
RabiyaSK_Intel
12,188 Views

Hi,

 

Thank you for your patience. Could you please try installing Intel MPI in a path that contains no spaces? For example, use C:\tmp\Intel\oneAPI rather than C:\Program Files (x86)\...., which has spaces in the folder names of the directory path.

 

Could you please try and get back to us if you are still facing the same issue?

 

Thanks & Regards,


Shaik Rabiya

0 Points
Avi7
Beginner
12,002 Views

Hi,

 

I tried your suggested solution of installing in a path without any spaces but it produced the same output.

 

[mpiexec@ATS] Launch arguments: D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 56896 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\MPI\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 500 D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)

 

So changing the installation path had no effect.

 

Looking for suggestions on how to proceed.

0 Points
JeffreyFaust
Beginner
12,152 Views

Hello, 

 

I'm jumping in on this topic because I'm having the same exact problem, and am very eager to learn of a solution. Locally, hostname as well as my MPI application work fine. With MPI 2018 version, everything was fine local or remote. With the latest MPI release, a remote run with hostname fails. I have gone through the troubleshooting guide. I installed at a path without spaces.

 

edit: I've decided to start a new topic.

 

0 Points
RabiyaSK_Intel
12,126 Views

Hi @JeffreyFaust 

 

Could you please provide your CPU, OS and hardware details so that we could use that information to inspect your problem as well?

 

Thanks & Regards,

Shaik Rabiya

 

RabiyaSK_Intel
12,025 Views

Hi,


We have not heard back from you. Could you please try the workaround and mention your findings?


Thanks & Regards,

Shaik Rabiya


0 Points
RabiyaSK_Intel
11,865 Views

Hi,

 

Could you please follow these steps:

 

1. If you are using Intel MPI 2021.10, we advise you to use Intel MPI 2021.11, because there is a problem with paths containing spaces in the older version.

 

2. If you haven't already done so, please install and enable the WinRM service (you can check whether it is running with "get-service winrm" in PowerShell).

 

3. Add all relevant nodes to the list of TrustedHosts e.g. in Windows PowerShell with

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "<comma-separated list of hosts>"

 

4. Test the visibility of the hosts in PowerShell with
          

Invoke-Command -ComputerName <comma-separated list of hosts> -ScriptBlock {hostname}


If the command above fails, its error message will state what is missing. Follow the instructions it gives.

 

5. Once the command above works, you can proceed to the next item.


Call Intel MPI's setvars.bat from cmd.exe, not from PowerShell!


Test it from cmd.exe with the following command:

 

mpiexec -ppn 1 -hosts <comma-separated list of hosts> hostname

 

 

 

Here is a screenshot for your reference.

[Screenshot: RabiyaSK_Intel_0-1701424554142.png]

 

Thanks & Regards,

Shaik Rabiya

 

0 Points
Avi7
Beginner
11,794 Views

Hi @RabiyaSK_Intel 

 

Is the proposed solution meant for me or @JeffreyFaust ? Because I'm using the latest version of Intel MPI.

 

Regards

Avi7

0 Points
Avi7
Beginner
11,630 Views

Hello @RabiyaSK_Intel 

 

I followed your instructions and completed steps 1-3 without any issues. However when I tried to run Step 4, I was met with the following error:

Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -ScriptBlock {hostname}

Output:

 

[ATS1] Connecting to remote server ATS1 failed with the following error message: WinRM client cannot process the request. The following error occured while using Kerberos authentication: Cannot find the computer ATS1. Verify that the computer exists on the network and that the name provided is spelled correctly.
[ATS2] Connecting to remote server ATS2 failed with the following error message: WinRM client cannot process the request. The following error occured while using Kerberos authentication: Cannot find the computer ATS2. Verify that the computer exists on the network and that the name provided is spelled correctly.
ATS

 

Same error message again for ATS2 as well.

As the host was able to successfully ping ATS1 and ATS2, I knew there was no error in the computer/host name or the connectivity.
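A successful ping only shows that the nodes answer ICMP; it does not show that the WinRM listener used by Invoke-Command is reachable. As a minimal sketch, assuming WinRM is listening on its default HTTP port 5985 (an assumption, since the thread does not state the listener configuration), a plain TCP connection test could look like this. The helper name port_is_open is hypothetical.

```python
import socket

# Assumption: WinRM uses its default HTTP listener port on the compute nodes.
WINRM_HTTP_PORT = 5985


def port_is_open(host, port=WINRM_HTTP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts
        return False


# Example use on the head node (host names taken from this thread):
#   for host in ("ATS1", "ATS2"):
#       print(host, port_is_open(host))
```

A False result would point at name resolution, the firewall, or the WinRM listener rather than at authentication; a True result says nothing about credentials, which is what the Kerberos error above concerns.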

So after spending a lot of time googling I was able to solve the issue.

The command had to be modified as below:

Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -Credential Domain\Username -ScriptBlock {hostname}

This brings up a credential prompt like the one below, where the password is entered:

[Screenshot: Avi7_2-1702271092665.png]

and we get the successful output as

 

ATS
ATS1
ATS2

 

So we need to provide the domain and username explicitly with the command line instructions.

 

Next, I followed your remaining instructions: I called setvars.bat from cmd.exe and then executed mpiexec -ppn 1 -hosts ATS, ATS1, ATS2 hostname, but ended up with the same error as before.

 

[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.

[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)

 

So, back to the original error.
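In both failing logs, the string that HYD_spawn reports it could not start ("I_MPI_DEBUG=6 -n 2 ..." and "ATS1,ATS2 hostname") begins with a token that was meant as an option value, not with the program name. One possible reading is that the command line was split into arguments differently than intended, for example because of a space inside the -hosts list or an environment variable not preceded by -genv. This is an assumption, not a confirmed diagnosis. A minimal Python sketch that assembles the arguments explicitly, one token per list element, might look like this. build_mpiexec_args is a hypothetical helper; it uses only the -genv, -ppn and -hosts options already mentioned in this thread, with -genv written in the two-token form (variable name, then value).

```python
def build_mpiexec_args(hosts, program, ppn=1, genv=None):
    """Return an mpiexec argument list with exactly one token per element."""
    args = ["mpiexec"]
    for name, value in (genv or {}).items():
        # Each variable gets its own -genv, followed by name and value tokens
        args += ["-genv", name, str(value)]
    args += ["-ppn", str(ppn)]
    # Join host names with commas and no spaces so the list stays one token
    args += ["-hosts", ",".join(h.strip() for h in hosts)]
    # The program and its own arguments come last
    args += list(program)
    return args


print(build_mpiexec_args(
    ["ATS1", "ATS2"],
    ["hostname"],
    genv={"I_MPI_HYDRA_DEBUG": "on", "I_MPI_DEBUG": 6},
))
```

The resulting list can be passed to subprocess.run(args) without shell=True, so that no shell re-tokenizes the command line before mpiexec sees it.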

But based on the experience with Invoke-Command, we probably need to provide the domain and username along with the password to the mpiexec command. This used to be very easy to do with the mpiexec -register command, but that now produces the following error:

 

[mpiexec@ATS] match_arg (arg\hydra_arg.c:91): unrecognized argument register
[mpiexec@ATS] Similar arguments:
[mpiexec@ATS]     rr
[mpiexec@ATS]     r
[mpiexec@ATS] HYD_arg_parse_array (arg\hydra_arg.c:128): argument matching returned error
[mpiexec@ATS] mpiexec_get_parameters (mpiexec_params.c:1359): error parsing input array
[mpiexec@ATS] wmain (mpiexec.c:1893): error parsing parameters

 

The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.

0 Points
Avi7
Beginner
11,630 Views


PS:

@RabiyaSK_Intel  Searching for this error I found multiple other posts where users are facing the same issue, hope you can escalate it to the concerned team quickly.

0 Points
RabiyaSK_Intel
11,786 Views

Hi @Avi7 

 

We apologize for the confusion. The proposed solution is general and is meant for everyone who is facing this problem. Please check and follow the other steps.

 

Thanks & Regards,

Shaik Rabiya

 

0 Points
RabiyaSK_Intel
11,619 Views

Hi,


Could you please try running Windows PowerShell as administrator and check whether you still need the "-Credential Domain\Username" option in Invoke-Command?


Could you please try the above and check if you are still receiving errors.


>>>The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.

Thanks for providing this information, we will provide it to the concerned team.


Thanks & Regards,

Shaik Rabiya


0 Points
Avi7
Beginner
11,601 Views

>>>Could you please try running Windows PowerShell as administrator and check whether you still need the "-Credential Domain\Username" option in Invoke-Command?


I had already run this command in an administrator PowerShell as well. The result was the same: it only worked with the -Credential Domain\Username flag added.

 

Regards

Avi7

0 Points