I installed Intel MPI Library (2021.10.0.49373) on my head node (hostname=ATS) and the two compute nodes (ATS1 and ATS2). I installed and started the hydra_service on all of them using "hydra_service -install".
Before running the actual program that I want to run in parallel, I tested the setup with a simple program.
I started by running this simple command on the head node in PowerShell:
mpiexec -n 2 hostname
which gave the expected output:
ATS
ATS
The problem started after this, when I actually tried to test the compute nodes using:
mpiexec -n 2 -ppn 1 -hosts ATS1,ATS2 hostname
This command produced neither an error nor any output.
I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but that did not produce any output either.
I even tried running it only on the head node with the -hosts option (mpiexec -n 1 -hosts ATS hostname), but that did not produce any output either.
I have no idea what I'm doing wrong. Whenever I run it with -hosts or -f hostfile, I just get a blank screen in cmd/PowerShell. I have to kill the process with Ctrl+C, which produces this output:
[mpiexec@ATS] Sending Ctrl-C to processes as requested
[mpiexec@ATS] Press Ctrl-C again to force abort
I have checked in services on all the nodes that "impi_hydra_2021_10_0" is running.
I would appreciate your help in figuring out what is going wrong.
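A launch that hangs with a blank screen is easier to report when it is wrapped in a timeout. Below is a minimal Python sketch, not part of Intel MPI: the helper names `build_mpiexec_cmd` and `run_with_timeout` are hypothetical, and it assumes only the `mpiexec` options already used in this post (`-n`, `-ppn`, `-hosts`).

```python
import subprocess


def build_mpiexec_cmd(program, hosts=None, n=None, ppn=None):
    """Build an argument list for mpiexec using only -n, -ppn and -hosts."""
    cmd = ["mpiexec"]
    if n is not None:
        cmd += ["-n", str(n)]
    if ppn is not None:
        cmd += ["-ppn", str(ppn)]
    if hosts:
        # The host list is passed as one comma-separated token with no spaces.
        cmd += ["-hosts", ",".join(hosts)]
    cmd.append(program)
    return cmd


def run_with_timeout(cmd, timeout_s=30):
    """Run cmd and return (status, output); status is 'ok', 'failed' or 'timeout'."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the exception.
        return "timeout", ""
    status = "ok" if done.returncode == 0 else "failed"
    return status, done.stdout + done.stderr
```

For example, `run_with_timeout(build_mpiexec_cmd("hostname", hosts=["ATS1", "ATS2"], n=2, ppn=1))` would report `timeout` instead of leaving the console blank if the launch hangs as described.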
Hi,
Thanks for posting in Intel communities.
Could you please provide your operating system, CPU and hardware details to reproduce your issue at our end?
Thanks & Regards,
Shaik Rabiya
Thanks for replying. Here's the information you asked for:
Head Node (ATS) - Windows 10 Pro - 64-bit OS, Intel Xeon X5670 Processor, 60 GB RAM
Compute Node 1 (ATS1) - Windows 10 Pro - 64-bit OS, Intel Xeon X5670 Processor, 72 GB RAM
Compute Node 2 (ATS2) - Windows 10 Pro - 64-bit OS, Intel Xeon X5670 Processor, 72 GB RAM
Please let me know if any further info is required.
Hi,
We have informed the concerned development team. We will get back to you soon.
Thanks & Regards,
Shaik Rabiya
Hi,
Thank you for your patience.
I'm assuming that you have already gone through the following link for troubleshooting MPI applications. If not, could you please try the steps mentioned there:
>>>I tried running with -genv I_MPI_Debug=6 (and 1 and 5), but they also did not produce any output.
Windows PowerShell may be case sensitive. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?
Are you able to access both ATS1 and ATS2 (compute nodes) via SSH from ATS (head node), or is there a firewall in between? Could you also please confirm whether passwordless SSH is enabled and whether all required services, other than impi_hydra_2021_10_0, are running?
Thanks & Regards,
Shaik Rabiya
Sorry for the delay in response.
I'm not able to access ATS1 and ATS2 from ATS via SSH, as all the nodes have only the OpenSSH client installed and not the OpenSSH server. As the nodes are air-gapped, I'll need some time to have the SSH server installed on the compute nodes. I'll update once I'm able to do that.
>>> The windows powershell may be case sensitive. Could you please provide the output with both I_MPI_HYDRA_DEBUG=on and I_MPI_DEBUG=6?
This is the output when running with all caps flags:
[mpiexec@ATS] Launch arguments: D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 64609 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 472 D:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.
[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
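The `HYD_spawn` line in this output names `I_MPI_DEBUG=6` as the process to create, which suggests that `mpiexec` took the second variable for the program name rather than for a setting. The exact command is not shown, so this is only an assumption, but one reading is that each variable needs its own `-genv`. A minimal Python sketch of building the arguments that way, assuming the `-genv <variable> <value>` form; the helper names `genv_args` and `debug_command` are hypothetical:

```python
def genv_args(env):
    """Expand a dict of settings into repeated '-genv NAME VALUE' arguments."""
    args = []
    for name, value in env.items():
        # Names are upper-cased so that I_MPI_Debug becomes I_MPI_DEBUG.
        args += ["-genv", name.upper(), str(value)]
    return args


def debug_command(hosts, program):
    """mpiexec argument list with both debug settings from this thread enabled."""
    settings = {"I_MPI_HYDRA_DEBUG": "on", "I_MPI_DEBUG": 6}
    return (
        ["mpiexec"]
        + genv_args(settings)
        + ["-n", str(len(hosts)), "-ppn", "1", "-hosts", ",".join(hosts), program]
    )
```

With this layout the program name (`hostname`) is always the last token, and no `NAME=value` token can end up where `mpiexec` expects the executable.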
Hi,
We have not heard back from you. Could you please provide the details that we requested in the past reply?
Thanks & Regards,
Shaik Rabiya
Hi,
We are still working on your case. We will get back to you soon.
Thanks & Regards,
Shaik Rabiya
Hi,
Thank you for your patience. Could you please try installing Intel MPI in a path that contains no spaces? For example, C:\tmp\Intel\oneAPI, as opposed to C:\Program Files (x86)\...., which contains a space in a folder name of the directory path.
Could you please try and get back to us if you are still facing the same issue?
Thanks & Regards,
Shaik Rabiya
Hi,
I tried your suggested solution of installing in a path without any spaces but it produced the same output.
[mpiexec@ATS] Launch arguments: D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_bstrap_proxy.exe --upstream-host ATS --upstream-port 56896 --pgid 0 --launcher powershell --launcher-number 0 --base-path D:\MPI\Intel\oneAPI\mpi\latest\bin --tree-width 1 --tree-level 1 --time-left -1 --launch-type 2 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 500 D:\MPI\Intel\oneAPI\mpi\latest\bin\hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process I_MPI_DEBUG=6 -n 2 -ppn 1 -hosts ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.
[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
So changing the installation path seems to have had no effect.
Looking for suggestions on how to proceed.
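When reading output like the above, the `HYD_spawn` line is often the most informative one, because it shows what the proxy tried to start. Below is a small Python sketch that pulls that string out of a captured log. The function name and the regular expression are illustrative and based only on the lines quoted in this thread, not on any documented log format.

```python
import re

# Pattern based on the lines quoted in this thread, for example:
# "... HYD_spawn (...): unable to create process <command> (error code 2)"
SPAWN_ERROR = re.compile(
    r"unable to create process (?P<command>.+?) \(error code (?P<code>\d+)\)"
)


def find_spawn_failures(log_text):
    """Return (first_token, full_command, error_code) for each spawn failure."""
    failures = []
    for line in log_text.splitlines():
        match = SPAWN_ERROR.search(line)
        if match:
            command = match.group("command")
            tokens = command.split()
            first = tokens[0] if tokens else ""
            failures.append((first, command, int(match.group("code"))))
    return failures
```

Applied to the output above, the first token is `I_MPI_DEBUG=6`, that is, the name the proxy tried to launch as an executable, with error code 2 ("The system cannot find the file specified").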
Hello,
I'm jumping in on this topic because I'm having the same exact problem, and am very eager to learn of a solution. Locally, hostname as well as my MPI application work fine. With MPI 2018 version, everything was fine local or remote. With the latest MPI release, a remote run with hostname fails. I have gone through the troubleshooting guide. I installed at a path without spaces.
edit: I've decided to start a new topic.
Could you please provide your CPU, OS and hardware details so that we could use that information to inspect your problem as well?
Thanks & Regards,
Shaik Rabiya
Hi,
We have not heard back from you. Could you please try the workaround and mention your findings?
Thanks & Regards,
Shaik Rabiya
Hi,
Could you please follow these steps:
1. If you are using Intel MPI 2010 we advise you to use Intel MPI 2021.11 because there is a problem with paths containing spaces in the older version.
2. Could you please install and enable the WinRM service, if you haven't done so already? You can check whether it is enabled with "get-service winrm" in PowerShell.
3. Add all relevant nodes to the list of TrustedHosts e.g. in Windows PowerShell with
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "<comma-separated list of hosts>"
4. Test the visibility of the hosts in PowerShell with
Invoke-Command -ComputerName <comma-separated list of hosts> -ScriptBlock {hostname}
If the command above fails, it will mention what's missing. You will have to follow those instructions.
5. Once the command above works, you can proceed to the next item.
Call setvars.bat from Intel MPI from cmd.exe, not from PowerShell!
Test it from cmd.exe with the following command:
mpiexec -ppn 1 -hosts <comma-separated list of hosts> hostname
Here is a screenshot for your reference.
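Steps 3 to 5 all take the same comma-separated host list, so it can help to generate the three commands from one list. Below is a minimal Python sketch that only builds the command strings shown in the steps above (it does not run them); the function names are hypothetical.

```python
def trusted_hosts_command(hosts):
    """PowerShell command from step 3 for the given host names."""
    host_list = ",".join(hosts)
    return f'Set-Item WSMan:\\localhost\\Client\\TrustedHosts -Value "{host_list}"'


def invoke_check_command(hosts):
    """PowerShell command from step 4 for the given host names."""
    host_list = ",".join(hosts)
    # Doubled braces produce the literal {hostname} script block.
    return f"Invoke-Command -ComputerName {host_list} -ScriptBlock {{hostname}}"


def mpiexec_check_command(hosts):
    """cmd.exe command from step 5 for the given host names."""
    host_list = ",".join(hosts)
    return f"mpiexec -ppn 1 -hosts {host_list} hostname"
```

Joining with `","` and no spaces keeps the host list a single argument in all three commands.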
Thanks & Regards,
Shaik Rabiya
Is the proposed solution meant for me or @JeffreyFaust ? Because I'm using the latest version of Intel MPI.
Regards
Avi7
Hello @RabiyaSK_Intel
I followed your instructions and completed steps 1-3 without any issues. However, when I tried to run step 4, I was met with the following error:
Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -ScriptBlock {hostname}
Output:
[ATS1] Connecting to remote server ATS1 failed with the following error message: WinRM client cannot process the request. The following error occured while using Kerberos authentication: Cannot find the computer ATS1. Verify that the computer exists on the network and that the name provided is spelled correctly.
[ATS2] Connecting to remote server ATS2 failed with the following error message: WinRM client cannot process the request. The following error occured while using Kerberos authentication: Cannot find the computer ATS2. Verify that the computer exists on the network and that the name provided is spelled correctly.
ATS
Same error message again for ATS2 as well.
As the host was able to successfully ping ATS1 and ATS2, I knew there was no error in the computer/host name or the connectivity.
So after spending a lot of time googling I was able to solve the issue.
The command had to be modified as below:
Command: Invoke-Command -ComputerName ATS, ATS1, ATS2 -Credential Domain\Username -ScriptBlock {hostname}
This brings up a pop-up window where the password is to be entered, and we get the successful output:
ATS
ATS1
ATS2
So we need to provide the domain and username explicitly with the command line instructions.
I then followed the remaining instructions: I called setvars.bat from cmd.exe and executed mpiexec -ppn 1 -hosts ATS, ATS1, ATS2 hostname, and ended up with the same error as before.
[proxy:0:0@ATS] HYD_spawn (..\windows\src\hydra_spawn.c:282): unable to create process ATS1,ATS2 hostname (error code 2)
[proxy:0:0@ATS] launch_processes (proxy.c:596): error creating process (error code 2). The system cannot find the file specified.
[proxy:0:0@ATS] main (proxy.c:969): error launching_processes
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1303): downstream from host ATS exited abnormally
[mpiexec@ATS] check_downstream_work_complition (mpiexec.c:1307): trying to close other downstreams
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@ATS] wmain (mpiexec.c:2275): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@ATS] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
So, back to the original error.
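The command quoted above has a space after each comma in the host list, and the `HYD_spawn` line names something starting with `ATS1` as the process to create. That would be consistent with the shell splitting the list into separate arguments, so that only the first piece reaches `-hosts`. Below is a rough Python sketch of that splitting, using plain whitespace splitting as a stand-in for the shell. This is an approximation for illustration, not how cmd.exe or mpiexec is implemented, and the helper names are hypothetical.

```python
def split_on_whitespace(command_line):
    """Approximate how an unquoted command line is split into arguments."""
    return command_line.split()


def value_after(args, option):
    """Return the single argument that follows option, or None if absent."""
    if option in args:
        index = args.index(option)
        if index + 1 < len(args):
            return args[index + 1]
    return None


def compact_host_list(text):
    """Rewrite 'ATS, ATS1, ATS2' as 'ATS,ATS1,ATS2' (no spaces)."""
    return ",".join(part.strip() for part in text.split(",") if part.strip())
```

In this model, `value_after(split_on_whitespace("mpiexec -ppn 1 -hosts ATS, ATS1, ATS2 hostname"), "-hosts")` is just `ATS,`, and the remaining host names are left where the program name is expected. Passing `compact_host_list("ATS, ATS1, ATS2")` instead keeps the list in one argument.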
Based on the experience with Invoke-Command, we probably need to provide the domain and username along with the password to the mpiexec command. This used to be very easy to do with the mpiexec -register command. But that command now produces the following error:
[mpiexec@ATS] match_arg (arg\hydra_arg.c:91): unrecognized argument register
[mpiexec@ATS] Similar arguments:
[mpiexec@ATS] rr
[mpiexec@ATS] r
[mpiexec@ATS] HYD_arg_parse_array (arg\hydra_arg.c:128): argument matching returned error
[mpiexec@ATS] mpiexec_get_parameters (mpiexec_params.c:1359): error parsing input array
[mpiexec@ATS] wmain (mpiexec.c:1893): error parsing parameters
The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.
PS:
@RabiyaSK_Intel Searching for this error, I found multiple other posts where users are facing the same issue. I hope you can escalate it to the concerned team quickly.
Hi @Avi7
We apologize for the confusion. The proposed solution is meant for everyone facing this problem and addresses it in general terms. Please check and follow the other steps.
Thanks & Regards,
Shaik Rabiya
Hi,
Could you please try running Windows PowerShell as administrator and check if you still need the "Domain\Username" option in Invoke-Command?
Could you please try the above and check if you are still receiving errors?
>>>The -register option is no longer recognized even though it is still listed in mpiexec -help. I feel this is what is preventing the command from successfully running and Intel MPI developers need to provide a quick patch or workaround to get it to work. Looking forward to an update on this.
Thanks for providing this information. We will pass it on to the concerned team.
Thanks & Regards,
Shaik Rabiya
>>>Could you please try running Windows PowerShell as administrator and check if you still need the "Domain\Username" option in Invoke-Command?
I ran this command in an administrator PowerShell as well. The result was the same: it only worked with the -Credential Domain\Username flag added.
Regards
Avi7