Hello,
I am having difficulty running my MPI application across multiple nodes on a Windows cluster.
Configuration
This cluster was set up with Microsoft Azure. It consists of a head node and two compute nodes with a 40 Gb/s interconnect. Each node is running Windows Server 2016 Datacenter with HPC Pack 2016 installed.
I am launching the jobs using the Job Scheduler that comes with HPC Pack.
What I have tried
So far I have done the following:
- Configured each node with the MPI Run Time Library 2019 release 6
- Started the hydra process manager on each node and confirmed that it is running on each
- Ran:
mpiexec -register
and provided the same username and password on all nodes (including the head node)
- Set up a shared drive ( M: ) that contains the executable and the working directory I am trying to run from
- Turned the firewall off on each node
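Two of the prerequisites listed above can be checked from the head node with a short script. The following is only a minimal sketch in Python (an assumption, since the thread itself uses nothing but the command line). It checks that each host name resolves and that the shared path is visible to the current session. The helper names `resolve_host` and `check_prerequisites` are hypothetical, and the check says nothing about whether the hydra service or the registered credentials work.

```python
import os
import socket


def resolve_host(name):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_prerequisites(hosts, shared_path):
    """Collect name-resolution and path-visibility results (hypothetical helper)."""
    return {
        "hosts": {host: resolve_host(host) for host in hosts},
        "shared_path_visible": os.path.exists(shared_path),
    }
```

On the head node this could be called as `check_prerequisites(["compn000", "compn001"], r"M:\Test_Submit_HPC\distribution")`, using the host names and path quoted in this thread. A `None` address or a `False` path result would point at name resolution or drive mapping rather than at MPI itself.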
Results
From the head node, I can launch my executable successfully on each node individually, for example 2 processes on a single node.
But when I try to run 2 processes across 2 nodes, the launcher hangs. No error is reported and the job idles, doing nothing. The command is:
mpiexec -verbose -ppn 1 -n 2 -hosts compn000,compn001 hostname M:\Test_Submit_HPC\distribution\Program.exe
Here are the launch arguments:
[mpiexec@compn000] Launch arguments: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_bstrap_proxy.exe --upstream-host compn000 --upstream-port 57643 --pgid 0 --launcher service --launcher-number 0 --base-path C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 384 C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
I have also tried putting the program executable in the same location on each node's local hard drive, but still no luck.
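When trying several variants of a multi-node command like the one above, it can help to assemble the argument list programmatically so that a stray token is easier to spot. Note that in the command as quoted, the word `hostname` sits where the executable is expected, so mpiexec would presumably treat it as the program to launch. The following is a minimal sketch in Python. The function name `build_mpiexec_args` is hypothetical, and only options that appear in this thread (`-verbose`, `-genv`, `-ppn`, `-n`, `-hosts`) are used.

```python
def build_mpiexec_args(exe, hosts, ppn=1, debug_level=None, verbose=False):
    """Assemble an mpiexec argument list from the options used in this thread."""
    args = ["mpiexec"]
    if verbose:
        args.append("-verbose")
    if debug_level is not None:
        # Same NAME=value form as the command quoted in the thread.
        args += ["-genv", f"I_MPI_DEBUG={debug_level}"]
    # Total process count is processes-per-node times the number of hosts.
    args += ["-ppn", str(ppn), "-n", str(ppn * len(hosts))]
    args += ["-hosts", ",".join(hosts)]
    args.append(exe)  # the executable comes last, with no other token before it
    return args
```

Joining the result with spaces gives a line that can be pasted into the command prompt, provided none of the arguments contain spaces.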
Hi Zander,
Thanks for reaching out to us.
In the command
mpiexec -verbose -ppn 1 -n 2 -hosts compn000,compn001 hostname M:\Test_Submit_HPC\distribution\Program.exe
what does hostname signify?
Could you provide the log output after setting the environment variable I_MPI_DEBUG=5?
Thanks
Prasanth
Hello Prasanth,
That 'hostname' may have been an artifact of a copy-and-paste mistake on my part.
I just ran:
mpiexec -verbose -ppn 1 -n 2 -hosts compn000,compn001 -genv I_MPI_DEBUG=5 M:\Test_Submit_HPC\distribution\Program.exe
I believe the only output I get is from -verbose, which is:
[mpiexec@compn000] Launch arguments: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_bstrap_proxy.exe --upstream-host compn000 --upstream-port 59522 --pgid 0 --launcher service --launcher-number 0 --base-path C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 368 C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9
Next, I ran (no -verbose):
mpiexec -genv I_MPI_DEBUG=5 -ppn 1 -n 2 -hosts compn000,compn001 M:\Test_Submit_HPC\distribution\Program.exe
and there is no output on the command prompt, nor any log file in my working directory.
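Because the launcher hangs without printing anything, it can be useful to run it under a timeout and keep whatever output was produced before the cut-off. The following is a minimal sketch in Python using only the standard library. The function name `run_with_timeout` and the 60-second default are assumptions. Only the launched process itself is killed on timeout, so any processes it started on other nodes may need to be cleaned up separately.

```python
import subprocess


def _as_text(data):
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_with_timeout(args, timeout_s=60):
    """Run a command and return (status, stdout, stderr).

    status is the exit code, or None if the command was killed after
    timeout_s seconds, which is how a hang shows up here.
    """
    try:
        done = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout_s
        )
        return done.returncode, done.stdout, done.stderr
    except subprocess.TimeoutExpired as exc:
        # Partial output captured before the kill, if any.
        return None, _as_text(exc.stdout), _as_text(exc.stderr)
```

A status of `None` together with empty output would match the behaviour described above, namely a silent hang rather than a crash.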
Thanks,
Zander
Hi Zander,
We are transferring your issue to the relevant team.
In the meantime, please go through the installation and prerequisite steps once more to check whether any were missed.
Thanks
Prasanth
Hi Zander,
Would you please share the results of running:
mpiexec -validate -host compn000
and
mpiexec -validate -host compn001
from your head node?
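If more hosts need checking later, the same validation can be repeated in a loop. The following is a minimal sketch in Python. The function name `validate_hosts` and the injectable `runner` parameter are assumptions made for illustration, and nothing is assumed about what `mpiexec -validate` prints, so the output is simply collected as-is.

```python
import subprocess


def validate_hosts(hosts, runner=subprocess.run):
    """Run `mpiexec -validate -host <name>` for each host and collect results.

    runner defaults to subprocess.run and can be replaced for dry runs.
    """
    results = {}
    for host in hosts:
        args = ["mpiexec", "-validate", "-host", host]
        done = runner(args, capture_output=True, text=True)
        results[host] = (done.returncode, done.stdout.strip())
    return results
```

Calling `validate_hosts(["compn000", "compn001"])` on the head node returns one `(exit code, output)` pair per host, which can be pasted into a reply.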
Best,
Dunni
This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.