Intel® MPI Library

Executing MPI Executable on Azure Cluster

Mausolff__Zander
Beginner

Hello,

I am having difficulty running my MPI application across multiple nodes on a Windows cluster. 

Configuration

This cluster was set up with Microsoft Azure. It consists of a head node and two compute nodes with a 40 Gb/s interconnect. Each node runs Windows Server 2016 Datacenter with HPC Pack 2016 installed.

I am launching the jobs using the Job Scheduler that comes with HPC Pack.

What I have tried

So far I have done the following:

- Configured each node with the Intel MPI Runtime Library 2019, release 6

- Started the Hydra process manager on each node and confirmed that it is running on each one

- Ran:

mpiexec -register

and provided the same username and password on all nodes (including the head node)

- Set up a shared drive (M:) that contains the executable and the working directory I am trying to run from

- Turned the firewall off on each node

Results

From the head node, I can successfully launch my executable on each node individually, e.g., running 2 processes on a single node.

But when I try to run 2 processes across 2 nodes (one per node), the launcher hangs. No error is reported, and the job just sits idle. The command looks like:

mpiexec  -verbose  -ppn 1 -n 2 -hosts compn000,compn001 hostname M:\Test_Submit_HPC\distribution\Program.exe 

Here are the launch arguments:

[mpiexec@compn000] Launch arguments: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_bstrap_proxy.exe --upstream-host compn000 --upstream-port 57643 --pgid 0 --launcher service --launcher-number 0 --base-path C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 384 C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9 
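The --upstream-host and --upstream-port fields in this line appear to name the host and port that the bootstrap proxy connects back to (here compn000:57643), so they are worth comparing between runs. As an illustration only, the hypothetical Python helper below pulls "--name value" pairs out of such a log line. It assumes each value is a single token without spaces, so a path under "Program Files (x86)" gets truncated; use it only for simple fields like these.

```python
from typing import Dict


def parse_launch_options(line: str) -> Dict[str, str]:
    """Collect "--name value" pairs from a Hydra "Launch arguments" log line.

    Assumes values are single whitespace-free tokens; values containing
    spaces (such as paths under "Program Files (x86)") are truncated.
    A bare flag followed directly by another option (e.g. --debug) maps to "".
    """
    tokens = line.split()
    options: Dict[str, str] = {}
    for i, tok in enumerate(tokens):
        if not tok.startswith("--"):
            continue  # skip path fragments and other non-option tokens
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # A following token that starts with "--" is the next option, not a value.
        options[tok[2:]] = "" if nxt.startswith("--") else nxt
    return options


# Excerpt of the verbose line from the first run in this thread.
sample = (
    r"[mpiexec@compn000] Launch arguments: C:\Program Files (x86)\IntelSWTools"
    r"\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_bstrap_proxy.exe"
    r" --upstream-host compn000 --upstream-port 57643 --pgid 0 --launcher service"
    r" --debug --service_port 0 --proxy-id 0"
)
opts = parse_launch_options(sample)
print(opts["upstream-host"], opts["upstream-port"], opts["launcher"])  # compn000 57643 service
```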

I have also tried putting the program executable in the same location on each node's local hard drive, but still no luck.
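Since the hang only appears once a second node is involved, one quick check is whether a compute node can open a TCP connection back to compn000 at all. A minimal sketch in Python, assuming Python is available on the compute nodes; can_connect is a hypothetical helper, not part of Intel MPI or HPC Pack:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Any OSError (connection refused, timeout, unresolvable host name)
    is treated as "unreachable".
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, while a hung mpiexec is still waiting, calling can_connect("compn000", <port>) on compn001 with the --upstream-port printed for that run (it changes between runs) would show whether that port is reachable from the second node.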

PrasanthD_intel
Moderator

Hi Zander,

Thanks for reaching out to us.

In the command

mpiexec  -verbose  -ppn 1 -n 2 -hosts compn000,compn001 hostname M:\Test_Submit_HPC\distribution\Program.exe

what does "hostname" signify?

Could you provide the log output after setting the environment variable I_MPI_DEBUG=5?
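A minimal sketch of assembling such a command from a Python script on the head node. Intel MPI's reference describes -genv as taking the variable name and value as two separate arguments (-genv I_MPI_DEBUG 5) rather than a single NAME=value token. build_mpiexec_cmd is a hypothetical helper; the host names and path are the ones used in this thread.

```python
from typing import List, Sequence


def build_mpiexec_cmd(exe: str, hosts: Sequence[str], n: int, ppn: int = 1,
                      debug_level: int = 5, verbose: bool = True) -> List[str]:
    """Build an mpiexec argument list that sets I_MPI_DEBUG for all ranks."""
    cmd = ["mpiexec"]
    if verbose:
        cmd.append("-verbose")
    # -genv takes the variable name and its value as two separate arguments.
    cmd += ["-genv", "I_MPI_DEBUG", str(debug_level)]
    cmd += ["-ppn", str(ppn), "-n", str(n), "-hosts", ",".join(hosts), exe]
    return cmd


cmd = build_mpiexec_cmd(r"M:\Test_Submit_HPC\distribution\Program.exe",
                        ["compn000", "compn001"], n=2)
print(" ".join(cmd))
# mpiexec -verbose -genv I_MPI_DEBUG 5 -ppn 1 -n 2 -hosts compn000,compn001 M:\Test_Submit_HPC\distribution\Program.exe
```

The resulting list can be handed to subprocess.run(cmd) to launch the job; passing a list avoids any quoting issues with the executable path.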

 

Thanks 

Prasanth
Mausolff__Zander
Beginner

Hello Prasanth,

 

That 'hostname' might have been an artifact of my copying and pasting something incorrectly.

I just ran:

mpiexec  -verbose  -ppn 1 -n 2 -hosts compn000,compn001 -genv I_MPI_DEBUG=5 M:\Test_Submit_HPC\distribution\Program.exe

 

I believe the only output I get is from -verbose, which is:

[mpiexec@compn000] Launch arguments: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_bstrap_proxy.exe --upstream-host compn000 --upstream-port 59522 --pgid 0 --launcher service --launcher-number 0 --base-path C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --service_port 0 --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 368 C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\bin//hydra_pmi_proxy.exe --usize -1 --auto-cleanup 1 --abort-signal 9

Next, I ran (no -verbose):

mpiexec -genv I_MPI_DEBUG=5  -ppn 1 -n 2 -hosts compn000,compn001  M:\Test_Submit_HPC\distribution\Program.exe

and there is no output in the command prompt and no log file in my working directory.

Thanks,

Zander

PrasanthD_intel
Moderator

Hi Zander,

We are transferring your issue to the relevant team.

In the meantime, please double-check the installation and prerequisite steps to make sure none were missed.

 

Thanks 

Prasanth

Dunni_A_Intel
Moderator

Hi Zander,

Would you please share the results of running:

mpiexec -validate -host compn000

and

mpiexec -validate -host compn001

from your head node?
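To collect both results in one go, and to avoid waiting indefinitely if a node does not answer, the two commands could be wrapped in a short Python loop run from the head node. This is a sketch assuming mpiexec is on PATH; validate_hosts and its 60-second timeout are illustrative choices, not Intel MPI defaults.

```python
import subprocess
from typing import Callable, Dict, Iterable, Optional, Tuple


def validate_hosts(hosts: Iterable[str], timeout: float = 60.0,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                   ) -> Dict[str, Tuple[Optional[int], str]]:
    """Run `mpiexec -validate -host <host>` for each host and collect the results.

    Returns {host: (exit_code, combined_output)}; exit_code is None on timeout.
    `run` is injectable so the loop can be exercised without mpiexec installed.
    """
    results: Dict[str, Tuple[Optional[int], str]] = {}
    for host in hosts:
        cmd = ["mpiexec", "-validate", "-host", host]
        try:
            proc = run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[host] = (None, f"no response within {timeout} s")
            continue
        # Keep whatever the command printed on either stream.
        output = "\n".join(s.strip() for s in (proc.stdout, proc.stderr) if s and s.strip())
        results[host] = (proc.returncode, output)
    return results
```

Calling validate_hosts(["compn000", "compn001"]) and printing each entry gives the exit code and output per node side by side.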

Best,

Dunni

Michael_Intel
Moderator

This issue has been resolved, and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

