Intel® MPI Library

Unable to run mpi across two hosts

Kylynn
Beginner

Here is my issue.

I have two Windows machines with the same Windows username and password. On each of them I have done the following:

         a.  Installed the Intel MPI Library.

         b.  Ran setvars.bat.

         c.  Ran hydra_service -install and hydra_service -start.

         d.  Ran mpiexec -register.

Then I tried to run the test MPI program:

1. When I run mpiexec -validate, both hosts report success:

Kylynn_0-1649662078806.png

2. When I run the MPI test program on each host separately, both runs succeed:

Kylynn_1-1649662175401.png
Kylynn_2-1649662192999.png

(The logs differ, but the file and its path are the same on both hosts.)

3. The problem is that when I run MPI across the two hosts, it just hangs:

Kylynn_3-1649662303590.png

4. When I turn on debug output, it shows:

Kylynn_4-1649662356072.png

 

It seems that MPI initialization does not complete and the run just gets stuck there. What can I do about that?

help ~~~~

Kylynn
Beginner

When I compare the debug output with that of a successful run (on just one host), the hanging run appears to be stuck in MPI startup(), as shown here:

Kylynn_0-1649674001772.png

 

VarshaS_Intel
Moderator

Hi,

 

Thanks for posting in Intel Communities.

 

Could you please let us know if you are able to run the below command?

mpiexec -n 2 -ppn 1 -hosts host1,host2 hostname
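
Because the original cross-host run hangs rather than failing, it may help to launch this check under a timeout so that a hang is reported instead of blocking the terminal. The following is a minimal Python sketch and not part of Intel MPI: the helper names `build_mpiexec_cmd` and `run_with_timeout` and the 60-second limit are assumptions made for illustration, and `host1`/`host2` are the placeholders from the command above. It assumes `mpiexec` is on the PATH (for example after running setvars.bat).

```python
import subprocess


def build_mpiexec_cmd(hosts, program, ppn=1):
    """Build an mpiexec command line that starts `ppn` processes on each host."""
    total = len(hosts) * ppn
    return ["mpiexec", "-n", str(total), "-ppn", str(ppn),
            "-hosts", ",".join(hosts), *program]


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd; return (exit_code, stdout), or (None, "") if it timed out."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The launcher did not return in time, which is how a hang shows up.
        return None, ""
    return done.returncode, done.stdout


# Example (needs a working Intel MPI setup; host1 and host2 are placeholders):
#   code, out = run_with_timeout(
#       build_mpiexec_cmd(["host1", "host2"], ["hostname"]))
#   print("possible hang" if code is None else out)
```

On a timeout, only the local launcher process is killed; any processes it already started on the remote host may have to be cleaned up by hand.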

 

Could you please provide us with the cluster details you are using? And also, could you please confirm whether the firewall is disabled across all the nodes in the cluster?
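
One rough way to approach the firewall question is to test whether a TCP port on one host is reachable from the other. The sketch below uses only the Python standard library; the function name `can_connect`, the 3-second timeout, and the port number in the usage comment are assumptions for illustration, not values taken from Intel MPI. A successful connection to one port does not show that every port MPI needs is open, so treat it as a first check only.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example, run on host1 against host2. The port is a placeholder: use one
# that a service on the remote host is actually listening on.
#   print(can_connect("host2", 12345))
```

If this returns False for a port that is known to be listening on the remote machine, a firewall rule between the two hosts is a plausible cause worth checking.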

 

We recommend going through the article linked below. It has not been updated recently, so you can ignore the product versions it mentions.

 

https://www.intel.com/content/www/us/en/developer/articles/training/micro-cluster-setup-with-intel-mpi-a-stepwise-guide-to-setting-up-the-smallest-windows.html

 

Please make sure that you are following all the guidelines mentioned in the above link.

 

Thanks & Regards,

Varsha

 

Kylynn
Beginner

Thank you for your reply (^ ^)~

Kylynn_0-1649841261005.png

This command runs fine.

I had already read the article you mentioned, and I followed its instructions.

Here is my updated situation:

1. I have two computers with IPs 192.168.2.245 and 192.168.2.141. Both run Windows and have the same username and password.

2. I have installed the Intel MPI Library on both computers, started hydra_service, and run mpiexec -register.

3. Then I ran the test code that comes with the Intel MPI Library I downloaded. In that test code, rank 0 receives messages and the other processes send them; it is the usual hello world test.

4. It just hangs, as I posted before. Today I modified the test code so that rank 0 sends messages and the other processes receive them. With that change it runs and prints the correct hello world output, but the new problem is that it hangs in MPI_Finalize().

 

When I do the same on two other computers running Linux, it works fine, so the problem appears only on Windows.  ~~o(T.T)o~~

Could you please give me some tips? Thank you!

VarshaS_Intel
Moderator

Hi,

 

Thanks for providing the information.

 

Could you please provide us with the results of the "-check_mpi" flag by using the below command?

 

mpiicc -check_mpi <program name.cpp> -o <program name>
mpiexec -n 2 ./<program name>

Could you please confirm whether you have followed Step 7 (having a shared working directory) in this link?

 

Could you please let us know how you have connected the Windows systems? Also, could you please provide a sample reproducer so that we can investigate your issue further?

 

Thanks & Regards,

Varsha

 

VarshaS_Intel
Moderator

Hi,


We have not heard back from you. Could you please provide us with the details mentioned in the previous reply?


Thanks & Regards,

Varsha


Kylynn
Beginner

Dear Varsha ~

 

Recently I found two other Windows machines and repeated the same steps on them. The result is that it works! MPI can run across them. But I still don't understand why it does not work on my computers.

 

Anyway, Thank you for your help!   (^.^)

       

VarshaS_Intel
Moderator

Hi,


Thanks for the update.


If you need further assistance from us, could you please provide us with the details mentioned in the previous post? If not, can we go ahead and close this thread from our end?


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Varsha

