Software Archive
Read-only legacy content

Can't start MPI job from the host system

Alastair_M_
New Contributor I

Dear all,

I am configuring a new server with four Xeon Phi coprocessors (MICs) and am having trouble running MPI jobs.

I can run a job correctly from one of the Xeon Phi cards after copying the mpirun binary over to it:

mpirun -hosts mic0,mic1,mic2,mic3 -n 4 -ppn 1 ./a.out

This works, but when I try to start the job from the host after setting "export I_MPI_MIC=enable", I get the following error message:

[proxy:0:0@appliance2-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "appliance2-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@appliance2-mic0] main (../../pm/pmiserv/pmip.c:372): unable to connect to server 127.0.0.1 at port 45392 (check for firewalls!)

The firewall on the host is disabled, so I am not sure how to diagnose this.
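The proxy in the log above runs on appliance2-mic0 and tries to connect to 127.0.0.1, which on the coprocessor is its own loopback interface rather than the host. One thing that may be worth ruling out (an assumption, not something confirmed in this thread) is that the host's own hostname resolves to a loopback address, so that the launcher ends up advertising 127.0.0.1 to the remote proxies. A minimal sketch of that check, assuming a Linux host with getent; loopback_in_list and resolves_to_loopback are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: read `getent hosts` output on stdin and succeed if
# any listed address is a loopback address (127.0.0.0/8 or ::1).
loopback_in_list() {
    awk '$1 ~ /^127\./ || $1 == "::1" { found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: succeed if NAME resolves, through whatever NSS
# sources getent consults (e.g. /etc/hosts, DNS), to a loopback address.
resolves_to_loopback() {
    getent hosts "$1" | loopback_in_list
}

# Check this machine's node name (normally the same value `hostname` prints).
node=$(uname -n)
if resolves_to_loopback "$node"; then
    echo "$node resolves to a loopback address"
else
    echo "$node does not resolve to a loopback address"
fi
```

Keeping the parsing in its own function lets it be tried on sample lines such as "127.0.0.1 appliance2" without touching the resolver.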

Best regards,

Alastair

 

Sunny_G_Intel
Employee

Do you have /opt/intel mounted via NFS on the coprocessors? If not, you will need to make sure that pmi_proxy is on the PATH on each coprocessor, and that whichever MPI libraries your application is linked against can be found there as well.
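A minimal sketch of checking this from the host, assuming passwordless ssh to the cards (named mic0 to mic3 as in the original post) and a remote shell that supports "command -v". check_pmi_proxy is a hypothetical helper, and it only looks for pmi_proxy, not for the MPI libraries:

```shell
#!/bin/sh
# Hypothetical helper: report whether pmi_proxy is on the PATH of a
# non-interactive remote shell on HOST (e.g. mic0).
# BatchMode avoids password prompts; ConnectTimeout bounds the wait.
check_pmi_proxy() {
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" 'command -v pmi_proxy' \
        >/dev/null 2>&1; then
        echo "$1: pmi_proxy found"
    else
        echo "$1: pmi_proxy not found (or $1 unreachable)"
    fi
}
```

Usage: for card in mic0 mic1 mic2 mic3; do check_pmi_proxy "$card"; done. A non-interactive remote shell may not read the same startup files as an interactive login, so a "not found" result can still be a PATH setup issue rather than a missing file.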

Loc_N_Intel
Employee

A firewall is likely blocking communication between the host and the coprocessors. Could you double-check that the firewall is completely disabled? There is a thread in this forum reporting the same symptom (https://software.intel.com/en-us/forums/topic/392468), and that user resolved the issue by disabling the firewall.

For more information, please refer to https://software.intel.com/en-us/articles/firewalls-and-mpi
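Assuming the host uses iptables, one way to confirm that the firewall really is off is to list the active rules as root with "iptables -S" and look for anything other than ACCEPT default policies. A minimal sketch; has_filter_rules is a hypothetical helper, and a host using a different firewall front end would need a different listing command:

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -S` output on stdin and succeed if it
# contains anything besides "-P <CHAIN> ACCEPT" default-policy lines,
# i.e. a DROP/REJECT policy, a rule (-A), or a user-defined chain (-N).
has_filter_rules() {
    grep -v -E '^-P [A-Z]+ ACCEPT$' | grep -q .
}
```

Usage, as root: iptables -S | has_filter_rules && echo "host has active firewall rules". A fully open filter table lists only the "-P INPUT ACCEPT", "-P FORWARD ACCEPT" and "-P OUTPUT ACCEPT" lines, which the helper treats as no rules.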

Sunny_G_Intel
Employee

Hello Alastair,

I would like to follow up on the issue of starting MPI jobs from the host system. I hope you were able to resolve it.

Regards,

Sunny
