Can't start MPI job from the host system

Alastair_M_
New Contributor I

Dear all,

I am trying to configure a new server with 4 MICs in it and I am having a problem running MPI jobs.

I can run a job correctly when I launch it from one of the Xeon Phi cards, after copying the mpirun binary over to the card:

mpirun -hosts mic0,mic1,mic2,mic3 -n 4 -ppn 1 ./a.out

This works, but when I try to start the job from the host (with "export I_MPI_MIC=enable" set), I get the following error message:

[proxy:0:0@appliance2-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "appliance2-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@appliance2-mic0] main (../../pm/pmiserv/pmip.c:372): unable to connect to server 127.0.0.1 at port 45392 (check for firewalls!)

My host firewall is disabled, so I am not sure how to diagnose this.

Best regards,

Alastair

Sunny_G_Intel
Employee

Do you have /opt/intel mounted via NFS on the coprocessor? If not, you will need to make sure that pmi_proxy (along with whichever MPI libraries your application links against) is in the PATH on the coprocessor.
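One way to check this from the host is to ask each card, over ssh, whether pmi_proxy resolves on its PATH. The sketch below is illustrative only: the helper name remote_has_command is hypothetical, it assumes passwordless ssh from the host to the cards (named mic0..mic3 as in the original post), and it only inspects the PATH of a non-interactive ssh session, which approximates but may not exactly match the environment the MPI launcher uses.

```python
import shlex
import subprocess


def remote_has_command(host: str, command: str, run=subprocess.run) -> bool:
    """Return True if `command` resolves on `host`'s non-interactive PATH.

    Hypothetical helper: runs `ssh <host> command -v <command>` from the
    host machine. `command -v` is a POSIX shell builtin, so it should work
    in a minimal shell too. The `run` parameter exists only so the check
    can be exercised without a real coprocessor (pass a stub in tests).
    """
    result = run(
        ["ssh", host, "command -v " + shlex.quote(command)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # Exit status 0 means the remote shell found the command on its PATH.
    return result.returncode == 0


# Example usage on the host (card names assumed from the thread):
#   for card in ["mic0", "mic1", "mic2", "mic3"]:
#       print(card, remote_has_command(card, "pmi_proxy"))
```

A card that reports False here would need /opt/intel mounted, or pmi_proxy and the MPI libraries copied onto it, before a host-launched job can start its proxy there.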

Loc_N_Intel
Employee

A firewall is likely blocking communication between the host and the coprocessors. Could you double-check that the firewall is completely disabled? Another thread in the forum reports the same symptom (https://software.intel.com/en-us/forums/topic/392468), and that user resolved the issue by disabling the firewall.

For more information, please refer to https://software.intel.com/en-us/articles/firewalls-and-mpi
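To see whether a given host/port path is reachable at all, independent of MPI, a plain TCP connect reproduces the kind of check that fails in the error above. This is a minimal sketch, not part of Intel MPI: the function name probe_tcp is hypothetical, and running it on a coprocessor assumes a Python interpreter is available there, which the stock card image may not provide. Because the port in the error (45392) is picked per run, you would start a throwaway listener of your own on the target side and probe that port. Note also that 127.0.0.1 is the loopback address of whichever machine makes the call, so a probe run on mic0 against 127.0.0.1 tests the card itself, not the host.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection to (host, port) and classify the result.

    Returns:
      "open"    - the connection was accepted
      "refused" - the target answered but nothing was listening on that
                  port, or a firewall rule actively rejected the attempt
      "timeout" - no answer within `timeout`, often a sign of packets
                  being silently dropped
      "error: ..." - any other socket error (e.g. no route to host)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A "refused" against a port you know has a listener, or a "timeout" where you expected an answer, points toward filtering on the path; an "open" result suggests the problem lies elsewhere, such as the address the launcher tells the proxy to connect back to.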

Sunny_G_Intel
Employee

Hello Alastair,

I would like to follow up on the issue of starting MPI jobs from the host system. I hope you were able to resolve it.

Regards,

Sunny
