I am configuring a new server with four MIC (Intel Xeon Phi) coprocessors and am having trouble running MPI jobs.
Jobs run correctly when I launch them from a coprocessor itself, after copying the mpirun binary over to the card:
mpirun -hosts mic0,mic1,mic2,mic3 -n 4 -ppn 1 ./a.out
This works, but when I start the same job from the host after running "export I_MPI_MIC=enable", I get the following error:
[proxy:0:0@appliance2-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "appliance2-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@appliance2-mic0] main (../../pm/pmiserv/pmip.c:372): unable to connect to server 127.0.0.1 at port 45392 (check for firewalls!)
I have the host firewall disabled, so I am not sure how to diagnose this.
Do you have /opt/intel available via NFS on the coprocessor? If not, you will need to ensure that pmi_proxy (along with whichever MPI libraries you have linked) is available in the path on the coprocessor.
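Before launching from the host again, it may help to confirm what the coprocessor's environment can actually find. This is a minimal sketch that assumes a Python 3 interpreter is available where you run it, which may not be true on a stock coprocessor; there, the shell command `command -v pmi_proxy` answers the program half of the question. `missing_from_paths` is a hypothetical helper, not part of Intel MPI: it looks programs up on PATH the way a shell would (via `shutil.which`) and looks library file names up in the LD_LIBRARY_PATH directories. Which library names to pass depends on what your binary was linked against, so none are assumed here. The PATH seen by a non-interactive ssh session can differ from an interactive login, so the check is most telling when run in that same kind of session.

```python
import os
import shutil


def missing_from_paths(programs, libraries=(), path=None, ld_library_path=None):
    """Hypothetical helper: return the programs/libraries that cannot be found.

    Programs are resolved on `path` (defaults to the current PATH) with
    shutil.which. Libraries are matched as plain file names inside the
    `ld_library_path` directories (defaults to the current LD_LIBRARY_PATH).
    The dynamic loader also searches its cache and default directories, so a
    library reported missing here may still load; treat it as a hint only.
    """
    missing = [p for p in programs if shutil.which(p, path=path) is None]

    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    lib_dirs = [d for d in ld_library_path.split(os.pathsep) if d]

    for lib in libraries:
        if not any(os.path.isfile(os.path.join(d, lib)) for d in lib_dirs):
            missing.append(lib)
    return missing


if __name__ == "__main__":
    # Example: only pmi_proxy is checked; add your MPI library file names as needed.
    print(missing_from_paths(["pmi_proxy"]) or "everything found")
```

An empty list means every requested name resolved in the current environment.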
A firewall is likely blocking communication between the host and the coprocessors. Could you double-check that the firewall is completely disabled? A forum thread reports the same symptom (https://software.intel.com/en-us/forums/topic/392468), and that user resolved the issue by disabling the firewall.
For more information, please refer to https://software.intel.com/en-us/articles/firewalls-and-mpi
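To test whether anything accepts connections at the address and port from the proxy's error message, a small TCP probe can tell a refused connection apart from one that silently times out. This is a minimal sketch assuming a Python 3 interpreter on the machine where you run it; `probe_tcp` is a hypothetical helper, not an Intel MPI tool. "refused" means the target actively rejected the connection (nothing listening on that port, or a firewall rule that rejects), while "timeout" is more typical of a rule that silently drops packets. Keep in mind that 127.0.0.1 always refers to the machine running the probe, and the port in the log is likely chosen afresh for each launch, so a probe is only meaningful while mpirun is still waiting.

```python
import socket
import sys


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Hypothetical helper: attempt one TCP connection and classify the result.

    "open"    - the connection was accepted
    "refused" - actively rejected (no listener, or a rejecting firewall rule)
    "timeout" - no answer within `timeout` seconds (often a dropping rule)
    Other socket errors are returned as "error: <message>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 probe_tcp.py HOST PORT
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    print(f"{target_host}:{target_port} -> {probe_tcp(target_host, target_port)}")
```

Catching `ConnectionRefusedError` before the generic `OSError` keeps the two most informative outcomes separate from everything else (for example, an unresolvable host name).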