I am trying to launch an application on the coprocessor from the host machine, but when I execute the command
mpirun -n 2 -host host-name /tmp/test.mic
it hangs at the command line and does not show any output.
However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?
Hello,
Are there any messages printed before the hang?
Can you please confirm that the environment variable I_MPI_MIC=1 is set before issuing mpirun?
What is the output of
$ mpirun -V
Could you try
$ mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>
Could you also try
$ mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>
Thanks,
Leo.
Thanks for your reply.
mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 3 Build 20131205
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.
mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>
This works fine for localhost, but with
mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>
it just waits (it appears to hang) after I enter the command.
Hi Roshan,
I think Leo suggested that you run
% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>
rather than
% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>
You may need to compile your MIC binary with the command
% mpiicc -mmic <source code> -o <your_mic_binary>
For example:
% mpiicc -mmic test.c -o test.mic
In addition, you need to copy the MIC binary to your coprocessor (or make it available via an NFS mount):
% scp test.mic mic0:/tmp/.
as well as pmi_proxy and all the MPI libraries:
% scp /opt/intel/impi/<version>/mic/bin/pmi_proxy mic0:/bin
% scp /opt/intel/impi/<version>/mic/lib/* mic0:/lib64/.
Then enable the environment variable I_MPI_MIC:
% export I_MPI_MIC=1
Now you should be able to run it:
% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 /tmp/test.mic
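The host-side steps above can be collected into one function. This is a minimal sketch in POSIX sh, not an official Intel script: the names run, prepare_mic_run, IMPI_ROOT, MIC_HOST and DRY_RUN are hypothetical, IMPI_ROOT must point at your installation (e.g. /opt/intel/impi/<version>), and the source file is assumed to be test.c. With DRY_RUN=1 it only prints the commands, which is a cheap way to review them before touching the card.

```shell
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical sketch of the preparation steps: compile, copy, launch.
# Assumes IMPI_ROOT=/opt/intel/impi/<version>, card name mic0 (override
# with MIC_HOST) and a source file named test.c.
prepare_mic_run() {
    if [ -z "${IMPI_ROOT:-}" ]; then
        echo "set IMPI_ROOT, e.g. /opt/intel/impi/<version>" >&2
        return 1
    fi
    mic=${MIC_HOST:-mic0}

    # 1. Cross-compile the MPI program for the coprocessor.
    run mpiicc -mmic test.c -o test.mic || return 1
    # 2. Copy the binary, pmi_proxy and the MIC MPI libraries to the card.
    run scp test.mic "$mic:/tmp/." || return 1
    run scp "$IMPI_ROOT/mic/bin/pmi_proxy" "$mic:/bin" || return 1
    run scp "$IMPI_ROOT"/mic/lib/* "$mic:/lib64/." || return 1
    # 3. Enable MIC support in mpirun and launch on the card.
    export I_MPI_MIC=1
    run mpirun -n 2 -host "$mic" /tmp/test.mic
}
```

For example, `DRY_RUN=1 IMPI_ROOT=/opt/intel/impi/<version> prepare_mic_run` prints the five commands; drop DRY_RUN=1 to execute them. Each step returns on the first failure, so a failed scp is reported right away instead of surfacing later as a confusing mpirun hang.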
Perfect. I definitely recommend following Loc’s guidelines step by step as described above.
If you still see the silent hang after trying them, I’d suggest taking a step back and making sure that the environment is actually prepared to run MPI:
1. Can you confirm that it is possible to execute ‘hostname’ on mic0 via ssh? (A failure here would be equivalent to "scp" failing in the guidelines above.)
$ ssh mic0 hostname
2. If so, can you try using mpirun to execute only ‘hostname’ on mic0 (that is, without any user-compiled binary)?
$ setenv I_MPI_MIC 1
$ mpirun -n 2 -host mic0 hostname
Thank you,
Leo.
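Because the failure mode here is a silent hang, it can help to put a deadline on each check so that a hang becomes an explicit result. A minimal sketch, assuming GNU coreutils timeout (which exits with status 124 when the time limit expires); run_with_deadline is a hypothetical name. It is written for sh/bash, where the equivalent of the csh `setenv I_MPI_MIC 1` is `export I_MPI_MIC=1`.

```shell
# Hypothetical helper: run a command with a time limit so that a silent hang
# turns into an explicit failure. Relies on GNU coreutils `timeout`, which
# exits with status 124 when the limit expires.
run_with_deadline() {
    secs=$1
    shift
    timeout "$secs" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # Report the hang on stderr so it is not mixed with command output.
        echo "TIMEOUT after ${secs}s: $*" >&2
    fi
    return "$status"
}
```

Usage, mirroring the two checks above: `export I_MPI_MIC=1; run_with_deadline 30 ssh mic0 hostname && run_with_deadline 60 mpirun -n 2 -host mic0 hostname`. A status of 124 means the command was still waiting when the deadline passed, which separates a hang from an ordinary error exit.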
Hi,
I followed the instructions given by Loc, but with no success.
When I run
"ssh mic0 hostname"
I can see the hostname. Copying the binary with scp also works.
When I run
"mpirun -n 2 -host mic0 hostname"
it hangs or does not show any output.
Did I forget to set any variables here? I doubt it, because I can run the application directly on mic0.
Let me take a look at the /etc/hosts files on your host and coprocessor. Would you please post the output from the following commands:
[cpp]
% hostname
% cat /etc/hosts
% ssh mic0 hostname
% ssh mic0 cat /etc/hosts
[/cpp]
hostname
gauss
cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 gauss
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.31.1.1 gauss-mic0 mic0
172.31.1.254 hostmic0
ssh mic0 hostname
gauss-mic0
ssh mic0 cat /etc/hosts
127.0.0.1 gauss-mic0 mic0 localhost.localdomain localhost
::1 gauss-mic0 mic0 localhost.localdomain localhost
172.31.1.254 host
Hi Roshan,
Looking at /etc/hosts on your host system, there are two additional lines that make me wonder:
127.0.1.1 gauss
and
172.31.1.254 hostmic0
I am not sure how these lines got into your /etc/hosts.
And the /etc/hosts on the coprocessor appears to be missing this line:
172.31.1.1 gauss-mic0 mic0
My suggestion is to remove the above two lines from /etc/hosts on your host system (back the file up first). Also, try the following commands on your host system and see if there is any output:
[cpp]
mpirun -host mic0 hostname
mpirun -host 172.31.1.1 hostname
mpirun -host gauss-mic0 hostname
[/cpp]
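Loc's concern about the "127.0.1.1 gauss" entry can be checked mechanically. The sketch below (POSIX sh and awk; loopback_entries is a hypothetical name) prints every address in a hosts file that maps a given name into the loopback range, 127.0.0.0/8 or ::1. A loopback address always refers to the machine that uses it, so if the host's name were handed to mic0 as 127.0.1.1, a process on the card connecting to it would reach the card itself rather than the host.

```shell
# Hypothetical helper: print each loopback address (127.x.x.x or ::1) that a
# hosts file maps to the exact name $1. The file defaults to /etc/hosts.
loopback_entries() {
    awk -v n="$1" '
        { sub(/#.*/, "") }                  # drop comments; fields are re-split
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }
        }
    ' "${2:-/etc/hosts}"
}
```

On the host, `loopback_entries "$(hostname)"` checks the local file; for the card, run `ssh mic0 cat /etc/hosts > /tmp/mic0.hosts` and then `loopback_entries mic0 /tmp/mic0.hosts`. Only exact name matches are reported.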
For all three commands, I get "gauss" as output. I removed the two lines from the host machine's /etc/hosts file and added "172.31.1.1 gauss-mic0 mic0" on the coprocessor.
After doing this I am still not able to run it.
If you run
export I_MPI_MIC=enable; mpirun -host mic0 -n 1 hostname
It should respond with
gauss-mic0
mpirun -host mic0 -n 1 hostname
This doesn't give any output; it seems to hang, which is the same behaviour as my original problem.
It might be worth verifying whether your mpirun command is at least issuing the ssh command for the connection. Would you please add the "-v" (verbose) option to the mpirun command as shown below and post the output here?
% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname
Also, can you confirm that you can ssh to mic0 without a password?
Thank you,
Leo.
You need to set up passwordless SSH to the coprocessor.
Passwordless SSH is a prerequisite for MPI.
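Before retrying mpirun, it may be worth confirming that the login to the card is really non-interactive, since mpirun has to open the ssh connection without anyone typing a password. A minimal sketch using standard OpenSSH client options: BatchMode=yes makes ssh fail instead of prompting, and ConnectTimeout bounds the wait for an unreachable host. The function name passwordless_ssh_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if host $1 accepts a non-interactive
# SSH login. BatchMode=yes turns a password prompt into a failure, and
# ConnectTimeout limits how long ssh waits for the connection.
passwordless_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}
```

For example: `passwordless_ssh_ok mic0 && echo "mic0: ok" || echo "mic0: still asks for a password or is unreachable"`.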
Now I have set up passwordless SSH, but when I run the mpirun command I still get an error message:
"[proxy:0:0@gauss-mic0] HYDU_sock_connect (./utils/sock/sock.c:264): unable to connect from "gauss-mic0" to "127.0.1.1" (Connection refused)
[proxy:0:0@gauss-mic0] main (./pm/pmiserv/pmip.c:396): unable to connect to server 127.0.1.1 at port 42947 (check for firewalls!)
^CCtrl-C caught... cleaning up processes
[mpiexec@gauss] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@gauss] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@gauss] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@gauss] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@gauss] main (./ui/mpich/mpiexec.c:900): process manager error waiting for completion"
roshan c. wrote:
For all three commands, I get "gauss" as output. I removed the two lines from the host machine's /etc/hosts file and added "172.31.1.1 gauss-mic0 mic0" on the coprocessor.
After doing this I am still not able to run it.
127.0.1.1 is the IP address of gauss according to the original /etc/hosts on your host.
My guess is that removing these two lines may have caused this problem. Can you put them back into the host's /etc/hosts and try again?
Follow the advice in the output: "check for firewalls!"
It's likely that a firewall is preventing the connection from the coprocessor back to the host.
See also,
http://software.intel.com/en-us/articles/firewalls-and-mpi
http://software.intel.com/en-us/articles/using-intel-mpi-library-and-intel-xeon-phi-coprocessor-tips
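As a rough first pass on the host, the INPUT chain can be scanned for anything that would turn away the proxy's connection back to mpirun. This sketch reads the output of `iptables -S` (which lists the rules in iptables-save form, including default policies such as `-P INPUT ACCEPT`) on stdin and prints a non-ACCEPT INPUT policy plus any INPUT rule that jumps to DROP or REJECT. It is a heuristic under stated assumptions, not a firewall audit: it does not follow jumps into user-defined chains (as firewalld setups use), and input_blockers is a hypothetical name.

```shell
# Hypothetical helper: read `iptables -S` output on stdin and print INPUT
# settings that could block incoming connections: a default policy other
# than ACCEPT, or INPUT rules that jump to DROP or REJECT. Read-only.
input_blockers() {
    awk '
        $1 == "-P" && $2 == "INPUT" && $3 != "ACCEPT" { print; next }
        $1 == "-A" && $2 == "INPUT" && / -j (DROP|REJECT)/ { print }
    '
}
```

Usage: `sudo iptables -S | input_blockers`. Empty output does not prove the path is open; the articles linked above describe how to configure the firewall for MPI.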
Now I am getting a different error message when I run "mpirun -n 2 -host mic0 /tmp/test.mic":
sh: /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy: not found
This binary is present on both machines.
Hello,
Can you please confirm that I_MPI_MIC is set to either "1" or "enable"?
Would you send the output of:
% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |& grep "Launch arguments"
thanks,
Leo.
It gives this output:
"[mpiexec@gauss] Launch arguments: /usr/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/4.1.3.045" ; export PATH="/opt/intel/impi/4.1.3.045/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 127.0.1.1:52573 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/4.1.3.045/intel64/bin/ --i_mpi_base_arch 0 --rmk slurm --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 107981940 --proxy-id 0 "
When I set the above variable and ran the mpirun command, I got this output:
"/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")"

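The "line 2: syntax error" message suggests that the card's shell is reading /bin/pmi_proxy as a shell script. POSIX shells fall back to that when the kernel refuses to execute a file, for example an ELF binary built for a different architecture, so one hedged check is whether the pmi_proxy that ended up on the card is a MIC (k1om) build. A sketch, assuming a little-endian host with GNU od; elf_machine is a hypothetical name. It prints the ELF e_machine field: 62 is EM_X86_64 (host binaries) and 181 is EM_K1OM (Xeon Phi coprocessor binaries).

```shell
# Hypothetical helper: print the ELF e_machine field of file $1 as a decimal
# number (62 = x86-64, 181 = Intel K1OM). Returns 1 with no output if the
# file does not start with the ELF magic bytes. Assumes GNU od on a
# little-endian host, matching the little-endian ELF files involved.
elf_machine() {
    # od -c renders the 0x7f magic byte as the octal string 177.
    magic=$(od -An -c -N 4 "$1" | tr -d ' ')
    [ "$magic" = "177ELF" ] || return 1
    # e_machine is a 2-byte field at offset 18 of the ELF header.
    od -An -t u2 -j 18 -N 2 "$1" | tr -d ' '
}
```

Usage on the host: `elf_machine /opt/intel/impi/<version>/mic/bin/pmi_proxy`, or copy the card's file back first with `scp mic0:/bin/pmi_proxy /tmp/pmi_proxy.card` and inspect that. A value other than 181 for the copy on the card would suggest that the wrong file was installed there.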