
MPI application fails to run from host machine on coprocessor

roshan_c_
Beginner

I am trying to launch an MPI application on the coprocessor from the host machine, but when I execute the command

mpirun -n 2 -host host-name /tmp/test.mic

it hangs at the command line and does not show any output.

However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?

Leonardo_B_Intel
Employee

Hello,
Are there any messages printed before the hang?
Can you please confirm that the environment variable I_MPI_MIC=1 is set before issuing mpirun?
What is the output of

$ mpirun -V


Could you try

$ mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>


Could you also try

$ mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>


Thanks,
Leo.

 

roshan_c_
Beginner

Thanks for your reply.

mpirun -V

Intel(R) MPI Library for Linux* OS, Version 4.1 Update 3 Build 20131205
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.

mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>

This works fine for localhost, but

mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>

appears to wait or hang after I enter the command.

Loc_N_Intel
Employee

Hi Roshan,

I think Leo was suggesting that you run

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>

but not

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>

You may need to compile your MIC binary with the command

% mpiicc -mmic <source code> -o <your_mic_binary>

For example:

% mpiicc -mmic test.c -o test.mic

You also need to transfer the MIC binary to your coprocessor (or make it available through an NFS mount):

% scp test.mic mic0:/tmp/.

as well as pmi_proxy and all the MPI libraries:

% scp /opt/intel/impi/<version>/mic/bin/pmi_proxy mic0:/bin

% scp /opt/intel/impi/<version>/mic/lib/* mic0:/lib64/.

Then enable the environment variable I_MPI_MIC:

% export I_MPI_MIC=1

Now you should be able to run it:

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 /tmp/test.mic

 

 

Leonardo_B_Intel
Employee

Perfect. I definitely recommend following Loc’s guidelines step by step as described above.

If you still see the silent hang after trying these, I’d suggest taking a step back and making sure that the environment is actually prepared to run MPI:


1. Would you confirm that it is possible to execute ‘hostname’ on mic0 via ssh? (A failure here would be equivalent to "scp" failing in the above guidelines.)

$ ssh mic0 hostname


2. If that works, can you please try to use mpirun to execute only ‘hostname’ on mic0 (that is, without any user-compiled binary)?

$ setenv I_MPI_MIC 1
$ mpirun -n 2 -host mic0 hostname

Thank you,
Leo.

roshan_c_
Beginner

Hi,

I followed the instructions given by Loc, but with no success.

When I run

"ssh mic0 hostname"

I can see the hostname. Copying the binary with scp also works.

When I run

"mpirun -n 2 -host mic0 hostname"

it hangs and does not show any output.

Did I miss setting any variables here? I doubt it, because I can run the application directly on mic0.

Loc_N_Intel
Employee

Let me take a look at the /etc/hosts files on your host and coprocessor. Would you please post the output of the following commands:

[cpp]

% hostname

% cat /etc/hosts

% ssh mic0 hostname

% ssh mic0 cat /etc/hosts

[/cpp]

 

roshan_c_
Beginner

hostname

gauss

cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       gauss

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.31.1.1      gauss-mic0 mic0
172.31.1.254    hostmic0

 ssh mic0 hostname
gauss-mic0

 

ssh mic0 cat /etc/hosts
127.0.0.1       gauss-mic0 mic0 localhost.localdomain localhost
::1             gauss-mic0 mic0 localhost.localdomain localhost

172.31.1.254    host

Loc_N_Intel
Employee

Hi Roshan,

Looking at /etc/hosts on your host system, there are two additional lines that make me wonder:

127.0.1.1  gauss

and

172.31.1.254 hostmic0

I am not sure how these lines ended up in your /etc/hosts.

Also, /etc/hosts on the coprocessor appears to be missing one line:

172.31.1.1   gauss-mic0 mic0

My suggestion is to remove the two host-side lines above (127.0.1.1 and 172.31.1.254) from /etc/hosts on your host system (save a backup copy first). Also, try the following commands on your host system and see whether there is any output:

[cpp]

mpirun -host mic0 hostname

mpirun -host 172.31.1.1 hostname

mpirun -host gauss-mic0 hostname

[/cpp]

roshan_c_
Beginner

For all three commands, I get "gauss" as output. I removed the 2 lines from the host machine's /etc/hosts file and added "172.31.1.1   gauss-mic0 mic0" on the coprocessor.

After doing this, I am still not able to run it.

Gregg_S_Intel
Employee

If you run

export I_MPI_MIC=enable; mpirun -host mic0 -n 1 hostname

 

It should respond with

gauss-mic0

 

roshan_c_
Beginner

mpirun -host mic0 -n 1 hostname

This gives no output and appears to hang, which is the same behaviour as in my original problem.

Leonardo_B_Intel
Employee

It might be worth verifying that your mpirun command is at least issuing the ssh command for the connection. Would you please add the "-v" verbose option to the mpirun command as shown below and post the output here?

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname

Also, can you confirm that you can ssh to mic0 without a password?

 

Thank you,

Leo.

 

 

 

 

roshan_c_
Beginner

I am attaching a file containing the output of the above command.

It does not terminate by itself; I had to interrupt it with Ctrl+Z.

Also, ssh does not work without a password.

 

 

Gregg_S_Intel
Employee

You need to set up passwordless SSH to the coprocessor.

Passwordless SSH is a prerequisite for MPI.
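
A quick way to confirm this from the host is to attempt a non-interactive login. The sketch below assumes an OpenSSH client; the function name and the SSH_BIN variable are illustrative (not part of Intel MPI or MPSS), and the override exists only so the check can be tried without a coprocessor attached.

```shell
# Succeeds only if ssh can log in to the given host without prompting.
# BatchMode=yes makes OpenSSH fail instead of asking for a password.
# SSH_BIN is a hypothetical override; it defaults to the real ssh client.
can_ssh_without_password() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, running "can_ssh_without_password mic0 && echo ok" should print ok once key-based login works. If it fails, mpirun is likely to stall on the same connection.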

 

roshan_c_
Beginner

Now I have set up passwordless ssh, but when I run the mpirun command I still get an error message:

"[proxy:0:0@gauss-mic0] HYDU_sock_connect (./utils/sock/sock.c:264): unable to connect from "gauss-mic0" to "127.0.1.1" (Connection refused)
[proxy:0:0@gauss-mic0] main (./pm/pmiserv/pmip.c:396): unable to connect to server 127.0.1.1 at port 42947 (check for firewalls!)
^CCtrl-C caught... cleaning up processes
[mpiexec@gauss] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@gauss] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@gauss] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@gauss] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@gauss] main (./ui/mpich/mpiexec.c:900): process manager error waiting for completion"

 

Loc_N_Intel
Employee

roshan c. wrote:

For all three commands, I get "gauss" as output. I removed the 2 lines from the host machine's /etc/hosts file and added "172.31.1.1   gauss-mic0 mic0" on the coprocessor.

After doing this, I am still not able to run it.

127.0.1.1 is the IP address of gauss according to your original /etc/hosts on the host.

I am guessing that removing these two lines may be causing this problem. Can you try putting them back into the host /etc/hosts and try again?
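
For reference, the error above shows the proxy on gauss-mic0 being told to connect to 127.0.1.1. Addresses in 127.0.0.0/8 are loopback addresses, so from the coprocessor that address refers to the coprocessor itself, not to gauss. A minimal sketch for checking which address a hosts-format file assigns to a name follows. The function names are illustrative, and it reads only the file you give it rather than querying the full resolver.

```shell
# Print the first address that a hosts-format file assigns to a name.
# $1 = path to the hosts file, $2 = host name to look up
hosts_lookup() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # ignore trailing comments
                if ($i == name) { print $1; exit }
            }
        }' "$1"
}

# Succeed if the address is an IPv4 (127.x.x.x) or IPv6 (::1) loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}
```

On the host, something like is_loopback "$(hosts_lookup /etc/hosts "$(hostname)")" would indicate whether the host's own name maps to a loopback address in that file.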

Gregg_S_Intel
Employee

Follow the advice in the output, "check for firewalls!"

It's likely a firewall is preventing the connection from the coprocessor to the host.

See also,

http://software.intel.com/en-us/articles/firewalls-and-mpi

http://software.intel.com/en-us/articles/using-intel-mpi-library-and-intel-xeon-phi-coprocessor-tips

 

roshan_c_
Beginner

Now I am getting a different error message when I run "mpirun -n 2 -host mic0 /tmp/test.mic":

sh: /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy: not found

This binary is present on both machines.
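
The path in this message points at the intel64 (host) build of pmi_proxy, whereas the earlier instructions copy the build under mic/bin to the coprocessor. Since a host binary will not run on the coprocessor, it may help to check which architecture a given pmi_proxy file was built for. The sketch below reads the e_machine field of the ELF header. The function names are illustrative, and the two machine numbers are the EM_X86_64 and EM_K1OM values as found in binutils' ELF headers, so treat them as an assumption to verify on your system.

```shell
# Print the e_machine field of an ELF file as a decimal number.
# Assumes a little-endian ELF header, where e_machine occupies bytes 18-19.
elf_machine() {
    od -An -tu1 -j18 -N2 "$1" | awk '{ print $1 + 256 * $2 }'
}

# Name the two architectures relevant here.
# 62 and 181 are assumed to be EM_X86_64 and EM_K1OM respectively.
elf_arch_name() {
    case "$(elf_machine "$1")" in
        62)  echo "x86_64 (host)" ;;
        181) echo "k1om (coprocessor)" ;;
        *)   echo "other" ;;
    esac
}
```

Running elf_arch_name on the copy of pmi_proxy that ends up on mic0 (or simply running the file utility on it) should show whether it is the coprocessor build.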

Leonardo_B_Intel
Employee

Hello,

Can you please confirm that I_MPI_MIC is set to either "1" or "enable"?

Would you send the output of:

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |&  grep "Launch arguments"
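
The line selected by this grep should include a --control-port host:port argument, which appears to be the address pmi_proxy is told to connect back to (compare the "unable to connect to server 127.0.1.1" error earlier in the thread). A small sketch for pulling that address out of the verbose output follows. The function name is illustrative, and it handles only IPv4 addresses or host names.

```shell
# Read mpirun -v output on stdin and print the host part of the first
# --control-port host:port argument found.
control_port_host() {
    sed -n 's/.*--control-port \([^: ]*\):[0-9][0-9]*.*/\1/p' | head -n 1
}
```

It can be fed from the same command, for example: mpirun -v -host mic0 -n 1 hostname 2>&1 | control_port_host. If the printed address is a loopback address such as 127.0.1.1, the coprocessor cannot reach the host through it.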

 

thanks,

Leo.

 

roshan_c_
Beginner

It gives this output:

"[mpiexec@gauss] Launch arguments: /usr/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/4.1.3.045" ; export PATH="/opt/intel/impi/4.1.3.045/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 127.0.1.1:52573 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/4.1.3.045/intel64/bin/ --i_mpi_base_arch 0 --rmk slurm --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 107981940 --proxy-id 0 "

 

When I enabled the above variable and ran the mpirun command, I got this output:

"/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")"
