Software Archive
Read-only legacy content

MPI application fails to run from host machine on coprocessor

roshan_c_
Beginner

I am trying to launch an application on the coprocessor from the host machine, but when I execute the command

mpirun -n 2 -host host-name /tmp/test.mic

it hangs at the command line and does not show any output.

However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?

Leonardo_B_Intel
Employee

Hello,
Are there any messages printed before the hang?
Can you please confirm that the environment variable I_MPI_MIC=1 is set before issuing mpirun?
What is the output of

$ mpirun -V


Could you try

$ mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>


Could you try

$ mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>
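Because the symptom in this thread is a silent hang rather than an error, it can help to bound each diagnostic run with a time limit so that a hang shows up as a distinct exit status. This is a sketch that assumes GNU coreutils `timeout` is available on the host; the function name `run_with_limit` and the 60-second limit in the example are illustrative choices, not part of Intel MPI.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report a hang.
# Relies on GNU coreutils `timeout`, which exits with status 124 on expiry.
run_with_limit() {
  secs=$1
  shift
  timeout "$secs" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "no result after ${secs}s, treating as a hang: $*" >&2
  fi
  return "$rc"
}

# Example (assumes I_MPI_MIC is enabled and the MIC binary is in /tmp on mic0):
#   run_with_limit 60 mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 /tmp/test.mic
```

An exit status of 124 then means "hung", while any other non-zero status is a real launch error worth reading.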


Thanks,
Leo.

roshan_c_
Beginner

Thanks for your reply.

mpirun -V

Intel(R) MPI Library for Linux* OS, Version 4.1 Update 3 Build 20131205
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.

mpirun -n 2 -host localhost -env I_MPI_DEBUG=3 <your_host_binary>

This works fine for localhost, but

mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>

After entering this command, it seems to wait or hang.

Loc_N_Intel
Employee

Hi Roshan,

I think Leo meant for you to run

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_mic_binary>

but not

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 <your_host_binary>

You need to compile your MIC binary with the -mmic option:

% mpiicc -mmic <source code> -o <your_mic_binary>

For example:

% mpiicc -mmic test.c -o test.mic

Next, transfer the MIC binary to your coprocessor (or use an NFS mount):

% scp test.mic mic0:/tmp/.

You also need to copy pmi_proxy and all the MPI libraries:

% scp /opt/intel/impi/<version>/mic/bin/pmi_proxy mic0:/bin

% scp /opt/intel/impi/<version>/mic/lib/* mic0:/lib64/.
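The build-and-copy steps above can be collected into one script. This is a minimal sketch: the version argument must be filled in for your installation (the `/opt/intel/impi/<version>/mic/...` layout is taken from the commands above), the target `mic0` and source `test.c` are assumptions, and the hypothetical `DRY_RUN=1` switch only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the build-and-copy steps for a MIC MPI binary.
# Assumptions: Intel MPI under /opt/intel/impi/<version>, target "mic0",
# and write access to /bin and /lib64 on the coprocessor.
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

deploy_to_mic() {
  version=$1            # directory name under /opt/intel/impi
  target=${2:-mic0}
  src=${3:-test.c}
  bin=${src%.c}.mic     # test.c -> test.mic
  impi=/opt/intel/impi/$version

  run mpiicc -mmic "$src" -o "$bin" || return 1                # cross-compile for the coprocessor
  run scp "$bin" "$target:/tmp/." || return 1                  # the application binary
  run scp "$impi/mic/bin/pmi_proxy" "$target:/bin" || return 1 # MIC build of pmi_proxy
  run scp "$impi"/mic/lib/* "$target:/lib64/."                 # MIC MPI libraries
}
```

For example, `DRY_RUN=1 deploy_to_mic 4.1.3.045` (the version string that appears later in this thread) prints the four commands so they can be checked before anything is copied.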

Then enable the environment variable I_MPI_MIC:

% export I_MPI_MIC=1

Now you should be able to run it:

% mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 /tmp/test.mic
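Since forgetting I_MPI_MIC is an easy mistake, a small guard can be run before mpirun. This is a sketch: the function name `mic_enabled` is hypothetical, and the accepted values (`1`, `enable`, `yes`, `on`) are an assumption based on the values used in this thread and the usual Intel MPI boolean spellings; check the reference manual for your version.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if I_MPI_MIC holds an "enabled" value.
# The accepted spellings below are assumptions; consult your Intel MPI docs.
mic_enabled() {
  case "${I_MPI_MIC:-}" in
    1|enable|yes|on) return 0 ;;
    *)
      echo "I_MPI_MIC is not enabled (value: '${I_MPI_MIC:-}'); try: export I_MPI_MIC=1" >&2
      return 1 ;;
  esac
}

# Usage:
#   mic_enabled && mpirun -n 2 -host mic0 -env I_MPI_DEBUG=3 /tmp/test.mic
```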

Leonardo_B_Intel
Employee

Perfect. I definitely recommend following Loc’s guidelines step by step, as described above.

If you still see the silent hang after trying these, I’d suggest taking a step back and making sure that the environment is actually prepared to run MPI:


1. Can you confirm that it is possible to execute ‘hostname’ on mic0 via ssh? (A failure here would be equivalent to "scp" failing in the guidelines above.)

$ ssh mic0 hostname


2. If that works, can you please try using mpirun to execute only ‘hostname’ on mic0, that is, without any user-compiled binary?

$ setenv I_MPI_MIC 1
$ mpirun -n 2 -host mic0 hostname
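The two checks above can be run as one staged script. This sketch adds one assumption beyond the steps above: it passes OpenSSH's `-o BatchMode=yes`, which makes ssh fail immediately instead of waiting for a password. An mpirun that silently waits on a password prompt looks exactly like a hang, so this separates the two cases. The function name, the 60-second limit, and the `SSH` and `MPIRUN` override variables (there only so the commands can be swapped for testing) are all hypothetical.

```shell
#!/bin/sh
# Staged launch check (sketch). Stage 1: non-interactive ssh; stage 2: mpirun.
# SSH and MPIRUN are overridable for testing; the 60 s limit is an assumption.
check_mic_launch() {
  host=${1:-mic0}
  ssh_cmd=${SSH:-ssh}
  mpirun_cmd=${MPIRUN:-mpirun}

  # Stage 1: BatchMode=yes makes ssh fail rather than prompt for a password.
  if ! $ssh_cmd -o BatchMode=yes "$host" hostname; then
    echo "stage 1 failed: passwordless ssh to $host is not working" >&2
    return 1
  fi

  # Stage 2: launch only 'hostname' through mpirun, with a time limit.
  if ! I_MPI_MIC=1 timeout 60 $mpirun_cmd -n 1 -host "$host" hostname; then
    echo "stage 2 failed: mpirun could not run hostname on $host" >&2
    return 2
  fi
}
```

Run as `check_mic_launch mic0`; a return value of 1 points at SSH setup, and 2 points at the MPI launch itself (proxy, hosts, or firewall).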

Thank you,
Leo.

roshan_c_
Beginner

Hi,

I followed the instructions given by Loc, but with no success.

When I run

"ssh mic0 hostname"

I can see the hostname. Copying the binary with scp also works.

When I run

"mpirun -n 2 -host mic0 hostname"

it hangs without showing any output.

Did I miss setting some variable here? I doubt it, because I can run the application directly on mic0.

Loc_N_Intel
Employee

Let me take a look at your /etc/hosts files on the host and the coprocessor. Would you please post the output of the following commands:

[cpp]

% hostname

% cat /etc/hosts

% ssh mic0 hostname

% ssh mic0 cat /etc/hosts

[/cpp]

roshan_c_
Beginner

hostname

gauss

cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       gauss

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.31.1.1      gauss-mic0 mic0
172.31.1.254    hostmic0

ssh mic0 hostname
gauss-mic0

ssh mic0 cat /etc/hosts
127.0.0.1       gauss-mic0 mic0 localhost.localdomain localhost
::1             gauss-mic0 mic0 localhost.localdomain localhost

172.31.1.254    host

Loc_N_Intel
Employee

Hi Roshan,

Looking at the /etc/hosts on your host system, there are two additional lines that make me wonder:

127.0.1.1  gauss

and

172.31.1.254 hostmic0

I am not sure how these lines ended up in your /etc/hosts.

Also, the /etc/hosts on the coprocessor seems to be missing this line:

172.31.1.1   gauss-mic0 mic0

My suggestion is to remove the above two lines from /etc/hosts on your host system (back up the file first). Also, try the following commands on your host system and see whether there is any output:

[cpp]

mpirun -host mic0 hostname

mpirun -host 172.31.1.1 hostname

mpirun -host gauss-mic0 hostname

[/cpp]
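One way to check the concern about these entries is to look up which address each name maps to in a hosts file and flag loopback addresses. The sketch below assumes the usual /etc/hosts layout (address first, then names, `#` comments); the function names are hypothetical. Any 127.x.x.x address, such as the `127.0.1.1` assigned to `gauss` here, is a loopback address, so another machine that is handed it would end up connecting to itself rather than to this host.

```shell
#!/bin/sh
# Print the address that a hosts file assigns to NAME (first match wins).
# Usage: hosts_lookup NAME [HOSTS_FILE]
hosts_lookup() {
  awk -v n="$1" '
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break            # ignore trailing comments
        if ($i == n) { print $1; exit }
      }
    }' "${2:-/etc/hosts}"
}

# Warn when NAME resolves to a 127.x.x.x loopback address in the hosts file.
check_not_loopback() {
  ip=$(hosts_lookup "$1" "${2:-/etc/hosts}")
  case "$ip" in
    "")    echo "$1: not listed" >&2; return 2 ;;
    127.*) echo "$1 -> $ip (loopback; unreachable from other machines)" >&2; return 1 ;;
    *)     echo "$1 -> $ip" ;;
  esac
}
```

On the host, `check_not_loopback gauss` would flag the `127.0.1.1` entry; the coprocessor's file can be checked the same way after saving it locally with `ssh mic0 cat /etc/hosts > mic_hosts`.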

roshan_c_
Beginner

For all three commands, I get "gauss" as output. I removed the 2 lines from the /etc/hosts file on the host machine and added "172.31.1.1   gauss-mic0 mic0" on the coprocessor.

After doing this, I am still not able to run it.

Gregg_S_Intel
Employee

If you run

export I_MPI_MIC=enable; mpirun -host mic0 -n 1 hostname

It should respond with

gauss-mic0

roshan_c_
Beginner

mpirun -host mic0 -n 1 hostname

This doesn't give any output; it seems to hang, which is the same behaviour as my original problem.

Leonardo_B_Intel
Employee

It might be worth verifying whether your mpirun command is at least issuing the ssh command for the connection. Would you please add the "-v" (verbose) option to the mpirun command as shown below and post the output here?

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname

Also, can you confirm that you can ssh to mic0 without a password?

Thank you,

Leo.

roshan_c_
Beginner

I am attaching a file containing the output of the above command.

It does not terminate by itself; I had to kill it with Ctrl+Z.

Also, ssh does not work without a password.

Gregg_S_Intel
Employee

You need to set up passwordless SSH to the coprocessor.

Passwordless SSH is a prerequisite for MPI.
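A generic way to set this up with standard OpenSSH tools is sketched below. Assumptions: the coprocessor is reachable as `mic0`, an RSA key at `~/.ssh/id_rsa` is acceptable, and `ssh-copy-id` is installed on the host. The coprocessor's file system may be rebuilt when the card restarts, in which case the key would have to be installed again; check your MPSS documentation for the supported way to persist it. The hypothetical `DRY_RUN=1` switch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: key-based (passwordless) SSH from the host to a coprocessor.
# Assumptions: target "mic0", RSA key at ~/.ssh/id_rsa, ssh-copy-id installed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_passwordless_ssh() {
  target=${1:-mic0}
  key=${2:-$HOME/.ssh/id_rsa}

  # 1. Create a key pair with an empty passphrase, unless one already exists.
  if [ ! -f "$key" ]; then
    run ssh-keygen -t rsa -N "" -f "$key" || return 1
  fi

  # 2. Install the public key on the target (prompts for the password once).
  run ssh-copy-id -i "$key.pub" "$target" || return 1

  # 3. Verify: BatchMode=yes fails instead of prompting if the key is not accepted.
  run ssh -o BatchMode=yes "$target" hostname
}
```

If step 3 prints the coprocessor's hostname without a prompt, the earlier `mpirun -host mic0 hostname` test is worth repeating.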

roshan_c_
Beginner

Now I have set up passwordless ssh, but when I run the mpirun command I still get an error message:

"[proxy:0:0@gauss-mic0] HYDU_sock_connect (./utils/sock/sock.c:264): unable to connect from "gauss-mic0" to "127.0.1.1" (Connection refused)
[proxy:0:0@gauss-mic0] main (./pm/pmiserv/pmip.c:396): unable to connect to server 127.0.1.1 at port 42947 (check for firewalls!)
^CCtrl-C caught... cleaning up processes
[mpiexec@gauss] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@gauss] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@gauss] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@gauss] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@gauss] main (./ui/mpich/mpiexec.c:900): process manager error waiting for completion"

Loc_N_Intel
Employee

roshan c. wrote:

For all three commands, I get "gauss" as output. I removed the 2 lines from the /etc/hosts file on the host machine and added "172.31.1.1   gauss-mic0 mic0" on the coprocessor.

After doing this, I am still not able to run it.

127.0.1.1 is the IP address of gauss according to your original /etc/hosts on the host.

I suspect that removing these two lines may have caused this problem. Can you put them back in the host's /etc/hosts and try again?

Gregg_S_Intel
Employee

Follow the advice in the output, "check for firewalls!"

It's likely a firewall is preventing the connection from the coprocessor to the host.

See also,

http://software.intel.com/en-us/articles/firewalls-and-mpi

http://software.intel.com/en-us/articles/using-intel-mpi-library-and-intel-xeon-phi-coprocessor-tips
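If the host runs an iptables-based firewall, one way to test this hypothesis is to list the rules and, if needed, accept traffic arriving on the coprocessor's virtual network interface. This is a sketch only: it assumes iptables is the active firewall, that the host-side interface is named `mic0` (the usual name for the first card's interface), and that it runs as root. Adding an ACCEPT rule changes your security posture, so follow your site policy and the articles above. The hypothetical `DRY_RUN=1` switch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: inspect and (optionally) open the host firewall for a coprocessor.
# Assumptions: iptables is the active firewall, the host-side virtual
# interface is "mic0", and the commands run with root privileges.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

allow_mic_interface() {
  iface=${1:-mic0}
  # Show the current INPUT rules with numeric addresses and packet counters.
  run iptables -L INPUT -n -v || return 1
  # Insert an ACCEPT rule for everything arriving on the coprocessor interface.
  run iptables -I INPUT -i "$iface" -j ACCEPT
}
```

If mpirun succeeds once the rule is in place, the firewall was the cause; the rule can be removed again with `iptables -D INPUT -i mic0 -j ACCEPT` while a narrower one is worked out.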

roshan_c_
Beginner

Now I am getting a different error message.

When I run "mpirun -n 2 -host mic0 /tmp/test.mic", I get:

sh: /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy: not found

This binary is present on both the machines. 

Leonardo_B_Intel
Employee

Hello,

Can you please confirm that I_MPI_MIC is set to either "1" or "enable"?

Would you send the output of:

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |&  grep "Launch arguments"

thanks,

Leo.

roshan_c_
Beginner

It gives this output:

"[mpiexec@gauss] Launch arguments: /usr/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/4.1.3.045" ; export PATH="/opt/intel/impi/4.1.3.045/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 127.0.1.1:52573 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/4.1.3.045/intel64/bin/ --i_mpi_base_arch 0 --rmk slurm --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 107981940 --proxy-id 0 "

When I enabled the above variable and ran the mpirun command, I got this output:

"/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")"
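One thing worth ruling out here is that /bin/pmi_proxy on the card is not the same file as the MIC build in the host installation (for example, a copy from the intel64 directory or an incomplete transfer). The sketch below compares the two byte for byte by streaming the card's copy back over ssh. The function name is hypothetical, the paths follow the layout used earlier in this thread, and `SSH` is overridable only for testing.

```shell
#!/bin/sh
# Sketch: check that the coprocessor's pmi_proxy is byte-identical to the
# MIC build on the host. Assumptions: paths as used earlier in this thread.
# Usage: same_on_mic LOCAL_FILE REMOTE_FILE [HOST]
same_on_mic() {
  local_file=$1
  remote_file=$2
  host=${3:-mic0}
  # Stream the remote file back over ssh and compare it with the local copy.
  if ${SSH:-ssh} "$host" cat "$remote_file" | cmp -s - "$local_file"; then
    echo "identical: $host:$remote_file"
  else
    echo "DIFFERENT (or unreadable): $host:$remote_file vs $local_file" >&2
    return 1
  fi
}
```

For this installation that would be `same_on_mic /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy /bin/pmi_proxy mic0`; a mismatch suggests copying the MIC pmi_proxy to the card again.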
