
MPI application fails to run from host machine on coprocessor

roshan_c_
Beginner

I am trying to run an MPI application on the coprocessor, launching it from the host machine, but when I execute the command

mpirun -n 2 -host host-name /tmp/test.mic

it hangs at the command line and shows no output.

However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?

32 Replies
Leonardo_B_Intel
Employee

Thank you for the follow up.

The message "/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")" might indicate that the copy of pmi_proxy on the MIC card came from the "intel64" directory and not from the "mic" binary directory.

It might be worth redoing these copies:

% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin

% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/.

 

and then re-run your test.
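One way to check which build of pmi_proxy a given file is: read the machine field of its ELF header. The sketch below assumes a POSIX shell with od available; the helper name elf_machine is made up for illustration. The values 62 (EM_X86_64) and 181 (EM_K1OM, the coprocessor architecture) are the ELF machine constants from elf.h. An x86_64 binary sitting in /bin on the card would be consistent with the "syntax error" above, since the card's shell may fall back to reading the file as a script when it cannot execute it.

```shell
# Hypothetical helper: print the ELF e_machine value (decimal) of a file.
# 62  = EM_X86_64 (host build)
# 181 = EM_K1OM   (coprocessor build)
elf_machine() {
    # e_machine is a 2-byte little-endian field at offset 18 of the ELF header.
    bytes=$(od -An -tu1 -j18 -N2 "$1")
    # Split the two decimal byte values into $1 (low byte) and $2 (high byte).
    set -- $bytes
    echo $(( $1 + 256 * $2 ))
}

# Example calls (paths taken from this thread; adjust to your install):
#   elf_machine /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy      # should print 181
#   elf_machine /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy  # should print 62
```

Running it on the file you are about to scp to mic0:/bin tells you whether you picked the "mic" or the "intel64" copy.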

Best,

Leo.

Gregg_S_Intel
Employee

Is Intel MPI visible on the coprocessor?

That is, is a directory such as /opt/intel/impi/4.1.3.045 mounted on the coprocessor?

 

roshan_c_
Beginner

Thanks a lot, guys. Now I can run the app from the host on the coprocessor.

However, there is one problem: when I try to run on the host and the coprocessor in one command, I get an error message:

mpirun -n 3 -host gauss ./test.host : -iface mic0 -host mic0 -n 2 /tmp/test.mic

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

Leonardo_B_Intel
Employee

This is good progress: great!

I believe the bad termination might also be related to the application you're attempting to run. Have you already tried running another MPI example to observe the behavior?
For example:

% cp /opt/intel/impi/4.1.3.045/test/test.c .
% mpiicc -mmic test.c -o test_hello.mic
% mpiicc test.c -o test_hello
% scp test_hello.mic mic0:/tmp
% mpirun -n 2 -host localhost ./test_hello : -n 2 -iface mic0 -host mic0 /tmp/test_hello.mic
Hello world: rank 0 of 4 running on Some-Host-Name
Hello world: rank 1 of 4 running on Some-Host-Name
Hello world: rank 2 of 4 running on Some-Host-Name-mic0
Hello world: rank 3 of 4 running on Some-Host-Name-mic0


Best,
Leo.

roshan_c_
Beginner

I tried with another application as well, but without success.

Here is the sample program I am trying to run:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int rank, size;

  MPI_Init(&argc, &argv);                /* starts MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get current process id */
  MPI_Comm_size(MPI_COMM_WORLD, &size);  /* get number of processes */
  printf("Hello world from process %d of %d\n", rank, size);
  MPI_Finalize();                        /* shuts MPI down */
  return 0;
}

 

Gregg_S_Intel
Employee

Check your MPI setup.  That looks like an MPICH2 error message.

Similar topic:  http://software.intel.com/en-us/forums/topic/405183

roshan_c_
Beginner

But when I run it on each machine individually, it works fine. The problem appears only when I run it simultaneously on both machines.

Gregg_S_Intel
Employee

Yes, that could happen.  Try some commands like "which mpirun" and "which mpiexec" to check whether perhaps you're picking up something from some other MPI.
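Since "which" reports only the first match, a second MPI installation further down PATH can go unnoticed. A minimal sketch for listing every match in search order, assuming a POSIX shell; the function name find_all_on_path is invented for this example:

```shell
# Hypothetical helper: list every executable called $1 along PATH, in search order.
find_all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -x "${dir:-.}/$name" ] && echo "${dir:-.}/$name"
    done
    IFS=$old_ifs
}

# Example calls:
#   find_all_on_path mpirun
#   find_all_on_path mpiexec
```

More than one line of output means several launchers are installed, and the first line is the one the shell actually runs.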

roshan_c_
Beginner

Running "which mpirun" gave the output

"/opt/intel/impi/4.1.3.045/intel64/bin/mpirun"

I then ran mpirun from the mic/bin directory and still got the same error message.

Gregg_S_Intel
Employee

If you're convinced this message is from Intel MPI (which I'm not), then the message is telling you there's an error in your test program.

Philip_v_
Beginner

Hi,
It appears that I am having a similar problem. I followed the thread down to #22, which helped me improve my /etc/hosts settings. These are the outputs of some of the commands commenters asked for:

% mpirun -host mic0 hostname
uhams02a.phys.hawaii.edu

% mpirun -host 192.131.1.1 hostname
uhams02a.phys.hawaii.edu

% mpirun -host gauss-mic0 hostname
uhams02a.phys.hawaii.edu

% hostname
uhams02a.phys.hawaii.edu

% cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.1.1    uhams02a-mic0.phys.hawaii.edu mic0 #Generated-by-micctrl
172.31.1.254    host uhams02a.phys.hawaii.edu #pvd

% ssh mic0 hostname
uhams02a-mic0.phys.hawaii.edu

% ssh mic0 cat /etc/hosts
127.0.0.1    localhost.localdomain localhost
::1        localhost.localdomain localhost

172.31.1.254    host uhams02a.phys.hawaii.edu
172.31.1.1    uhams02a-mic0.phys.hawaii.edu mic0

I copied:
% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin
% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/

After that I also restarted the card with:
% sudo service mpss stop
% sudo service mpss start

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |& grep "Launch arguments"
"Launch arguments: /usr/local/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/5.0.0.028" ; export PATH="/opt/intel/impi/5.0.0.028/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 172.31.1.254:48386 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/5.0.0.028/intel64/bin/ --i_mpi_base_arch 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1728469499 --usize -2 --proxy-id 0"

This continues to hang. Any idea of how to proceed is appreciated.


 

Tomas
Beginner

I was stuck with the same problem of mpirun hanging. The following steps solved my problem.

- set up passwordless SSH to the coprocessor

- check and configure the firewall, or temporarily turn it off, e.g. on CentOS:
$ sudo systemctl stop firewalld

- copy pmi_proxy to the coprocessor's /bin directory, e.g.
$ scp /opt/intel/impi/2017.4.239/mic/bin/pmi_proxy mic0:/bin

or copy it to /var/mpss/mic0/bin/ on the host and reboot the coprocessor:
$ sudo micctrl --reboot mic0
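
The checks above can be sketched as a small preflight script. This is only an illustration under stated assumptions: the function names check and preflight are invented here, the card is assumed to be reachable as mic0, and its shell is assumed to provide "test". BatchMode and ConnectTimeout are standard OpenSSH client options; BatchMode=yes makes ssh fail instead of prompting for a password, which is what reveals a missing passwordless setup.

```shell
# Hypothetical helper: run a command quietly and report whether it succeeded.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $label"
    else
        echo "FAIL: $label"
    fi
}

# Hypothetical preflight for a coprocessor host name such as mic0.
preflight() {
    host=$1
    # Fails rather than prompting if key-based login is not set up.
    check "passwordless ssh to $host" \
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
    # Confirms that an executable pmi_proxy exists in /bin on the card.
    check "pmi_proxy present on $host" \
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" test -x /bin/pmi_proxy
}

# Example call:
#   preflight mic0
```

A FAIL on the first line points at the SSH setup (or a firewall blocking the connection); a FAIL on the second points at the pmi_proxy copy step.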
