
MPI application fails to run from host machine on coprocessor

roshan_c_
Beginner

I am trying to run an application on the coprocessor from the host machine, but when I execute the command

mpirun -n 2 -host host-name /tmp/test.mic

it hangs at the command line and shows no output.

However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?

Leonardo_B_Intel
Employee

Thank you for the follow up.

The message "/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")" might indicate that the copy of pmi_proxy on the MIC card came from the "intel64" directory and not from the "mic" binary directory.

It might be worth trying these copies again:

% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin

% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/.

 

and then re-run your test.
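
A quick way to tell a host binary from a coprocessor binary is to read the machine field of its ELF header. The sketch below is only an illustration: the function names `elf_machine` and `elf_arch_label` are hypothetical, and it assumes a little-endian ELF file, where the 16-bit `e_machine` field sits at byte offset 18 (62 is EM_X86_64, 181 is EM_K1OM).

```shell
#!/bin/sh
# Print the ELF e_machine value (16-bit little-endian field at offset 18).
elf_machine() {
    od -An -tu1 -j18 -N2 "$1" | awk '{ print $1 + 256 * $2 }'
}

# Map the numeric value to a short label.
# 62 = EM_X86_64 (host build), 181 = EM_K1OM (coprocessor build).
elf_arch_label() {
    case "$(elf_machine "$1")" in
        62)  echo "x86_64 (host)" ;;
        181) echo "k1om (coprocessor)" ;;
        *)   echo "unknown" ;;
    esac
}

# Example (paths taken from the commands above):
#   elf_arch_label /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy
#   elf_arch_label /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy
```

If the file copied to mic0:/bin reports the host architecture, the coprocessor cannot execute it, and its shell may then try to interpret the binary as a script, which would be consistent with the syntax error quoted above.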

Best,

Leo.

Gregg_S_Intel
Employee

Is Intel MPI visible on the coprocessor?

That is, is a directory such as /opt/intel/impi/4.1.3.045 mounted on the coprocessor?

 

roshan_c_
Beginner

Thanks a lot, guys. Now I can run the app on the coprocessor from the host.

However, there is one problem: when I try to run on the host and the coprocessor in one command, I get an error message:

mpirun -n 3 -host gauss ./test.host : -iface mic0 -host mic0 -n 2 /tmp/test.mic

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
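
The exit code and the signal number in this message are two views of the same event: by the usual shell convention, a process killed by signal N is reported with exit status 128 + N. A minimal sketch of that arithmetic follows; the helper name `exit_signal` is hypothetical.

```shell
# Print the signal number encoded in a shell-style exit status (128 + N),
# or nothing if the status does not indicate termination by a signal.
exit_signal() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    fi
}

exit_signal 139   # prints 11, matching "Segmentation fault (signal 11)"
```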

Leonardo_B_Intel
Employee

This is good progress: great!

I believe the bad termination might also be related to the application you're attempting to run. Have you already tried running another MPI example to observe the behavior?
For example:

% cp /opt/intel/impi/4.1.3.045/test/test.c .
% mpiicc -mmic test.c -o test_hello.mic
% mpiicc test.c -o test_hello
% scp test_hello.mic mic0:/tmp
% mpirun -n 2 -host localhost ./test_hello : -n 2 -iface mic0 -host mic0 /tmp/test_hello.mic
Hello world: rank 0 of 4 running on Some-Host-Name
Hello world: rank 1 of 4 running on Some-Host-Name
Hello world: rank 2 of 4 running on Some-Host-Name-mic0
Hello world: rank 3 of 4 running on Some-Host-Name-mic0


Best,
Leo.

roshan_c_
Beginner

I tried another application as well, but with no success.

Here is the sample program I am trying to run:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int rank, size;

  MPI_Init(&argc, &argv);                 /* start MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get this process's rank */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get the number of processes */
  printf("Hello world from process %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}

 

Gregg_S_Intel
Employee

Check your MPI setup.  That looks like an MPICH2 error message.

Similar topic:  http://software.intel.com/en-us/forums/topic/405183

roshan_c_
Beginner

But when I run it on each machine individually, it works fine. The problem occurs only when I run on both machines simultaneously.

Gregg_S_Intel
Employee

Yes, that could happen.  Try some commands like "which mpirun" and "which mpiexec" to check whether perhaps you're picking up something from some other MPI.
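
That check can be scripted so it is easy to repeat on each machine. The sketch below is an assumption-laden illustration, not part of any MPI distribution: the function names `launcher_dir` and `same_mpi_install` are hypothetical, and it only compares the directories the launchers resolve to on the current PATH.

```shell
# Print the directory that a launcher resolves to on the current PATH.
launcher_dir() {
    p=$(command -v "$1") || return 1
    dirname "$p"
}

# Succeed only if every named launcher resolves to the same directory.
same_mpi_install() {
    first=""
    for name in "$@"; do
        d=$(launcher_dir "$name") || { echo "$name: not found"; return 1; }
        echo "$name -> $d"
        if [ -z "$first" ]; then
            first=$d
        elif [ "$d" != "$first" ]; then
            return 1
        fi
    done
}

# Example:
#   same_mpi_install mpirun mpiexec || echo "launchers come from different installs"
```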

roshan_c_
Beginner

Running "which mpirun" gave this output:

"/opt/intel/impi/4.1.3.045/intel64/bin/mpirun"

I then ran mpirun from the mic/bin directory and still got the same error message.

Gregg_S_Intel
Employee

If you're convinced this message is from Intel MPI (which I'm not), then the message is telling you there's an error in your test program.

Philip_v_
Beginner

Hi,
It appears that I am having a similar problem. I followed the thread down to #22, which helped me improve my /etc/hosts settings. These are the outputs of some of the commands commenters were asking for:

% mpirun -host mic0 hostname
uhams02a.phys.hawaii.edu

% mpirun -host 192.131.1.1 hostname
uhams02a.phys.hawaii.edu

% mpirun -host gauss-mic0 hostname
uhams02a.phys.hawaii.edu

% hostname
uhams02a.phys.hawaii.edu

% cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.1.1    uhams02a-mic0.phys.hawaii.edu mic0 #Generated-by-micctrl
172.31.1.254    host uhams02a.phys.hawaii.edu #pvd

% ssh mic0 hostname
uhams02a-mic0.phys.hawaii.edu

% ssh mic0 cat /etc/hosts
127.0.0.1    localhost.localdomain localhost
::1        localhost.localdomain localhost

172.31.1.254    host uhams02a.phys.hawaii.edu
172.31.1.1    uhams02a-mic0.phys.hawaii.edu mic0

I copied:
% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin
% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/

After that I also restarted the card with:
% sudo service mpss stop
% sudo service mpss start

% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |& grep "Launch arguments"
"Launch arguments: /usr/local/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/5.0.0.028" ; export PATH="/opt/intel/impi/5.0.0.028/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 172.31.1.254:48386 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/5.0.0.028/intel64/bin/ --i_mpi_base_arch 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1728469499 --usize -2 --proxy-id 0"

This continues to hang. Any idea of how to proceed is appreciated.
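
One detail visible in the output above: the launch arguments reference /opt/intel/impi/5.0.0.028, while the pmi_proxy copied to the card came from 4.1.3.045. Whether that mismatch is behind the hang is not established in this thread. As a rough sketch, the install root the launcher actually uses can be pulled out of the "Launch arguments" line for comparison; the helper name `launch_mpi_root` is hypothetical.

```shell
# Extract the I_MPI_ROOT value from a "Launch arguments" line read on stdin.
launch_mpi_root() {
    sed -n 's/.*export I_MPI_ROOT="\([^"]*\)".*/\1/p'
}

# Example:
#   mpirun -v -host mic0 -n 1 hostname 2>&1 | grep "Launch arguments" | launch_mpi_root
```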


 

Tomas
Beginner

I was stuck with the same problem of mpirun hanging. The following steps solved it for me.

- set up passwordless SSH to the coprocessor

- check and configure the firewall, or temporarily turn it off, e.g. on CentOS
$ sudo systemctl stop firewalld

- copy pmi_proxy to the coprocessor's /bin directory, e.g.
$ scp /opt/intel/impi/2017.4.239/mic/bin/pmi_proxy mic0:/bin

or copy it to /var/mpss/mic0/bin/ on the host and reboot the coprocessor
$ sudo micctrl --reboot mic0
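
The steps above can be turned into a small preflight check to run before mpirun. This is only a sketch under stated assumptions: the function names `run_check` and `preflight` are hypothetical, the card is assumed to be reachable as mic0, and the firewall is not inspected.

```shell
# Run one check quietly and report the result; returns the command's status.
run_check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok:   $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Check passwordless SSH and the presence of pmi_proxy on one coprocessor.
preflight() {
    card=${1:-mic0}
    failures=0
    # BatchMode makes ssh fail instead of prompting for a password.
    run_check "passwordless ssh to $card" \
        ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$card" true \
        || failures=$((failures + 1))
    run_check "pmi_proxy executable in /bin on $card" \
        ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$card" test -x /bin/pmi_proxy \
        || failures=$((failures + 1))
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}

# Example:
#   preflight mic0 && mpirun -n 2 -host mic0 /tmp/test.mic
```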
