I am trying to run an application from the host machine on the coprocessor, but when I execute the command
mpirun -n 2 -host host-name /tmp/test.mic
it hangs at the command line and does not show any output.
However, when I run it directly on the coprocessor or on the host, it works fine. What could be the issue?
Thank you for the follow up.
The message "/bin/pmi_proxy: line 2: syntax error: unexpected word (expecting ")")" might indicate that the copy of pmi_proxy on the MIC card is from the "intel64" directory and not from the "mic" binary directory.
It might be worth trying these copies again:
% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin
% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/.
and then re-run your test.
Best,
Leo.
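
One way to tell which build of pmi_proxy a given file is, before copying it to the card, is to read the machine field of its ELF header. The sketch below is only an illustration and not part of Intel MPI: elf_machine is a hypothetical helper name, it does not verify the ELF magic bytes, and the meaning of the values (62 for x86-64, 181 for the coprocessor's k1om target) is an assumption based on common elf.h definitions that is worth confirming on your system.

```shell
# Hypothetical helper (not an Intel MPI tool): print the ELF e_machine
# field of a file, i.e. the 16-bit little-endian value at byte offset 18.
elf_machine() {
    # od prints the two bytes as unsigned decimals, low byte first.
    bytes=$(od -An -tu1 -j18 -N2 "$1") || return 1
    set -- $bytes
    [ $# -eq 2 ] || return 1
    echo $(( $1 + 256 * $2 ))
}

# Assumed values from elf.h: 62 = EM_X86_64 (host), 181 = EM_K1OM (card).
# Example with the paths used in this thread:
#   elf_machine /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy
#   elf_machine /opt/intel/impi/4.1.3.045/intel64/bin/pmi_proxy
```

If the two commands in the comment print the same number, the "mic" and "intel64" files are probably not what their directory names suggest.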
Is Intel MPI visible on the coprocessor?
That is, is a directory such as /opt/intel/impi/4.1.3.045 mounted on the coprocessor?
Thanks a lot, guys. Now I can run the app from the host on the coprocessor.
However, there is one problem: when I try to run on the host and the coprocessor in one command, I get an error message:
mpirun -n 3 -host gauss ./test.host : -iface mic0 -host mic0 -n 2 /tmp/test.mic
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
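
The exit code and the signal in the banner above describe the same event: by the usual shell convention, a process killed by signal N is reported with exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV). A minimal sketch of that arithmetic follows; signal_from_exit_code is a hypothetical helper name.

```shell
# Hypothetical helper: map a shell exit status to the signal number
# that terminated the process, or 0 if the status is not above 128.
signal_from_exit_code() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}

signal_from_exit_code 139   # prints 11
```

In most shells, kill -l 11 then prints the name of that signal.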
This is good progress: great!
I believe the bad termination might also be related to the application you’re attempting to run. Have you already tried running another MPI example to observe the behavior?
For example:
% cp /opt/intel/impi/4.1.3.045/test/test.c .
% mpiicc -mmic test.c -o test_hello.mic
% mpiicc test.c -o test_hello
% scp test_hello.mic mic0:/tmp
% mpirun -n 2 -host localhost ./test_hello : -n 2 -iface mic0 -host mic0 /tmp/test_hello.mic
Hello world: rank 0 of 4 running on Some-Host-Name
Hello world: rank 1 of 4 running on Some-Host-Name
Hello world: rank 2 of 4 running on Some-Host-Name-mic0
Hello world: rank 3 of 4 running on Some-Host-Name-mic0
Best,
Leo.
I tried another application as well, but with no success.
Here is the sample program I am trying to run:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* get number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Check your MPI setup. That looks like an MPICH2 error message.
Similar topic: http://software.intel.com/en-us/forums/topic/405183
But when I run it on each machine individually, it works fine. The problem occurs only when I run it simultaneously on both machines.
Yes, that could happen. Try some commands like "which mpirun" and "which mpiexec" to check whether perhaps you're picking up something from some other MPI.
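
Since which reports only the first match, it can also help to list every copy of a launcher that appears on PATH. The following is a minimal sketch for a POSIX-style shell (sh or bash); find_all_on_path is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print every executable file called $1 that is
# found in the directories listed in PATH, in search order.
find_all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Example:
#   find_all_on_path mpirun
#   find_all_on_path mpiexec
```

More than one line of output for mpirun or mpiexec would suggest that a second MPI installation is reachable from the same environment.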
Running which mpirun gave the output
"/opt/intel/impi/4.1.3.045/intel64/bin/mpirun"
I then ran mpirun from the mic/bin directory and still got the same error message.
If you're convinced this message is from Intel MPI (which I'm not), then the message is telling you there's an error in your test program.
Hi,
It appears that I am having a similar problem. I followed the thread down to #22, which helped me improve my /etc/hosts settings. These are the outputs of some of the commands the commenters asked for:
% mpirun -host mic0 hostname
uhams02a.phys.hawaii.edu
% mpirun -host 192.131.1.1 hostname
uhams02a.phys.hawaii.edu
% mpirun -host gauss-mic0 hostname
uhams02a.phys.hawaii.edu
% hostname
uhams02a.phys.hawaii.edu
% cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.1.1 uhams02a-mic0.phys.hawaii.edu mic0 #Generated-by-micctrl
172.31.1.254 host uhams02a.phys.hawaii.edu #pvd
% ssh mic0 hostname
uhams02a-mic0.phys.hawaii.edu
% ssh mic0 cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
172.31.1.254 host uhams02a.phys.hawaii.edu
172.31.1.1 uhams02a-mic0.phys.hawaii.edu mic0
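
To compare the two hosts files mechanically rather than by eye, a small lookup helper can be used. This is a sketch under assumptions: hosts_lookup is a hypothetical name, and the card's file is assumed to have been saved locally first (for example with ssh mic0 cat /etc/hosts > mic0.hosts).

```shell
# Hypothetical helper: print the first address that a hosts-format
# file ($1) assigns to a host name or alias ($2).
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop trailing comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Example:
#   hosts_lookup /etc/hosts mic0
#   hosts_lookup mic0.hosts mic0
```

If both calls print the same address for each name, the host and the card agree on how to reach each other.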
I copied:
% scp /opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy mic0:/bin
% scp /opt/intel/impi/4.1.3.045/mic/lib/* mic0:/lib64/
After that I also restarted the card with:
% sudo service mpss stop
% sudo service mpss start
% export I_MPI_MIC=enable; mpirun -v -host mic0 -n 1 hostname |& grep "Launch arguments"
"Launch arguments: /usr/local/bin/ssh -x -q mic0 sh -c 'export I_MPI_ROOT="/opt/intel/impi/5.0.0.028" ; export PATH="/opt/intel/impi/5.0.0.028/intel64/bin//../../mic/bin:${I_MPI_ROOT}:${I_MPI_ROOT}/mic/bin:${PATH}" ; exec "$0" "$@"' pmi_proxy --control-port 172.31.1.254:48386 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --enable-mic --i_mpi_base_path /opt/intel/impi/5.0.0.028/intel64/bin/ --i_mpi_base_arch 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1728469499 --usize -2 --proxy-id 0"
This continues to hang. Any idea of how to proceed is appreciated.
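
One detail visible in the output above: the launch line sets I_MPI_ROOT to /opt/intel/impi/5.0.0.028, while the files copied to the card earlier came from /opt/intel/impi/4.1.3.045. Whether that mismatch is related to the hang is not established in this thread, but it is cheap to check. A minimal sketch, where impi_version_from_path is a hypothetical helper that assumes the version is the path component directly after /impi/:

```shell
# Hypothetical helper: extract the path component that follows /impi/.
impi_version_from_path() {
    case "$1" in
        */impi/*) ;;
        *) return 1 ;;
    esac
    rest=${1#*/impi/}
    printf '%s\n' "${rest%%/*}"
}

# Paths taken from the launch arguments and the scp commands in this post.
launched=$(impi_version_from_path "/opt/intel/impi/5.0.0.028/intel64/bin/")
copied=$(impi_version_from_path "/opt/intel/impi/4.1.3.045/mic/bin/pmi_proxy")
if [ "$launched" != "$copied" ]; then
    echo "version mismatch: mpirun uses $launched, copied files are from $copied"
fi
```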
I was stuck with the same problem of mpirun hanging. The following steps solved my problem.
- set up passwordless SSH to the coprocessor
- check and configure the firewall, or temporarily turn it off, e.g. on CentOS
$ sudo systemctl stop firewalld
- copy pmi_proxy to the coprocessor's /bin directory, e.g.
$ scp /opt/intel/impi/2017.4.239/mic/bin/pmi_proxy mic0:/bin
or copy it to /var/mpss/mic0/bin/ on the host and reboot the coprocessor
$ sudo micctrl --reboot mic0
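
Two of the steps above, passwordless SSH and the presence of pmi_proxy on the card, can be checked from the host before starting mpirun. The following is a rough sketch rather than a definitive tool: preflight and the SSH_CMD override are hypothetical names introduced here, and the firewall step is not covered.

```shell
# Hypothetical helper: basic checks against a coprocessor before mpirun.
# SSH_CMD can be set to substitute another command for ssh (an assumption
# made here so the function can be tried without a card).
preflight() {
    host=$1
    ssh_cmd=${SSH_CMD:-ssh}
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "FAIL: passwordless SSH to $host"
        return 1
    fi
    if ! "$ssh_cmd" -o BatchMode=yes "$host" test -x /bin/pmi_proxy; then
        echo "FAIL: /bin/pmi_proxy missing or not executable on $host"
        return 1
    fi
    echo "OK: $host"
}

# Example:
#   preflight mic0 && mpirun -n 2 -host mic0 /tmp/test.mic
```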
