Hi,
I am writing some toy code to test MPI programming on the MICs. My code worked on one Xeon Phi card, and also worked between the host and one Xeon Phi card, but it hangs when running on two Xeon Phi cards of the same host. The command is: mpirun -host mic0 -n 4 /hellompi.MIC : -host mic1 -n 4 /hellompi.MIC.
After digging in, I found that the issue is in the MPI_Init(&argc, &argv) function. If I disable this call, along with the other MPI calls, the program launches on both Xeon Phi cards simultaneously.
Does anyone know why? Thanks!
Does this run on each card, individually? That is, do both of these complete?
mpirun -host mic0 -n 4 a.out
mpirun -host mic1 -n 4 a.out
Yes, running on an individual card worked well. I even tried the Monte Carlo code provided in your article (http://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems#viewSource); it also worked only on individual cards, not on two simultaneously. After waiting a while, it returned an error message. The output is below (I used mpiexec.hydra in this case):
[prompt]# /opt/intel/impi/4.1.1/bin64/mpiexec.hydra -host mic0 -n 1 /micfs/mc.MIC : -host mic1 -n 1 /micfs/mc.MIC
rank = 1, revents = 29, state = 1
Assertion failed in file ../../socksm.c at line 2963: (it_plfd->revents & POLLERR) == 0
internal ABORT - process 0
[mpiexec@hostname] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:98): one of the processes terminated badly; aborting
[mpiexec@hostname] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@hostname] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:440): launcher returned error waiting for completion
[mpiexec@hostname] main (./ui/mpich/mpiexec.c:847): process manager error waiting for completion
Gregg Skinner (Intel) wrote:
Does this run on each card, individually? That is, do both of these complete?
mpirun -host mic0 -n 4 a.out
mpirun -host mic1 -n 4 a.out
Have the cards been configured for networking? You should be able to ssh from one card to the other.
Set up a static or DHCP bridge, as described in the Cluster Setup Guide, which is found in the Intel(R) MPSS docs directory.
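A minimal connectivity check might look like the following (the hostnames mic0 and mic1 are assumed from the mpirun commands earlier in the thread; the exact names depend on your /etc/hosts):

```shell
# From the host: confirm each card is reachable
ping -c 1 mic0
ping -c 1 mic1

# From mic0: confirm the card can reach mic1 directly,
# and that passwordless ssh works card-to-card
ssh mic0 'ping -c 1 mic1 && ssh mic1 hostname'
```

If the card-to-card ssh step fails while host-to-card works, the bridge is the usual suspect.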
Gregg Skinner (Intel) wrote:
Have the cards been configured for networking? You should be able to ssh from one card to the other.
Set up a static or DHCP bridge, as described in the Cluster Setup Guide, which is found in the Intel(R) MPSS docs directory.
In particular, there is a useful sshconnectivity script in the unpacked installation directory for Intel MPI. Once the coprocessor is running, this script should be run for each user and for root, followed by sudo service mpss stop; sudo micctrl --resetconfig; sudo service mpss start.
I spent an extra hour this week on an MPSS upgrade; the short version is that none of the convoluted steps in the readme can be skipped, including micctrl --initconfig, followed eventually by the sshconnectivity and resetconfig steps.
Look at the on-line docs, such as:
http://software.intel.com/sites/default/files/forum/393956/intelr-mpss-f...
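A rough sketch of the sequence Tim describes (the script name sshconnectivity.exp and the node-list filename machines.LINUX are assumptions here; check your unpacked Intel MPI installation directory for the actual names):

```shell
# Set up passwordless ssh to the cards for the current user;
# repeat for root, as the post above recommends
./sshconnectivity.exp machines.LINUX

# Push the refreshed configuration out to the cards
sudo service mpss stop
sudo micctrl --resetconfig
sudo service mpss start
```

The stop/resetconfig/start cycle is what actually propagates the new ssh keys and network settings to the coprocessors.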
Great! That solved the issue. Thanks!
Gregg Skinner (Intel) wrote:
Have the cards been configured for networking? You should be able to ssh from one card to the other.
Set up a static or DHCP bridge, as described in the Cluster Setup Guide, which is found in the Intel(R) MPSS docs directory.
Thanks! Very useful information. BTW, can you elaborate a bit on how to use the sshconnectivity script? E.g., what filename does the script expect?
TimP (Intel) wrote:
Gregg Skinner (Intel) wrote:
Have the cards been configured for networking? You should be able to ssh from one card to the other.
Set up a static or DHCP bridge, as described in the Cluster Setup Guide, which is found in the Intel(R) MPSS docs directory.
In particular, there is a useful sshconnectivity script in the unpacked installation directory for Intel MPI. Once the coprocessor is running, this script should be run for each user and for root, followed by sudo service mpss stop; sudo micctrl --resetconfig; sudo service mpss start.
I spent an extra hour this week on an MPSS upgrade; the short version is that none of the convoluted steps in the readme can be skipped, including micctrl --initconfig, followed eventually by the sshconnectivity and resetconfig steps.
Look at the on-line docs, such as:
http://software.intel.com/sites/default/files/forum/393956/intelr-mpss-f...
sshconnectivity reads a file which you create containing a list of the names of the nodes for setting up ssh, such as the names from /etc/hosts on all the nodes (omitting the IP addresses). I believe it's mentioned briefly in the MPI setup doc.
I have a platform on which I have to run a network restart on the host first; apparently due to some sequencing problem at initial power-up, it doesn't set the correct host IP address, but a network restart (or hot reboot) corrects it.
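For illustration, the node-list file is just one hostname per line (the names "host", "mic0", and "mic1" below are assumptions; use the names from your own /etc/hosts, and check the actual script name in your installation):

```shell
# Create a hypothetical node-list file for sshconnectivity
cat > machines.LINUX <<'EOF'
host
mic0
mic1
EOF

# Pass the file to the script
./sshconnectivity.exp machines.LINUX
```

Listing the host alongside the cards lets the script set up keys for every pairing in one pass.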
Thank you!
TimP (Intel) wrote:
sshconnectivity reads a file which you create containing a list of the names of the nodes for setting up ssh, such as the names from /etc/hosts on all the nodes (omitting the IP addresses). I believe it's mentioned briefly in the MPI setup doc.
I have a platform on which I have to run a network restart on the host first; apparently due to some sequencing problem at initial power-up, it doesn't set the correct host IP address, but a network restart (or hot reboot) corrects it.