Installed MPSS with OFED 1.5.4.1 support, but not able to access Xeon Phis over IB

fr33guy
Beginner

I installed MPSS 3.3 with OFED 1.5.4.1 on my 5110P-based server. The IB driver is mlx4, and IB on the host node is working fine. I am able to log in to a remote Xeon Phi using external bridging support. I am running these services: mpss, openibd, ofed-mic, and mpxyd. I am still not able to access the Xeon Phi over IB (not even on the same node).

I have configured ipoib.conf to put the Xeon Phi's IP address on the same network as my other nodes. What could be the problem?
I have no problem connecting the Xeon Phis within a single node using SCIF. Somehow I am not able to use CCL.

 

System details:

OS: RHEL 6.3
OFED: 1.5.4.1
HCA driver: mlx4
MPSS: 3.3 gold update
Xeon Phi ROM: updated
Hardware: Supermicro with Intel Xeon E5 processor and Mellanox HCA

2 Replies
Frances_R_Intel
Employee

I don't have any real experience using IPoIB, although I might be about to gain some. But for starters, can you use InfiniBand verbs at all to access the card? Does the ibv_devinfo output look right? (You can see an example output at https://software.intel.com/en-us/blogs/2014/05/20/troubleshooting-ofed-issues) I guess what I am trying to ask is: is it just IPoIB, or are the problems more basic than that?
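
If it helps, here is a rough Python sketch of that check. It assumes the usual ibv_devinfo layout, where each device starts with an "hca_id:" line and each port has a "state:" line such as "PORT_ACTIVE (4)". The function names port_states and inactive_devices are made up for this example, so treat it as a starting point rather than a supported tool.

```python
import re
import subprocess


def port_states(devinfo_text):
    """Map each hca_id found in ibv_devinfo-style text to its list of port states."""
    states = {}
    current = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            current = m.group(1)
            states[current] = []
            continue
        # Anchored at the start of the line, so "phys_state:" is not picked up.
        m = re.match(r"\s*state:\s*(\S+)", line)
        if m and current is not None:
            states[current].append(m.group(1))
    return states


def inactive_devices(devinfo_text):
    """Return devices that list no ports or have a port not in PORT_ACTIVE."""
    return sorted(
        dev
        for dev, st in port_states(devinfo_text).items()
        if not st or any(s != "PORT_ACTIVE" for s in st)
    )


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ibv_devinfo"], capture_output=True, text=True, check=False
        ).stdout
    except FileNotFoundError:
        print("ibv_devinfo not found; is the OFED userspace installed?")
    else:
        print(inactive_devices(out) or "all listed ports report PORT_ACTIVE")
```

Running it on the host and again on the card should show whether the verbs layer sees active ports before IPoIB enters the picture.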

Andrey_Vladimirov
New Contributor III
  1. Are you sure that you are using "MPSS 3.3 gold update"? MPSS 3.3 is indeed the latest version, but the last public "gold update" was in MPSS 2.1. If you have a "gold update" version, you may be off by one paragraph on the download page. Here is the correct download location: https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss#lx33rel
  2. I recently configured IPoIB for Xeon Phi with MPSS 3.x, and the stumbling block for me was the firewalls.
  3. Here are the contents of /etc/mpss/ipoib.conf that worked for my network:
ipoib_enabled=yes
mic0_ib0="10.34.1.21 netmask 255.255.0.0"
mic1_ib0="10.34.1.41 netmask 255.255.0.0"
mic2_ib0="10.34.1.61 netmask 255.255.0.0"
mic3_ib0="10.34.1.81 netmask 255.255.0.0"
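
A quick way to sanity-check a file like this is to parse it and confirm that every card address falls inside the host's IPoIB network. The Python sketch below assumes the key="address netmask mask" layout shown above; the host address 10.34.0.1, the handling of "#" comment lines, and the function names are assumptions made for illustration.

```python
import ipaddress
import re

# Matches lines such as: mic0_ib0="10.34.1.21 netmask 255.255.0.0"
ENTRY = re.compile(r'^(mic\d+)_(ib\d+)="(\S+)\s+netmask\s+(\S+)"\s*$')

SAMPLE = '''ipoib_enabled=yes
mic0_ib0="10.34.1.21 netmask 255.255.0.0"
mic1_ib0="10.34.1.41 netmask 255.255.0.0"
mic2_ib0="10.34.1.61 netmask 255.255.0.0"
mic3_ib0="10.34.1.81 netmask 255.255.0.0"
'''


def parse_ipoib_conf(text):
    """Return (enabled, {mic_name: IPv4Interface}) from ipoib.conf-style text."""
    enabled = False
    cards = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skipping blank and "#" lines is an assumption about the file format.
        if not line or line.startswith("#"):
            continue
        if line.startswith("ipoib_enabled="):
            enabled = line.split("=", 1)[1].strip().lower() == "yes"
            continue
        m = ENTRY.match(line)
        if m:
            mic, _ib, addr, mask = m.groups()
            cards[mic] = ipaddress.ip_interface(f"{addr}/{mask}")
    return enabled, cards


def cards_outside(cards, host_ib0):
    """List cards whose address is not inside the host's IPoIB network."""
    host_net = ipaddress.ip_interface(host_ib0).network
    return sorted(mic for mic, iface in cards.items() if iface.ip not in host_net)


if __name__ == "__main__":
    enabled, cards = parse_ipoib_conf(SAMPLE)
    # 10.34.0.1 is a hypothetical host ib0 address; substitute your own.
    print(enabled, cards_outside(cards, "10.34.0.1/255.255.0.0"))
```

An empty list from cards_outside means all cards share the host's network. Any names it returns are the entries to look at first.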

Also, I noticed a visual indicator that the ofed-mic service is happy with your /etc/mpss/ipoib.conf: its startup output contains "ib0" after each "mic*", like below. If it does not look like that, something is off in ipoib.conf:

[root@c001-n001 ~]# service ofed-mic start
Starting OFED Stack:
host                                                       [  OK  ]
mic0 - ib0 - mic1 - ib0 - mic2 - ib0 - mic3 - ib0 -        [  OK  ]
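
If you want to check this from a script rather than by eye, something like the following Python sketch could work. It only looks at the order of the "micN" and "ib0" tokens on the status line. I have not seen what the line looks like when ipoib.conf is wrong, so the failure case and the function name mics_without_ib0 are assumptions.

```python
import re


def mics_without_ib0(status_line):
    """Return the micN names in an ofed-mic status line not directly followed by ib0."""
    # Pull out only the micN / ibN tokens, ignoring separators and "[  OK  ]".
    tokens = re.findall(r"\b(?:mic|ib)\d+\b", status_line)
    missing = []
    for i, tok in enumerate(tokens):
        if tok.startswith("mic"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt != "ib0":
                missing.append(tok)
    return missing


if __name__ == "__main__":
    line = "mic0 - ib0 - mic1 - ib0 - mic2 - ib0 - mic3 - ib0 -        [  OK  ]"
    print(mics_without_ib0(line))
```

An empty result matches the healthy output above. Any card names returned are the ones for which ib0 was not reported.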

Maybe this will help.
