SLES11sp3 + MPSS 3.4.3 + MOFED 2.3.2.0.0.1 IB scif issues

Geert_G_
Beginner
2,158 Views

Hello,

We have a number of iDataPlex dx360 M4 servers, each with two Xeon Phi Coprocessor 5110P cards and one Mellanox ConnectX-3 card. We're running SLES 11 SP3 Linux on these machines with a 3.0.101-0.40-default kernel. I've installed mpss 3.4.3 and updated the firmware, and almost everything seems to function.

The only problem I encounter is with infiniband.

I can successfully start openibd on the host, and I can successfully start ofed-mic on the host (which also starts ofed on the mics). However, starting ofed-mic on the host returns a "RTNETLINK answers: File exists" warning. (This remains an unresolved, possibly related issue.)

I've configured ipoib on the mics, and the assigned IPs are reachable through the fabric.

I can run ibv_rc_pingpong to a mic from any host in the fabric, including the second mic on the same host, but not from the host that contains the mic card itself. The same goes for ibv_uc_pingpong.

ibv_ud_pingpong gets as far as the address exchange, but hangs after that.

I don't know how to debug this issue any further, and I'm unfortunately not enough of an ofed/scif guru to know what is going wrong here.

Any help/pointers would be appreciated.

Best regards,

Geert

0 Kudos
8 Replies
Loc_N_Intel
Employee
2,158 Views

Hi Geert,

Did you follow the instructions in the MPSS User's Guide on IPoIB configuration (MPSS 3.5, Section 5.3)? Can you please provide more details on your setup? I will try to reproduce the problem you see on my system, which is equipped with Mellanox HCAs. Thank you.

0 Kudos
Geert_G_
Beginner
2,158 Views

Hi loc-nguyen!

Thanks for your reply.

I have configured ipoib on the mic.

root@hai0012-mic0:~# ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80-04-02-19-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.64.111.55  Bcast:10.64.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:4092  Metric:1
          RX packets:551 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:132354 (129.2 KiB)  TX bytes:0 (0.0 B)

The ip is also reachable throughout the fabric.

Could you please tell me which details you're expecting from me?

One addition, though: I've also contacted Mellanox about this issue, and they advised me to upgrade MOFED to the latest MLNX_OFED_LINUX-2.4-1.0.4, but this has not changed anything about my issue.

Furthermore, I've been looking at the spec file of the ofed-driver.src.rpm. It seems to check for MOFED, which is defined with the rpmbuild command (rpmbuild --define "MOFED 1"), and after that there is a check for MOFED 2.1. I was wondering whether the patches should be applied to MOFED 2.1 and later, instead of only MOFED 2.1?

Best regards,

Geert

0 Kudos
Geert_G_
Beginner
2,158 Views

OK, I did some more research, and I can now pin the problem down to the ibscif driver.
I've sent a list of pingpong tests and results to Mellanox, and they advised me to contact Intel, as this ibscif driver is out of their hands.
I've copied the tests and results below:

== hpa0033 == (normal compute node)
- IBM System X iDataPlex dx360 M4 Server
- SLES11SP3
- MLNX_OFED_LINUX-2.4-1.0.4
- 3.0.101-0.40-default
- ip eth1: 10.32.1.41
- ip ib0: 10.64.1.41

== hai0012 == (accelerator compute node)
- IBM System X iDataPlex dx360 M4 Server
- SLES11SP3
- MLNX_OFED_LINUX-2.4-1.0.4
- 3.0.101-0.40-default
- mpss 3.4.3 (additional hosts: hai0012-mic0, hai0012-mic1)
- ip br0 (eth1,mic0,mic1): 10.32.11.55
- ip ib0: 10.64.11.55
== hai0012-mic0 == (accelerator node)
- Xeon Phi Coprocessor 5110P
- Intel MIC Platform Software Stack (Built by Poky 7.0) 3.4.3 \n \l
- ofed-driver-3.4.3-1.knightscorner
- ofed-pt-1.3.0+git1+8f8243549f-r0.k1om
- 2.6.38.8+mpss3.4.3
- ip: 10.32.111.55
- ip ib0: 10.64.111.55 (pingable throughout the fabric)
== hai0012-mic1 == (accelerator node)
- Xeon Phi Coprocessor 5110P
- Intel MIC Platform Software Stack (Built by Poky 7.0) 3.4.3 \n \l
- ofed-driver-3.4.3-1.knightscorner
- ofed-pt-1.3.0+git1+8f8243549f-r0.k1om
- 2.6.38.8+mpss3.4.3
- ip: 10.32.211.55
- ip ib0: 10.64.211.55 (pingable throughout the fabric)
===============
Some pingpong tests WITHOUT "service ofed-mic" running @hai0012

1) RC_pingpong
hai0012:~ # ibv_rc_pingpong
local address: LID 0x02a3, QPN 0x040277, PSN 0xd59c9f, GID ::
remote address: LID 0x002f, QPN 0x04027a, PSN 0x847213, GID ::
8192000 bytes in 0.01 seconds = 7627.56 Mbit/sec
1000 iters in 0.01 seconds = 8.59 usec/iter

hpa0033:~ # ibv_rc_pingpong hai0012
local address: LID 0x002f, QPN 0x04027a, PSN 0x847213, GID ::
remote address: LID 0x02a3, QPN 0x040277, PSN 0xd59c9f, GID ::
8192000 bytes in 0.01 seconds = 7718.29 Mbit/sec
1000 iters in 0.01 seconds = 8.49 usec/iter

2) UC_pingpong
hai0012:~ # ibv_uc_pingpong
local address: LID 0x02a3, QPN 0x040278, PSN 0xd6e7ae, GID ::
remote address: LID 0x002f, QPN 0x04027b, PSN 0x6b45bb, GID ::
8192000 bytes in 0.01 seconds = 7685.70 Mbit/sec
1000 iters in 0.01 seconds = 8.53 usec/iter

hpa0033:~ # ibv_uc_pingpong hai0012
local address: LID 0x002f, QPN 0x04027b, PSN 0x6b45bb, GID ::
remote address: LID 0x02a3, QPN 0x040278, PSN 0xd6e7ae, GID ::
8192000 bytes in 0.01 seconds = 7778.75 Mbit/sec
1000 iters in 0.01 seconds = 8.43 usec/iter

3) UD_pingpong
hai0012:~ # ibv_ud_pingpong
local address: LID 0x02a3, QPN 0x040279, PSN 0xcfeefe: GID ::
remote address: LID 0x002f, QPN 0x04027c, PSN 0xd7d2b5, GID ::
2048000 bytes in 0.01 seconds = 2264.55 Mbit/sec
1000 iters in 0.01 seconds = 7.24 usec/iter

hpa0033:~ # ibv_ud_pingpong hai0012
local address: LID 0x002f, QPN 0x04027c, PSN 0xd7d2b5: GID ::
remote address: LID 0x02a3, QPN 0x040279, PSN 0xcfeefe, GID ::
2048000 bytes in 0.01 seconds = 2298.22 Mbit/sec
1000 iters in 0.01 seconds = 7.13 usec/iter

All pingpong tests succeed!
Now we start the ofed-mic service on the hai0012:

1)RC_pingpong
hai0012:~ # ibv_rc_pingpong
local address: LID 0x03e8, QPN 0x000002, PSN 0x0c8aae, GID ::
remote address: LID 0x002f, QPN 0x04027d, PSN 0x158465, GID ::
poll CQ failed -38

hpa0033:~ # ibv_rc_pingpong hai0012
local address: LID 0x002f, QPN 0x04027d, PSN 0x158465, GID ::
remote address: LID 0x03e8, QPN 0x000002, PSN 0x0c8aae, GID ::
Failed status transport retry counter exceeded (12) for wr_id 2

2)UC_pingpong
hai0012:~ # ibv_uc_pingpong
Couldn't create QP

hpa0033:~ # ibv_uc_pingpong hai0012
local address: LID 0x002f, QPN 0x04027f, PSN 0xe7e87a, GID ::
Couldn't connect to hai0012:18515

3)UD_pingpong
hai0012:~ # ibv_ud_pingpong
local address: LID 0x03e8, QPN 0x000002, PSN 0x7329a4: GID ::
remote address: LID 0x002f, QPN 0x04027e, PSN 0x4d175e, GID ::
^C

hpa0033:~ # ibv_ud_pingpong hai0012
local address: LID 0x002f, QPN 0x04027e, PSN 0x4d175e: GID ::
remote address: LID 0x03e8, QPN 0x000002, PSN 0x7329a4, GID ::
^C

All pingpong tests FAILED!!
However... the same pingpong tests from the mic0 card:

1)RC_pingpong
root@hai0012-mic0:~# ibv_rc_pingpong
local address: LID 0x02a3, QPN 0x04027c, PSN 0x97d187, GID ::
remote address: LID 0x002f, QPN 0x040280, PSN 0x45cea3, GID ::
8192000 bytes in 0.01 seconds = 4633.48 Mbit/sec
1000 iters in 0.01 seconds = 14.14 usec/iter

hpa0033:~ # ibv_rc_pingpong hai0012-mic0
local address: LID 0x002f, QPN 0x040280, PSN 0x45cea3, GID ::
remote address: LID 0x02a3, QPN 0x04027c, PSN 0x97d187, GID ::
8192000 bytes in 0.01 seconds = 4383.68 Mbit/sec
1000 iters in 0.01 seconds = 14.95 usec/iter

2)UC_pingpong
root@hai0012-mic0:~# ibv_uc_pingpong
local address: LID 0x02a3, QPN 0x04027d, PSN 0xd2cb8e, GID ::
remote address: LID 0x002f, QPN 0x040281, PSN 0x88fe9e, GID ::
8192000 bytes in 0.01 seconds = 4664.82 Mbit/sec
1000 iters in 0.01 seconds = 14.05 usec/iter

hpa0033:~ # ibv_uc_pingpong hai0012-mic0
local address: LID 0x002f, QPN 0x040281, PSN 0x88fe9e, GID ::
remote address: LID 0x02a3, QPN 0x04027d, PSN 0xd2cb8e, GID ::
8192000 bytes in 0.02 seconds = 4357.16 Mbit/sec
1000 iters in 0.02 seconds = 15.04 usec/iter

3) UD_pingpong
root@hai0012-mic0:~# ibv_ud_pingpong
local address: LID 0x02a3, QPN 0x04027e, PSN 0x09d584: GID ::
remote address: LID 0x002f, QPN 0x040282, PSN 0xb8cb4e, GID ::
^C

hpa0033:~ # ibv_ud_pingpong hai0012-mic0
local address: LID 0x002f, QPN 0x040282, PSN 0xb8cb4e: GID ::
remote address: LID 0x02a3, QPN 0x04027e, PSN 0x09d584, GID ::
^C

The RC and UC pingpong tests do work on the mics. (both hai0012-mic0 and hai0012-mic1 give the same results)
The UD pingpong fails on both the mics as well as the host hai0012.
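
The pass/fail pattern in these runs can be checked mechanically. The following is a minimal sketch, assuming the pingpong tools print their usual final "N iters in T seconds = X usec/iter" line on success, as in the outputs above. pingpong_completed is a hypothetical helper name, not part of OFED or MPSS.

```shell
#!/bin/sh
# Sketch: decide whether captured pingpong output shows a completed run.
# pingpong_completed is a hypothetical helper; it reads the tool's output
# on standard input and succeeds only if the final timing line is present.
pingpong_completed() {
    grep -Eq '^[0-9]+ iters in [0-9.]+ seconds = [0-9.]+ usec/iter'
}

# Example (not executed here):
#   ibv_rc_pingpong hai0012-mic0 | pingpong_completed && echo PASS || echo FAIL
```

For the runs that hang, the tool would additionally need a time limit, for example by wrapping it in coreutils timeout (assuming that utility is available on the node): timeout 10 ibv_ud_pingpong hai0012 | pingpong_completed.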

On hai0012, dmesg shows several of the following errors (complete dmesg output attached):
[95603.049744] CCL Direct Server v1.0
[95603.049745] Copyright (c) 2011-2013 Intel Corporation
[95603.052008] CCL Direct CM Server v1.0
[95603.052009] Copyright (c) 2011-2013 Intel Corporation
[95603.054022] CCL Direct SA Server v1.0
[95603.054023] Copyright (c) 2011-2013 Intel Corporation
[95603.057987] ibscif: OpenFabrics IBSCIF Driver v0.1 Build 3.4.3 built Apr 28 2015 08:58:13
[95603.057990] ibscif: max_pinned=50, window_size=40, blocking_send=0, blocking_recv=1, fast_rdma=1, host_proxy=0, rma_threshold=1024, scif_loopback=1, new_ib_type=1, verbose=0
[95603.058017] ibscif_add_one: my node_id is 0
[96394.658703] ibscif_get_conn: ERROR: cannot get connection (0-->64583) after waiting, state=-1
[96394.658872] ibscif: ibscif_send_disconnect: ERROR: qp->conn == NULL
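
To pull only the ibscif error lines out of a long kernel log like the one above, a small filter can help. This is a minimal sketch that assumes the messages keep the "ibscif...: ... ERROR" shape shown in the excerpt; ibscif_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep only ibscif lines that contain "ERROR" from kernel log text
# read on standard input. ibscif_errors is a hypothetical helper name.
ibscif_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'ibscif[a-z_]*:.*ERROR' || true
}

# Example (not executed here):
#   dmesg | ibscif_errors
```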

A similar issue has been reported before at: https://software.intel.com/en-us/forums/topic/508661
But to my understanding, that one was due to something not being unpacked correctly at initrd execution time. The error "Initramfs unpacking failed: junk in compressed archive" is not shown in our dmesg.
I hope someone can give some insight into how to resolve this issue!

Best regards,
Geert

0 Kudos
Geert_G_
Beginner
2,158 Views

Small update....

Apparently ibv_ud_pingpong does seem to work from mic to mic! Didn't test that initially as I thought it would surely fail as host->mic didn't work.

Anyways, the results:

root@hai0012-mic1:~# ibv_ud_pingpong
local address: LID 0x02a3, QPN 0x04025b, PSN 0x92ea14: GID ::

root@hai0012-mic0:~# ibv_ud_pingpong hai0012-mic1
local address: LID 0x02a3, QPN 0x04025c, PSN 0x54ddf2: GID ::
remote address: LID 0x02a3, QPN 0x04025b, PSN 0x92ea14, GID ::
4096000 bytes in 0.02 seconds = 1450.23 Mbit/sec
1000 iters in 0.02 seconds = 22.59 usec/iter

Surprisingly, a mic on one host to a mic on a different host also works:
[root@hai0013-mic0 ~]# ibv_ud_pingpong
local address: LID 0x02a9, QPN 0x040258, PSN 0x5374df: GID ::

root@hai0012-mic0:~# ibv_ud_pingpong hai0013-mic0
local address: LID 0x02a3, QPN 0x04025d, PSN 0x689680: GID ::
remote address: LID 0x02a9, QPN 0x040258, PSN 0x5374df, GID ::
4096000 bytes in 0.01 seconds = 2283.64 Mbit/sec
1000 iters in 0.01 seconds = 14.35 usec/iter
root@hai0012-mic0:~#

0 Kudos
Loc_N_Intel
Employee
2,158 Views

Hello Geert,

Sorry for the delay. I did try, but for some reason the IPoIB configuration on my system did not work. However, I have escalated the issue you see to the OFED experts here. Thank you.

0 Kudos
Loc_N_Intel
Employee
2,159 Views

Here is the answer I got:

In the case of ibv_ud_pingpong, you need to specify the -d switch (IB device) to select the same IB device on each side; in this case that is scif0. You also need to specify the -s switch (message size), which may otherwise differ between the host and coprocessor sides.

For example, start the host as server:

Host> ibv_ud_pingpong -s 2048 -d scif0

On the coprocessor:

Mic0> ibv_ud_pingpong -s 2048 -d scif0 host

Or start the coprocessor as server:

Mic0> ibv_ud_pingpong -s 2048 -d scif0

On the host:

Host> ibv_ud_pingpong -s 2048 -d scif0 mic0
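
Since both ends must agree on the device and the message size, it can help to generate the two command lines from one place. The following is a minimal sketch, not part of the original answer: ud_pingpong_cmd is a hypothetical helper, and it only prints the command line (using the -s and -d options described above) rather than running it.

```shell
#!/bin/sh
# Sketch: build matching ibv_ud_pingpong command lines for both ends.
# ud_pingpong_cmd is a hypothetical helper, not part of OFED or MPSS.
# Usage: ud_pingpong_cmd DEVICE SIZE [SERVER_HOSTNAME]
ud_pingpong_cmd() {
    dev=${1:-}
    size=${2:-}
    peer=${3:-}
    if [ -z "$dev" ] || [ -z "$size" ]; then
        echo "usage: ud_pingpong_cmd DEVICE SIZE [SERVER_HOSTNAME]" >&2
        return 1
    fi
    # -s sets the message size, -d selects the IB device.
    cmd="ibv_ud_pingpong -s $size -d $dev"
    # Without a peer the tool runs as server; with one it connects to it.
    if [ -n "$peer" ]; then
        cmd="$cmd $peer"
    fi
    printf '%s\n' "$cmd"
}

# Example (prints the commands only):
#   ud_pingpong_cmd scif0 2048         # server side, e.g. on the host
#   ud_pingpong_cmd scif0 2048 host    # client side, e.g. on mic0
```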

0 Kudos
Geert_G_
Beginner
2,158 Views

Hello loc-nguyen!

That indeed seems to work...

Looking further into why I've never tried that before, I noticed the following: ibv_devinfo doesn't show a scif0 device @host. The device does however exist in /sys/class/infiniband/scif0... That would also explain the funny ibv_devinfo "Failed to query device props" output.
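
The sysfs check described above can be scripted. This is a minimal sketch that only looks at the directory entries under /sys/class/infiniband; list_sysfs_ib_devices and sysfs_has_ib_device are hypothetical helper names, and the optional directory argument exists only so the functions can be tried against another path.

```shell
#!/bin/sh
# Sketch: list the RDMA devices registered in sysfs, one name per line.
# The directory argument defaults to /sys/class/infiniband.
list_sysfs_ib_devices() {
    dir=${1:-/sys/class/infiniband}
    [ -d "$dir" ] || return 1
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# Report whether a device name appears in sysfs.
# Usage: sysfs_has_ib_device NAME [DIR]
sysfs_has_ib_device() {
    list_sysfs_ib_devices "${2:-/sys/class/infiniband}" | grep -qxF -- "$1"
}

# Example (not executed here):
#   sysfs_has_ib_device scif0 && echo "scif0 is registered in sysfs"
```

Comparing this list with the names printed by ibv_devinfo -l would show which devices the kernel has registered but the userspace verbs library does not report.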

Thanks for your help.

Best regards, Geert

0 Kudos
Loc_N_Intel
Employee
2,158 Views

Hi Geert,

I am glad that works for you. Regarding the ibv_devinfo command, even if there is a query error on the host, you should see the IB device scif0 on the coprocessor:

# ssh mic0 ibv_devinfo

For your information, I use OFED 3.12-1 (http://downloads.openfabrics.org/OFED/ofed-3.12-1/ ) and I am able to see the IB device scif0 listed on both host and coprocessor.  
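
To check for scif0 on both sides in the same way, the device names can be extracted from the ibv_devinfo output. This is a minimal sketch, assuming the usual "hca_id: NAME" header line that ibv_devinfo prints per device; devinfo_device_names is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract device names from ibv_devinfo output read on stdin.
# Assumes each device section starts with a "hca_id: NAME" line.
devinfo_device_names() {
    awk '$1 == "hca_id:" { print $2 }'
}

# Example (not executed here):
#   ibv_devinfo | devinfo_device_names
#   ssh mic0 ibv_devinfo | devinfo_device_names
```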

Thank you.

0 Kudos