Issue when installing OFED 3.5-2 and 1.5.4.1

yuyinyang
Beginner
1,606 Views

I'm trying to build OFED-3.5-2-MIC for my 5110P coprocessor. The host OS is CentOS 6.6 with Linux kernel 2.6.32-504.1.3.el6.x86_64.

I ran install.pl as root and got an error while it was building intel-mic-ofed-compat-rdma-3.5-OFED.3.5.2.MIC.src.rpm. The output is:

-I/usr/src/kernels/2.6.32-504.1.3.el6.x86_64/arch/x86/include \
-Iarch/x86/include/generated \
 -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-dwarf2-cfi-asm -fconserve-stack  -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(main)"  -D"KBUILD_MODNAME=KBUILD_STR(compat)" -D"DEBUG_HASH=18" -D"DEBUG_HASH2=35" -c -o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/.tmp_main.o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/main.c
In file included from /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h:55,
                 from <command-line>:0:
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.34.h:19: error: redefinition of typedef 'mmc_pm_flag_t'
include/linux/mmc/pm.h:25: note: previous declaration of 'mmc_pm_flag_t' was here
In file included from /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h:58,
                 from <command-line>:0:
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:198: error: redeclaration of enumerator 'ETH_FLAG_TXVLAN'
include/linux/ethtool.h:405: note: previous definition of 'ETH_FLAG_TXVLAN' was here
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:199: error: redeclaration of enumerator 'ETH_FLAG_RXVLAN'
include/linux/ethtool.h:406: note: previous definition of 'ETH_FLAG_RXVLAN' was here
make[3]: *** [/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/main.o] Error 1
make[2]: *** [/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat] Error 2
make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.1.3.el6.x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.t1Z2c9 (%build)

RPM build errors:
    user build does not exist - using root
    group build does not exist - using root
    user build does not exist - using root
    group build does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.t1Z2c9 (%build)

This looks like a kernel compatibility issue, but I can't find any document that clearly states which versions of OFED, MPSS, and the Linux kernel work together.
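
The three errors in the log above share one shape: a compat header defines a symbol that another header in the build (`include/linux/mmc/pm.h`, `include/linux/ethtool.h`) has already declared. A minimal sketch, assuming the build output has been captured as a string, of pulling those symbols out of the log. The names `CONFLICT_RE`, `conflicting_symbols`, and `SAMPLE_LOG` are made up for illustration:

```python
import re

# Matches gcc diagnostics such as:
#   <path>/compat-2.6.34.h:19: error: redefinition of typedef 'mmc_pm_flag_t'
#   <path>/compat-2.6.37.h:198: error: redeclaration of enumerator 'ETH_FLAG_TXVLAN'
CONFLICT_RE = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+): error: "
    r"(?:redefinition|redeclaration) of \w+ '(?P<symbol>\w+)'",
    re.MULTILINE,
)

def conflicting_symbols(build_log):
    """Return (header basename, line number, symbol) for each conflict."""
    return [
        (m.group("file").rsplit("/", 1)[-1], int(m.group("line")), m.group("symbol"))
        for m in CONFLICT_RE.finditer(build_log)
    ]

# Lines copied from the build output in this post.
SAMPLE_LOG = """\
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.34.h:19: error: redefinition of typedef 'mmc_pm_flag_t'
include/linux/mmc/pm.h:25: note: previous declaration of 'mmc_pm_flag_t' was here
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:198: error: redeclaration of enumerator 'ETH_FLAG_TXVLAN'
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:199: error: redeclaration of enumerator 'ETH_FLAG_RXVLAN'
"""

if __name__ == "__main__":
    for header, line, symbol in conflicting_symbols(SAMPLE_LOG):
        print(f"{symbol}: redefined at {header}:{line}")
```

Each symbol found this way can then be searched for under /usr/src/kernels/ to check whether the installed kernel headers already provide it.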

I also tried installing OFED 1.5.4.1 by following the Intel® Manycore Platform Software Stack (Intel® MPSS) guide. The installation appeared to succeed, but the services fail to start:

On host:
[root@xeonphi0 mpss-3.4.4]# service openibd start
Setting up InfiniBand network interfaces:                  [  OK  ]
Setting up service network . . .                           [  done  ]
[root@xeonphi0 mpss-3.4.4]# service opensmd start
Starting IB Subnet Manager......                           [FAILED]
[root@xeonphi0 mpss-3.4.4]# service ofed-mic start
Starting OFED Stack:
host   FATAL: Module ibp_server not found.       [FAILED]
mic0                                                                    [  OK  ]
mic1                                                                    [  OK  ]
RTNETLINK answers: File exists
[root@xeonphi0 mpss-3.4.4]# ibstatus
Fatal error:  device '*': sys files not found (/sys/class/infiniband/*/ports)
[root@xeonphi0 mpss-3.4.4]# ibv_devinfo
No IB devices found
[root@xeonphi0 mpss-3.4.4]# ibv_devices
    device                 node GUID
    ------              ---------------- 
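
The `ibstatus` error above names the sysfs path it could not find. A small sketch, assuming a Linux system, that performs a comparable lookup and lists each registered device with its ports. The helper name `list_ib_ports` and its `root` parameter are illustrative and not part of any OFED tool:

```python
import os

def list_ib_ports(root="/sys/class/infiniband"):
    """Map each device under `root` to a sorted list of its port names.

    Returns an empty dict when the directory is missing, which is the
    situation the ibstatus error message describes.
    """
    if not os.path.isdir(root):
        return {}
    devices = {}
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        ports = sorted(os.listdir(ports_dir)) if os.path.isdir(ports_dir) else []
        devices[dev] = ports
    return devices

if __name__ == "__main__":
    found = list_ib_ports()
    if not found:
        print("no InfiniBand devices registered in sysfs")
    for dev, ports in found.items():
        print(dev, "ports:", ", ".join(ports) or "none")
```

An empty result on the host would match the empty `ibv_devices` table shown above.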
 
On MIC:
[root@xeonphi-server-mic0 ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    scif0               4c79bafffe3005d2
[root@xeonphi-server-mic0 ~]# ibv_devinfo
hca_id:    scif0
    transport:            SCIF (2)
    fw_ver:                0.0.1
    node_guid:            4c79:baff:fe30:05d2
    sys_image_guid:            4c79:baff:fe30:05d2
    vendor_id:            0x8086
    vendor_part_id:            0
    hw_ver:                0x1
    phys_port_cnt:            1
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:        4096 (5)
            sm_lid:            1
            port_lid:        1001
            port_lmc:        0x00
            link_layer:        SCIF
 
Could anyone kindly offer some help? Thanks in advance.
9 Replies
Frances_R_Intel
Employee

One of the people I work with also ran into trouble installing OFED-3.5-2-MIC on a RHEL system that had recently been upgraded to a later kernel. The solution was to install OFED-3.12-1 instead. OFED-3.5-2-MIC, as you could probably tell by the name, is not part of the mainline of OFED development. It is a good idea to get back on the mainline, unless you have a compelling reason to use the -MIC version.

yuyinyang
Beginner

Hi Roth,

Thanks for your reply.

I want to use the scif0 virtual InfiniBand adapter for communication between the host and a coprocessor in the same node. Will OFED-3.12-1 work for this purpose, or must I use a -MIC version?

Frances_R_Intel
Employee

Yes, OFED-3.12-1 supports the virtual InfiniBand connection. You will have the same advantages using OFED-3.12-1 as you get with OFED-3.5-2-MIC.

yuyinyang
Beginner

Hi Roth,

Thanks for your recommendation.

I've successfully reinstalled OFED-3.12-1 with Xeon Phi support. My environment is now MPSS 3.4.4, Linux kernel 2.6.32-358.el6.x86_64, OFED-3.12-1, and Intel MPI 5.0.3.048. When I tried to use the virtual InfiniBand functionality, I ran into the problems below.

I ran ib_read_bw on the server and then ib_read_bw 192.0.2.100 on the MIC, and got this output:

[root@xeonphi-server OFED-3.12-1]# ib_read_bw
---------------------------------------------------------------------------------------
Device not recognized to implement inline feature. Disabling it
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF        Device         : scif0
 Number of qps   : 1        Transport type : IW
 Connection type : RC        Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096
 Link type       : Ethernet
 Gid index       : 0
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x3e8 QPN 0x0002 PSN 0xa8d12d OUT 0xff RKey 0x000001 VAddr 0x007f22c60a1000
 GID: 76:121:186:48:05:211:00:00:00:00:00:00:00:00:00:00
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdam_cm
Failed to exchange data between server and clients

[root@xeonphi-server-mic0 micshare]# ib_read_bw 192.0.2.100
---------------------------------------------------------------------------------------
Device not recognized to implement inline feature. Disabling it
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : scif0
 Number of qps   : 1            Transport type : Unknown
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096
 Link type       : SCIF
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
ethernet_read_keys: Couldn't read remote address
 Unable to read from socket/rdam_cm
Failed to exchange data between server and clients
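
Both sides report `Data ex. method : Ethernet`, which means the tests swap connection parameters over an ordinary TCP socket before any RDMA traffic flows. A minimal sketch for checking that this TCP path works on its own. It assumes perftest's default port of 18515 (an assumption here, so adjust it if `-p` was used), and `can_connect` is a hypothetical helper:

```python
import socket

def can_connect(host, port=18515, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Example, run on the client side while the server-side test is waiting:
#   can_connect("192.0.2.100")
```

A probe like this probably uses up the connection the waiting server expects, so the server-side test would need restarting afterwards. A `False` result would point at plain IP connectivity or a firewall rather than at the verbs layer.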

I also tried running the Intel MPI Benchmarks over the DAPL fabric. The command and its output:

[root@xeonphi-server /tmp]# mpirun -genv I_MPI_DEBUG 2 -host host -n 1 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 Sendrecv : -host mic0 -n 1 /tmp/IMB-MPI1
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): RLIMIT_MEMLOCK too small
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] MPI startup(): DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl data transfer mode
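
The `RLIMIT_MEMLOCK too small` line on rank 0 refers to the per-process limit on locked memory, which RDMA stacks depend on when registering buffers. A minimal sketch for printing the limit that a process started from the same shell inherits, using Python's standard `resource` module. The helpers `describe` and `memlock_limits` are illustrative:

```python
import resource

def describe(limit):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)

if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft:", describe(soft))
    print("hard:", describe(hard))
```

A small finite value here would be consistent with the warning. Whether raising it also resolves the `dapl fabric is not available` message is not established in this thread.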

Relevant environment information:

[root@xeonphi-server /tmp]# env | grep I_MPI
I_MPI_ROOT=/opt/intel/impi/5.0.3.048
I_MPI_MIC=enable
I_MPI_FABRICS=dapl
I_MPI_DAPL_PROVIDER=ofa-v2-scif0

[root@xeonphi-server tmp]# rpm -qa | grep dapl
dapl-utils-2.1.2-1.x86_64
dapl-devel-static-2.1.2-1.x86_64
dapl-2.1.2-1.x86_64
dapl-debuginfo-2.1.2-1.x86_64
dapl-devel-2.1.2-1.x86_64

I'm quite confused by these issues and look forward to your reply.

Thank you.

Mrunal_G_
Beginner

Hello,

Is there a solution to the problem described in the previous post?

I am facing the same issues. My setup is as follows: MPSS 3.2.1, kernel 2.6.38.8, OFED 3.2.1, CentOS.

Server side (MIC):

[root@bricks06-mic0 ~]# ib_read_lat
------------------------------------------------------------------
                    RDMA_Read Latency Test
 Number of qps   : 1
 Connection type : RC
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x01 QPN 0x200049 PSN 0xc86176 OUT 0x10 RKey 0x40003002 VAddr 0x000000006c7000
pp_read_keys: No such file or directory
Couldn't read remote address
 Unable to write to socket/rdam_cm
Failed to exchange date between server and clients

 

Client side:

[root@bricks06 bin]# ./ib_read_lat -d scif0 192.0.2.101 

---------------------------------------------------------------------------------------

Device not recognized to implement inline feature. Disabling it
ethernet_read_data: Couldn't read reports
 Unable to read from socket/rdam_cm
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF        Device         : scif0
 Number of qps   : 1        Transport type : IW
 Connection type : RC        Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 4096
 Link type       : IB
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x3e8 QPN 0x0003 PSN 0x295fc7 OUT 0xff RKey 0x000002 VAddr 0x00000001cba000
ethernet_read_keys: Couldn't read remote address
 Unable to read from socket/rdam_cm
Failed to exchange data between server and clients

 

Mrunal_G_
Beginner

Happy new year :)

I would really appreciate any help with my last post.

Does this error indicate that my InfiniBand setup is not configured correctly for verbs? I ask because IP over IB communication does work here for the Xeon Phi.

Please let me know.

Thanks very much,

Mrunal

yuyinyang
Beginner

Hi Mrunal,

Sorry for my late reply.

Actually, I never found a solution to this issue and eventually gave up on it.

I'm sorry that I can't be of more help :(

Mrunal_G_
Beginner

Thanks for your reply.

So you never got RDMA over InfiniBand set up at all? Or did you use some alternative configuration?

thanks

mrunal
