
Issue when installing OFED 3.5-2 and 1.5.4.1

yuyinyang
Beginner
2,847 Views

I'm trying to build OFED-3.5-2-MIC for my Xeon Phi 5110P coprocessor. The host OS is CentOS 6.6 with Linux kernel 2.6.32-504.1.3.el6.x86_64.

I ran install.pl as root and got the following error while installing intel-mic-ofed-compat-rdma-3.5-OFED.3.5.2.MIC.src.rpm. The output was:

-I/usr/src/kernels/2.6.32-504.1.3.el6.x86_64/arch/x86/include \
-Iarch/x86/include/generated \
 -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-dwarf2-cfi-asm -fconserve-stack  -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(main)"  -D"KBUILD_MODNAME=KBUILD_STR(compat)" -D"DEBUG_HASH=18" -D"DEBUG_HASH2=35" -c -o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/.tmp_main.o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/main.c
In file included from /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h:55,
                 from <command-line>:0:
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.34.h:19: error: redefinition of typedef 'mmc_pm_flag_t'
include/linux/mmc/pm.h:25: note: previous declaration of 'mmc_pm_flag_t' was here
In file included from /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h:58,
                 from <command-line>:0:
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:198: error: redeclaration of enumerator 'ETH_FLAG_TXVLAN'
include/linux/ethtool.h:405: note: previous definition of 'ETH_FLAG_TXVLAN' was here
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.37.h:199: error: redeclaration of enumerator 'ETH_FLAG_RXVLAN'
include/linux/ethtool.h:406: note: previous definition of 'ETH_FLAG_RXVLAN' was here
make[3]: *** [/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/main.o] Error 1
make[2]: *** [/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat] Error 2
make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.1.3.el6.x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.t1Z2c9 (%build)

RPM build errors:
    user build does not exist - using root
    group build does not exist - using root
    user build does not exist - using root
    group build does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.t1Z2c9 (%build)

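It looks like a kernel compatibility issue: the compat-rdma backport headers (compat-2.6.34.h and compat-2.6.37.h) redefine mmc_pm_flag_t, ETH_FLAG_TXVLAN and ETH_FLAG_RXVLAN, which this kernel's own headers already define. However, I can't find any document that clearly states which versions of OFED, MPSS and the Linux kernel are compatible.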
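The redefinition errors above can be checked directly against the installed kernel headers. The shell sketch below is a hypothetical helper (the names header_defines and check_compat_conflicts are made up for illustration). It assumes the header layout shown in the build log (/usr/src/kernels/<version>/include/linux/...) and does a plain textual grep rather than a real preprocessor check, so a symbol that only appears in a comment would also count.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed kernel headers already define the
# symbols that the compat-rdma backport headers redefine in the build log.

# header_defines HEADER SYMBOL
# Prints one status line; returns 0 if SYMBOL appears as a whole word in
# HEADER, 1 if it does not, 2 if HEADER cannot be read.
header_defines() {
    local header=$1 symbol=$2
    if [ ! -r "$header" ]; then
        echo "missing: $header"
        return 2
    fi
    if grep -qw -- "$symbol" "$header"; then
        echo "defined: $symbol in $header"
        return 0
    fi
    echo "absent: $symbol in $header"
    return 1
}

# check_compat_conflicts [KERNEL_SRC_DIR]
# Defaults to the running kernel's source tree (an assumption based on the
# path in the log). Prints a status line per symbol plus a summary count,
# and always returns 0.
check_compat_conflicts() {
    local kdir=${1:-/usr/src/kernels/$(uname -r)}
    local hits=0
    header_defines "$kdir/include/linux/mmc/pm.h" mmc_pm_flag_t && hits=$((hits + 1))
    header_defines "$kdir/include/linux/ethtool.h" ETH_FLAG_TXVLAN && hits=$((hits + 1))
    header_defines "$kdir/include/linux/ethtool.h" ETH_FLAG_RXVLAN && hits=$((hits + 1))
    echo "already defined by kernel: $hits of 3"
}
```

Running `check_compat_conflicts` on the build host reports how many of the three symbols the installed kernel already provides. A nonzero count is consistent with the errors above (the notes at include/linux/mmc/pm.h:25 and include/linux/ethtool.h:405-406), where headers written for older kernels collide with definitions the distribution kernel already carries.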

I also tried installing OFED 1.5.4.1 by following the Intel® Manycore Platform Software Stack (Intel® MPSS) guide. The installation appears to succeed, but the services do not start correctly:

On host:
[root@xeonphi0 mpss-3.4.4]# service openibd start
Setting up InfiniBand network interfaces:                  [  OK  ]
Setting up service network . . .                           [  done  ]
[root@xeonphi0 mpss-3.4.4]# service opensmd start
Starting IB Subnet Manager......                           [FAILED]
[root@xeonphi0 mpss-3.4.4]# service ofed-mic start
Starting OFED Stack:
host   FATAL: Module ibp_server not found.       [FAILED]
mic0                                                                    [  OK  ]
mic1                                                                    [  OK  ]
RTNETLINK answers: File exists
[root@xeonphi0 mpss-3.4.4]# ibstatus
Fatal error:  device '*': sys files not found (/sys/class/infiniband/*/ports)
[root@xeonphi0 mpss-3.4.4]# ibv_devinfo
No IB devices found
[root@xeonphi0 mpss-3.4.4]# ibv_devices
    device                 node GUID
    ------              ---------------- 
 
On the MIC:
[root@xeonphi-server-mic0 ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    scif0               4c79bafffe3005d2
[root@xeonphi-server-mic0 ~]# ibv_devinfo
hca_id:    scif0
    transport:            SCIF (2)
    fw_ver:                0.0.1
    node_guid:            4c79:baff:fe30:05d2
    sys_image_guid:            4c79:baff:fe30:05d2
    vendor_id:            0x8086
    vendor_part_id:            0
    hw_ver:                0x1
    phys_port_cnt:            1
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:        4096 (5)
            sm_lid:            1
            port_lid:        1001
            port_lmc:        0x00
            link_layer:        SCIF
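On the host, ibstatus fails because /sys/class/infiniband has no port entries, while ibv_devices on the card lists scif0. One way to compare both sides is to read that sysfs directory directly. The sketch below uses a hypothetical function name, list_rdma_ports, and assumes each port directory exposes a `state` file (for example `4: ACTIVE`); verify that against the kernel on each side.

```shell
#!/usr/bin/env bash
# Sketch: list RDMA devices and port states from sysfs, i.e. the same
# directory ibstatus complained about (/sys/class/infiniband/*/ports).

# list_rdma_ports [SYSFS_ROOT]
# Prints "<device> port <n>: <state>" for each port; returns 1 if none exist.
list_rdma_ports() {
    local root=${1:-/sys/class/infiniband}
    local found=0 port dev state
    for port in "$root"/*/ports/*; do
        # An unmatched glob stays literal, so skip anything that is not a dir.
        [ -d "$port" ] || continue
        found=1
        dev=${port#"$root"/}
        dev=${dev%%/*}
        state=$(cat "$port/state" 2>/dev/null) || state=unknown
        printf '%s port %s: %s\n' "$dev" "${port##*/}" "$state"
    done
    if [ "$found" -eq 0 ]; then
        echo "no RDMA devices under $root"
        return 1
    fi
}
```

On the host in this report, the function would presumably print the "no RDMA devices" line, matching the ibstatus error, while on mic0 it should list scif0 port 1.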
 
Could anyone kindly offer some help? Thanks in advance.
9 Replies
Frances_R_Intel
Employee

One of the people I work with also ran into trouble installing OFED-3.5-2-MIC on a RHEL system that had recently been upgraded to a later kernel. The solution was to install OFED-3.12-1 instead. OFED-3.5-2-MIC, as you could probably tell by the name, is not part of the mainline of OFED development. It is a good idea to get back on the mainline, unless you have a compelling reason to use the -MIC version.

yuyinyang
Beginner

I want to use the scif0 virtual InfiniBand adapter for communication between the host and a coprocessor in the same node. Will OFED-3.12-1 work for this, or do I need a -MIC version?

yuyinyang
Beginner

Hi Roth,

Thanks for your reply.

I want to use the scif0 virtual InfiniBand adapter for communication between the host and a coprocessor in the same node. Will OFED-3.12-1 work for this, or do I need a -MIC version?

Frances_R_Intel
Employee

Yes, OFED-3.12-1 supports the virtual InfiniBand connection. You will have the same advantages using OFED-3.12-1 as you get with OFED-3.5-2-MIC.

yuyinyang
Beginner

Hi Roth,

Thanks for your recommendation.

I've successfully reinstalled OFED-3.12-1 with Xeon Phi support. My environment is now MPSS 3.4.4, Linux kernel 2.6.32-358.el6.x86_64, OFED-3.12-1 and Intel MPI 5.0.3.048. However, when I try to use the virtual InfiniBand functionality, I run into the following problems:

I started ib_read_bw on the host (server) and then ran ib_read_bw 192.0.2.100 on the MIC (client). The outputs were:

[root@xeonphi-server OFED-3.12-1]# ib_read_bw
---------------------------------------------------------------------------------------
Device not recognized to implement inline feature. Disabling it
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF        Device         : scif0
 Number of qps   : 1        Transport type : IW
 Connection type : RC        Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096
 Link type       : Ethernet
 Gid index       : 0
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x3e8 QPN 0x0002 PSN 0xa8d12d OUT 0xff RKey 0x000001 VAddr 0x007f22c60a1000
 GID: 76:121:186:48:05:211:00:00:00:00:00:00:00:00:00:00
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdam_cm
Failed to exchange data between server and clients

[root@xeonphi-server-mic0 micshare]# ib_read_bw 192.0.2.100
---------------------------------------------------------------------------------------
Device not recognized to implement inline feature. Disabling it
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : scif0
 Number of qps   : 1            Transport type : Unknown
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096
 Link type       : SCIF
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
ethernet_read_keys: Couldn't read remote address
 Unable to read from socket/rdam_cm
Failed to exchange data between server and clients
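Both sides report "Data ex. method : Ethernet" and then fail in ethernet_read_keys, which is perftest's exchange of connection details over an ordinary TCP socket, before any RDMA transfer starts. A first sanity check is whether that socket can be opened from the card to the host at all. The sketch below is a hypothetical helper (tcp_reachable) built on bash's /dev/tcp redirection and the coreutils timeout command; it assumes both are available where you run it, and assumes perftest's default port 18515 (perftest's -p option changes the port).

```shell
#!/usr/bin/env bash
# Sketch: test whether the TCP side channel that perftest uses for its
# out-of-band exchange ("Data ex. method : Ethernet") can be opened.
# Assumes perftest's default port 18515 unless another port is given.

# tcp_reachable HOST [PORT] [TIMEOUT_SECONDS]
# Prints "reachable: HOST:PORT" and returns 0 if a TCP connection opens,
# otherwise prints "unreachable: HOST:PORT" and returns 1.
tcp_reachable() {
    local host=$1 port=${2:-18515} secs=${3:-3}
    # In bash, redirecting to /dev/tcp/HOST/PORT opens a TCP connection.
    # Host and port are passed as arguments, not spliced into the script text.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
        return 0
    fi
    echo "unreachable: $host:$port"
    return 1
}
```

For example, run `tcp_reachable 192.0.2.100` on mic0 while ib_read_bw is waiting on the host. The probe is itself accepted as a client connection, so the waiting server will probably abort and needs to be restarted before the real test. If the port is reachable and the exchange still fails, this check says nothing about whether the two perftest builds agree on the exchange format; note that the two outputs above already report different transport and link types for the same scif0 device.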

I also tried running the Intel MPI Benchmarks over the DAPL fabric. The command and output were:

[root@xeonphi-server /tmp]# mpirun -genv I_MPI_DEBUG 2 -host host -n 1 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 Sendrecv : -host mic0 -n 1 /tmp/IMB-MPI1
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): RLIMIT_MEMLOCK too small
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] MPI startup(): DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl data transfer mode
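Rank 0 (on the host) reports "RLIMIT_MEMLOCK too small" just before it declares the dapl fabric unavailable. Registering memory for RDMA pins pages, and pinned pages count against the locked-memory limit, so checking that limit is a reasonable first step. The sketch below is hypothetical: memlock_sufficient and report_memlock are illustrative names, and the 1 GiB default threshold is an arbitrary example, not a documented Intel MPI requirement.

```shell
#!/usr/bin/env bash
# Sketch: check the locked-memory limit that the "RLIMIT_MEMLOCK too small"
# message refers to.

# memlock_sufficient CURRENT MIN_KB
# CURRENT is what `ulimit -l` prints: a KiB count or "unlimited".
# Returns 0 if CURRENT is unlimited or at least MIN_KB, otherwise 1.
memlock_sufficient() {
    local cur=$1 min_kb=$2
    if [ "$cur" = unlimited ]; then
        return 0
    fi
    [ "$cur" -ge "$min_kb" ]
}

# report_memlock [MIN_KB]
# Compares the current shell's limit with MIN_KB (arbitrary 1 GiB default).
report_memlock() {
    local min_kb=${1:-1048576} cur
    cur=$(ulimit -l)
    if memlock_sufficient "$cur" "$min_kb"; then
        echo "memlock ok: $cur"
    else
        echo "memlock too small: $cur KiB (want unlimited or >= $min_kb KiB)"
        return 1
    fi
}
```

If the host limit turns out to be low, a common way to raise it persistently through pam_limits is adding `* soft memlock unlimited` and `* hard memlock unlimited` to /etc/security/limits.conf and logging in again. This thread does not establish whether that alone makes the dapl fabric available here.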

Some environment information:

[root@xeonphi-server /tmp]# env | grep I_MPI
I_MPI_ROOT=/opt/intel/impi/5.0.3.048
I_MPI_MIC=enable
I_MPI_FABRICS=dapl
I_MPI_DAPL_PROVIDER=ofa-v2-scif0

[root@xeonphi-server tmp]# rpm -qa | grep dapl
dapl-utils-2.1.2-1.x86_64
dapl-devel-static-2.1.2-1.x86_64
dapl-2.1.2-1.x86_64
dapl-debuginfo-2.1.2-1.x86_64
dapl-devel-2.1.2-1.x86_64
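The card-side rank [1] opens the ofa-v2-scif0 provider, but the host-side rank [0] does not get a usable dapl fabric. One thing worth confirming on each side is that the provider named in I_MPI_DAPL_PROVIDER actually has an entry in the DAPL provider table. The sketch below (dat_provider_listed is a hypothetical name) assumes the default table location /etc/dat.conf and treats the first whitespace-separated field of each line as the provider name.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a DAPL provider name has an entry in the provider
# table. Assumes the default location /etc/dat.conf; pass another path as
# the second argument if your installation uses a different file.

# dat_provider_listed PROVIDER [DAT_CONF]
# Returns 0 if listed, 1 if not listed, 2 if the table cannot be read.
dat_provider_listed() {
    local provider=$1 conf=${2:-/etc/dat.conf}
    if [ ! -r "$conf" ]; then
        echo "missing: $conf"
        return 2
    fi
    # The provider name is the first field; a commented-out line starts
    # with '#', so its first field never equals the provider name.
    if awk -v p="$provider" '$1 == p { found = 1 } END { exit !found }' "$conf"; then
        echo "listed: $provider"
        return 0
    fi
    echo "not listed: $provider in $conf"
    return 1
}
```

For example, run `dat_provider_listed "$I_MPI_DAPL_PROVIDER"` on the host and on mic0 and compare the results.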

I'm quite confused by these issues and look forward to your reply.

Thank you.

Mrunal_G_
Beginner

Hello,

Is there a solution to the problem described in the last post?

I am facing the same issues. My setup is: mpss-3.2.1, kernel 2.6.38.8, ofed-3.2.1, CentOS.

Server side (MIC):

[root@bricks06-mic0 ~]# ib_read_lat
------------------------------------------------------------------
                    RDMA_Read Latency Test
 Number of qps   : 1
 Connection type : RC
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x01 QPN 0x200049 PSN 0xc86176 OUT 0x10 RKey 0x40003002 VAddr 0x000000006c7000
pp_read_keys: No such file or directory
Couldn't read remote address
 Unable to write to socket/rdam_cm
Failed to exchange date between server and clients

 

Client side (host):

[root@bricks06 bin]# ./ib_read_lat -d scif0 192.0.2.101 

---------------------------------------------------------------------------------------

Device not recognized to implement inline feature. Disabling it
ethernet_read_data: Couldn't read reports
 Unable to read from socket/rdam_cm
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF        Device         : scif0
 Number of qps   : 1        Transport type : IW
 Connection type : RC        Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 4096
 Link type       : IB
 Outstand reads  : 255
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x3e8 QPN 0x0003 PSN 0x295fc7 OUT 0xff RKey 0x000002 VAddr 0x00000001cba000
ethernet_read_keys: Couldn't read remote address
 Unable to read from socket/rdam_cm
Failed to exchange data between server and clients

 

Mrunal_G_
Beginner

Happy new year :)

I would really appreciate any help with my last post.

Does this error indicate that my InfiniBand setup is not configured correctly for verbs? I ask because IP over InfiniBand (IPoIB) communication works for the Xeon Phi here.

Please let me know

thanks much

Mrunal

yuyinyang
Beginner

Hi Mrunal,

Sorry for my late reply.

Actually, I never found a solution to this issue and eventually gave up.

I regret that I may not be able to help you :(

Mrunal_G_
Beginner

Thanks for your reply.

So you never got RDMA over InfiniBand working at all, or did you use some alternative configuration?

thanks

mrunal
