CentOS 7.3 fails to build ofed-mic with MPSS 3.8.1

Rolly_N_
Beginner

Dear all,

I am having trouble installing ofed-mic on CentOS 7.3 with the default kernel:

Linux node09 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I did my best to follow the installation steps in the MPSS 3.8.1 user guide, and the Phi is working fine:

[root@node09 disc]# micctrl --status
mic0: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
[root@node09 disc]# miccheck
MicCheck 3.8.1-1
Copyright (c) 2016, Intel Corporation.

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... pass
  Test 6 (mic0): Check running flash version is correct ... pass
  Test 7 (mic0): Check running SMC firmware version is correct ... pass

Status: OK

[root@node09 disc]# micinfo
MicInfo Utility Log
Created Thu Mar  2 14:07:33 2017


        System Info
                HOST OS                 : Linux
                OS Version              : 3.10.0-514.el7.x86_64
                Driver Version          : 3.8.1-1
                MPSS Version            : 3.8.1

                Host Physical Memory    : 257661 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : 2.1.02.0391
                SMC Firmware Version     : 1.17.6900
                SMC Boot Loader Version  : 1.8.4326
                Coprocessor OS Version   : 2.6.38.8+mpss3.8.1
                Device Serial Number     : ADKC33400518

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225c
                Subsystem ID             : 0x7d95
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-7120 P/A/X/D
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Passive CS

        Cores
                Total No of Active Cores : 61
                Voltage                  : 0 uV
                Frequency                : 1238095 kHz

        Thermal
                Fan Speed Control        : N/A
                Fan RPM                  : N/A
                Fan PWM                  : N/A
                Die Temp                 : 45 C

        GDDR
                GDDR Vendor              : Samsung
                GDDR Version             : 0x6
                GDDR Density             : 4096 Mb
                GDDR Size                : 15872 MB
                GDDR Technology          : GDDR5
                GDDR Speed               : 5.500000 GT/s
                GDDR Frequency           : 2750000 kHz
                GDDR Voltage             : 1501000 uV

I can also ssh to mic0 and back to the host by following this page: https://software.intel.com/en-us/node/544138

Now I would like to set up OFED support on the Phi, but I have found no solution so far. According to the user guide, MPSS 3.8.1 works with OFED-3.18.2 and Mellanox MLNX-OFED 2.4:

http://registrationcenter-download.intel.com/akdlm/irc_nas/11194/mpss_users_guide.pdf

But when I tried to compile OFED-3.18.2, the build failed with this error:

/var/tmp/OFED_topdir/BUILD/compat-rdma-3.18/include/linux/compat-3.16.h:25:19: error: redefinition of 'ktime_get_ns'
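
A redefinition error like this one usually means the symbol is already provided somewhere else in the build. One plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that the 3.10.0-514 kernel headers already declare `ktime_get_ns`, so the definition in `compat-3.16.h` collides with it. A minimal sketch for checking that is shown here; the function name `headers_mention_symbol` is hypothetical, and the header path in the usage comment assumes the kernel-devel package for the running kernel is installed.

```shell
#!/bin/sh
# headers_mention_symbol SYMBOL INCLUDE_DIR
# Succeeds (exit status 0) if any file under INCLUDE_DIR contains SYMBOL
# as a whole word. This only shows that the symbol appears in the headers;
# it does not prove how or where it is defined.
headers_mention_symbol() {
    grep -rqw -- "$1" "$2"
}

# Example usage (assumes kernel-devel headers are installed at this path):
#   if headers_mention_symbol ktime_get_ns "/lib/modules/$(uname -r)/build/include/linux"; then
#       echo "kernel headers already mention ktime_get_ns"
#   fi
```

If the check succeeds on the 7.3 kernel but not on an older one, that would be consistent with the compat layer not knowing about this kernel, though it would not by itself identify a fix.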

When I tried MLNX-OFED 2.4, it reported that my OS is not supported. So I tried MLNX-OFED 3.4, but when I rebuilt the RPMs, the build failed with:

/root/rpmbuild/BUILD/ofed-driver/drivers/infiniband/ibp/cm/cm_server_msg.c:986:19: error: 'IB_QP_SMAC' undeclared (first use in this function)

So I cannot enable the ofed-mic service on my node.

Can anyone help?

Thanks,

Rolly

Loc_N_Intel
Employee

Hi Rolly,

I am able to reproduce the problem you saw when installing OFED-3.18-2 on a host system running RHEL 7.3 and MPSS 3.8.1. Let me check with an MPSS/OFED expert here. I will get back to you.

Thanks 

Rolly_N_
Beginner

Hello Nguyen,

Thanks for your attention.

I can confirm that CentOS 7.1 works fine with OFED-3.18.2 and MPSS 3.8.1.

[qeuser@node09 ~]$ uname -a
Linux node09 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I can run ibv_devinfo:

[root@node09 qeuser]# ibv_devinfo
hca_id: scif0
        transport:                      invalid transport (-1)
        fw_ver:                         0.0.1
        node_guid:                      4c79:baff:fe44:040d
        sys_image_guid:                 4c79:baff:fe44:040d
        vendor_id:                      0x8086
        vendor_part_id:                 0
        hw_ver:                         0x1
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1000
                        port_lmc:               0x00
                        link_layer:             Unknown

hca_id: mlx4_1
        transport:                      InfiniBand (0)
        fw_ver:                         2.40.5030
        node_guid:                      f452:1403:0042:f570
        sys_image_guid:                 f452:1403:0042:f573
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1090110018
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.40.5030
        node_guid:                      f452:1403:0042:e710
        sys_image_guid:                 f452:1403:0042:e713
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1090110018
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

[root@node09 qeuser]# 

I will plug in the InfiniBand cables and check the status again.

Best,

Rolly

Loc_N_Intel
Employee

Hi Rolly,

According to the OFED-3.18-2 release notes (OFED-3.18-2/docs/OFED_release_notes.txt), this OFED version supports only RHEL 6.5, 6.6, 6.7, 7.0, 7.1, and 7.2; SLES 11 SP3 and SP4; and SLES 12 and 12.1.

RHEL 7.3 and CentOS 7.3 are fairly recent and are not supported by OFED-3.18-2. CentOS 7.1, however, should work with OFED-3.18-2.
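
To make the support list above easier to check before starting a build, here is a minimal sketch that compares a RHEL/CentOS release number against the versions quoted from the release notes. The function names `ofed_support_status` and `release_major_minor` are hypothetical helpers written for this illustration, the list covers only the RHEL-family entries, and it assumes a CentOS 7.x release counts as the matching RHEL 7.x release.

```shell
#!/bin/sh
# RHEL/CentOS releases listed as supported in the OFED-3.18-2 release
# notes, as quoted in this thread (SLES entries are left out).
SUPPORTED_RHEL="6.5 6.6 6.7 7.0 7.1 7.2"

# ofed_support_status MAJOR.MINOR
# Prints "supported" or "unsupported"; exit status is 0 or 1 to match.
ofed_support_status() {
    for v in $SUPPORTED_RHEL; do
        if [ "$v" = "$1" ]; then
            echo "supported"
            return 0
        fi
    done
    echo "unsupported"
    return 1
}

# release_major_minor "RELEASE STRING"
# Extracts MAJOR.MINOR from a string such as
# "CentOS Linux release 7.3.1611 (Core)".
release_major_minor() {
    echo "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

On a RHEL or CentOS host, it could be called as `ofed_support_status "$(release_major_minor "$(cat /etc/redhat-release)")"`, which assumes `/etc/redhat-release` exists and follows the usual "release X.Y" wording.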

Thanks
