George_C_5
Beginner
115 Views

Unable to access the KNC card


Dear all,

I have a server running CentOS Linux release 7.2.1511, equipped with a KNC MIC card,
using MPSS version 3.3 in order to access and manage the card. The server has been
running for about 2 years without issues. Recently however, I am unable to ssh into the
MIC card. The problem persists after rebooting the entire node. Trying to reboot just the
card gives the following output:

$ sudo micctrl --reboot
Error getting SCIF driver version 
  [Error] mic0 failed to shutdown: card state (null)
  [Error] mic0: cannot wait for non existent MIC device
  [Error] mic0: Boot aborted - Setting kernel command line failed
  [Error] mic0: cannot wait for non existent MIC device
$ sudo micctrl --status
Error getting SCIF driver version 
  [Error] mic0: cannot find state of non existent MIC device

The MPSS service, however, seems to be working ok:

$ sudo service mpss status
mpss is running

The card is also recognized as existing on the PCI bus:

$ sudo lspci -v
...
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
    Subsystem: Intel Corporation Device 3608
    Flags: fast devsel, IRQ 11
    Memory at 380c00000000 (64-bit, prefetchable) [disabled] [size=8G]
    Memory at fb600000 (64-bit, non-prefetchable) [disabled] [size=128K]
    Capabilities: [44] Power Management version 3
    Capabilities: [4c] Express Endpoint, MSI 00
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
    Capabilities: [100] Advanced Error Reporting
...
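
A side note on this listing: both memory regions are marked [disabled], and the excerpt shows no "Kernel driver in use:" line, which would be consistent with no driver being bound to the card. Below is a minimal sketch for checking that directly, assuming `lspci -k` is available on the host. The helper name `bound_driver` is made up for illustration.

```shell
# Read `lspci -k` style output on stdin and print the name of any bound
# kernel driver. Prints nothing when no "Kernel driver in use:" line exists.
bound_driver() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on a real host (slot 02:00.0 is taken from the listing above):
#   lspci -k -s 02:00.0 | bound_driver
```

An empty result for the coprocessor's slot suggests that the kernel module is missing or not loaded, rather than a fault in the card itself.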

Finally, the miccheck command reports a driver-related issue:

$ sudo miccheck
MicCheck 3.3-r1
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... fail
    mic driver not loaded
Status: FAIL
Failure: mic driver not loaded

It is obviously a driver-related issue, however we have not "touched" the driver in a
very long time and MPSS seems to be working fine. Isn't the SCIF driver part of the
MPSS service? There seems to be an issue with this driver in particular.
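
One way to separate "the service is running" from "the kernel module is loaded" is to look at the module list directly. This is a rough sketch, not an MPSS tool: the function name `module_loaded` is hypothetical, and the module name `mic` is taken from the miccheck message above.

```shell
# Return 0 if the named module appears in a /proc/modules style file.
# The optional second argument exists only so the check can be tested
# against a sample file instead of the live system.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Usage on a real host:
#   module_loaded mic && echo "mic loaded" || echo "mic NOT loaded"
```

If this reports the module as not loaded while `service mpss status` still says "running", the two states have diverged and the problem is on the kernel-module side.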

We could definitely solve this issue by re-installing MPSS as a whole and updating
it as well (3.3 is quite old by now), however this is a time-consuming and potentially
volatile process, backwards-compatibility being an issue. Is there some other way
of solving this?

Thank you for your time reading this post.
I eagerly await your responses,

George Ch.

8 Replies
JJK
New Contributor III
115 Views

mpss 3.3 has been out of support for a very long time; my guess is that you were lucky that the mic driver kept working across the recent kernel updates. What happens when you run

modprobe mic

?  What do you see in 'dmesg' after that?

Other than that, upgrading the mpss stack to 3.7.2 is fairly painless - but it depends on your setup, of course. If you're installing k1om rpms on the mic then remember to upgrade them as well.

 

George_C_5
Beginner
115 Views

Executing the following command:

modprobe mic

results in the following output:

$ modprobe mic
modprobe: FATAL: Module mic not found.

The dmesg command prints a lot of output, but a simple grep finds nothing that seems
relevant to the Phi:

$ dmesg | grep Phi
$ dmesg | grep coprocessor
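
Note that both grep calls above are case-sensitive, so lines containing "Coprocessor" or "mic0" would be missed. Here is a slightly wider filter as a sketch. The pattern list is a guess at what driver messages might contain, not an official list, and `mic_lines` is a made-up helper name.

```shell
# Keep only lines that mention the coprocessor or its driver, ignoring case.
# -w restricts matches to whole words, so "dynamic" does not match "mic".
mic_lines() {
    grep -iwE 'mic[0-9]*|scif|phi|co-?processor'
}

# Usage on a real host:
#   dmesg | mic_lines
```

No output from this filter would support the conclusion that the driver never loaded, since a driver that is not loaded cannot log anything.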

Upgrading to a newer version of mpss seems unavoidable.
I apologize if I sound ignorant, but how do I check for my k1om installation?
I do seem to have a relevant directory:

/opt/mpss/3.3/sysroots/x86_64-mpsssdk-linux/usr/libexec/k1om-mpss-linux

Can you give some tips concerning the update of k1om (and perhaps mpss as well)?
Thank you for your reply!

JJK
New Contributor III
115 Views

most likely your newer kernel does not have a link to the mic.ko module; you can try adding a weak-updates link for the mic.ko module using

cd /lib/modules/`uname -r`/weak-updates
ln -fs /lib/modules/3.10.0-123.el7.x86_64/extra/mic.ko
depmod -ae

modprobe mic

If that does not work, then you'll have to upgrade to mpss 3.7.2. If you did not install any MIC-side RPMs (these are specified in the mic configuration files in /etc/mpss/....), then you do not need to worry about the k1om rpms.
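
Before creating the link, it can help to see which kernel directories actually contain a mic.ko, since the path in the ln command has to point at an existing file. This is a small sketch, with `find_mic_modules` as a hypothetical helper name:

```shell
# List every mic.ko (plain file or symlink, optionally compressed) under a
# modules root. The root defaults to /lib/modules; the argument exists so
# the function can be tried against a scratch directory.
find_mic_modules() {
    root=${1:-/lib/modules}
    find "$root" -name 'mic.ko*' \( -type f -o -type l \) 2>/dev/null | sort
}

# Usage on a real host:
#   find_mic_modules
```

If nothing is printed, there is no module to link to, and installing a matching mpss-modules package comes first.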

 

 

George_C_5
Beginner
115 Views

Dear JJK,

I am very thankful for your help. Establishing the weak link, as you proposed, did not help much. Thus, I moved on to the long-term solution of upgrading my mpss software. I tried to follow this guide by the book: http://registrationcenter-download.intel.com/akdlm/irc_nas/9669/readme.txt


Now, this process did not go exactly as planned. I was unable to locate several important files that should have been included in the installed mpss version, such as the uninstall script and some module files that needed recompiling. I attributed this to having a version even older than 3.3.5, seemingly 3.3.1. My workaround was to download the online 3.3.5 mpss distribution, recompile those mpss modules and run its uninstall script. This seemingly worked.

I then continued with the installation of the new 3.7.2 mpss software. I followed every step of the algorithm applicable for Red Hat 7.2. The software installed without any issues, following the guide that I mentioned religiously. I arrived at the step of actually checking the mic card before updating its flash files. The problem, unfortunately, persists:

$ modprobe mic
modprobe: FATAL: Module mic not found.
$ sudo micctrl --initdefaults
$ micctrl -s
  [Error] mic0: State failed - non existent MIC device
$ sudo micctrl -rw
[sudo] password for georgec: 
  [Error] mic0: Reset aborted - non existent MIC device
  [Error] mic0: Wait failed - non existent MIC device

Thus, updating the driver, at least host-side, did not solve my issues. Furthermore, I cannot proceed with the completion of the upgrade on the side of Phi.

JJK
New Contributor III
116 Views

did you install the mpss-modules rpm for the RHEL 7.2 kernel? is there a module present in /lib/modules/3.10.0-327.el7/extra ?

what happens if you try to insert that module directly:

insmod /lib/modules/3.10.0-327.el7/extra/mic.ko

?  what does 
modinfo mic

return? 

 


George_C_5
Beginner
115 Views

I removed the previous installation of MPSS (3.3) via the uninstall.sh script that I had to download from https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss#lx37rel
I then downloaded, extracted, copied and installed MPSS 3.7 modules:

$ tar xvf mpss-3.7.2-linux.tar
$ cd mpss-3.7.2
$ uname -r
3.10.0-327.28.3.el7.x86_64
$ cp modules/mpss-modules-3.10.0-327.el7.x86_64-3.7.2-1.x86_64.rpm .
$ cp modules/mpss-modules-dev-3.10.0-327.el7.x86_64-3.7.2-1.x86_64.rpm .
$ sudo yum install *.rpm
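
Note that the running kernel (3.10.0-327.28.3.el7.x86_64) and the kernel named in the RPM (3.10.0-327.el7.x86_64) differ only in the update counters. Below is a sketch of deriving the base release from `uname -r`, assuming the usual RHEL-style "<version>-<build>[.<update>...].el<N>.<arch>" naming. The function name `base_kernel` is made up.

```shell
# Strip the update counters from a RHEL-style kernel release string, e.g.
#   3.10.0-327.28.3.el7.x86_64 -> 3.10.0-327.el7.x86_64
# Strings that do not follow the assumed naming are printed unchanged.
base_kernel() {
    printf '%s\n' "$1" |
        sed -E 's/^([0-9.]+-[0-9]+)(\.[0-9]+)*(\.el[0-9].*)$/\1\3/'
}

# Usage on a real host:
#   ls modules/ | grep "$(base_kernel "$(uname -r)")"
```

Because the module is installed under the base kernel's directory rather than the running kernel's, modprobe on the running kernel will not find it until a link or a rebuilt module is in place.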

As for the module that you asked about:

$ ls /lib/modules/3.10.0-327.el7.x86_64/extra/
mic.ko

Now for the good news: trying your insertion command fixed everything!

$ sudo insmod /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko 
[sudo] password for georgec: 
$

The command worked and I was able to continue the installation (updating flash and SMC).
Everything is working as it should be now and I am able to ssh into the card.
There is only one small detail that maybe you can shed some light onto:
I need to run the insertion command every time I boot the host, otherwise the mic card remains undetected.
Any ideas on how to fix this last aspect of the issue permanently? I could just put the command in
a boot script, but is there some other way to fix it?

Thank you JJK, you have been immensely helpful.

JJK
New Contributor III
115 Views

Good, now we're getting somewhere.

The next step is to ensure that the mic.ko module is found at boot time. Try the following:

cd /lib/modules/`uname -r`/weak-updates
ln -fs /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko
depmod -ae

modinfo mic

This will create the right "weak-updates" link for the mic.ko module. Afterwards, the mic.ko module should be loadable using

rmmod mic
modprobe mic

and the mpss stack will automatically insert this module at startup.
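
The linking step above can be wrapped in a small function so that it can be repeated after a kernel update. This is only a sketch: `link_weak_update` is a hypothetical helper, it creates just the symlink, and `depmod` and `modprobe` still have to be run as root afterwards.

```shell
# Link a module built for one kernel into another kernel's weak-updates
# directory. Nothing is loaded or indexed here.
#   $1: path to the existing module,
#       e.g. /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko
#   $2: modules directory of the target kernel,
#       e.g. /lib/modules/$(uname -r)
link_weak_update() {
    src=$1
    kdir=$2
    [ -f "$src" ] || { echo "no such module: $src" >&2; return 1; }
    mkdir -p "$kdir/weak-updates" || return 1
    ln -fs "$src" "$kdir/weak-updates/$(basename "$src")"
}

# Usage on a real host (as root), followed by the same steps as above:
#   link_weak_update /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko \
#       "/lib/modules/$(uname -r)" && depmod -a && modprobe mic
```

Checking that the source file exists first avoids leaving a dangling symlink, which would make modprobe fail in a less obvious way.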

 

George_C_5
Beginner
115 Views

Thank you JJK, the process you described removed the need to call insmod manually at server boot.

$ modinfo mic
filename:       /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/mic.ko
license:        GPL
build_scmver:   2659671e21e0814014e442998fbdc8ff37d1c68e
build_ondate:   2016-08-09 15:39:20 -0400
build_bywhom:   qb_user@sid-bld06.pdx.intel.com
build_number:   0
license:        GPL
license:        GPL
rhelversion:    7.2
srcversion:     35F362554621AC4E6F74424
depends:        
vermagic:       3.10.0-327.el7.x86_64 SMP mod_unload modversions 
parm:           vnet:Vnet operating mode, one of: poll intr dma (vnetmode)
parm:           vnet_num_buffers:Number of buffers used by the VNET driver (int)
parm:           vnet_addr:Vnet driver host ring address (ulong)
parm:           ulimit:SCIF ulimit check (bool)
parm:           reg_cache:SCIF registration caching (bool)
parm:           huge_page:SCIF Huge Page Support (bool)
parm:           p2p:SCIF peer-to-peer (bool)
parm:           p2p_proxy:SCIF peer-to-peer proxy DMA support (bool)
parm:           watchdog:SCIF Watchdog (bool)
parm:           watchdog_auto_reboot:SCIF Watchdog auto reboot (bool)
parm:           msi:bool
parm:           mic_msi_enable:To enable MSIx in the driver.
parm:           pm_qos_cpu_dma_lat:int
parm:           mic_pm_qos_cpu_dma_lat:PM QoS CPU DMA latency in usecs.
parm:           ramoops_count:Maximum frame count for the ramoops driver. (int)
parm:           crash_dump:bool
parm:           mic_crash_dump_enabled:MIC Crash Dump enabled.
parm:           psmi:Enable/disable mic psmi (bool)
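
The vermagic line in this output shows that the module was built for 3.10.0-327.el7.x86_64, while its filename sits under the running 3.10.0-327.28.3 kernel's weak-updates directory. Here is a small sketch for comparing the two on any host, with `vermagic_kernel` as a made-up helper name:

```shell
# Read `modinfo` output on stdin and print the kernel release that the
# module was built for (the first field of the vermagic line).
vermagic_kernel() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Usage on a real host:
#   built=$(modinfo mic | vermagic_kernel)
#   [ "$built" = "$(uname -r)" ] || echo "mic.ko was built for $built"
```

A mismatch here is expected when the module is reached through weak-updates, so it is not an error in itself. It does mean that the link needs to be revisited after each kernel update.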

Thank you for your help :)
