Dear all,
I have a server running CentOS Linux release 7.2.1511, equipped with a KNC MIC card,
using MPSS version 3.3 in order to access and manage the card. The server has been
running for about 2 years without issues. Recently however, I am unable to ssh into the
MIC card. The problem persists after rebooting the entire node. Trying to reboot just the
card gives the following output:
    $ sudo micctrl --reboot
    Error getting SCIF driver version
    [Error] mic0 failed to shutdown: card state (null)
    [Error] mic0: cannot wait for non existent MIC device
    [Error] mic0: Boot aborted - Setting kernel command line failed
    [Error] mic0: cannot wait for non existent MIC device
    $ sudo micctrl --status
    Error getting SCIF driver version
    [Error] mic0: cannot find state of non existent MIC device
The MPSS service, however, seems to be working ok:
    $ sudo service mpss status
    mpss is running
The card is also recognized as existing on the PCI bus:
    $ sudo lspci -v
    ...
    02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
        Subsystem: Intel Corporation Device 3608
        Flags: fast devsel, IRQ 11
        Memory at 380c00000000 (64-bit, prefetchable) [disabled] [size=8G]
        Memory at fb600000 (64-bit, non-prefetchable) [disabled] [size=128K]
        Capabilities: [44] Power Management version 3
        Capabilities: [4c] Express Endpoint, MSI 00
        Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [98] MSI-X: Enable- Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
    ...
Finally, the miccheck command reports a driver-related issue:
    $ sudo miccheck
    MicCheck 3.3-r1
    Copyright 2013 Intel Corporation All Rights Reserved
    Executing default tests for host
      Test 0: Check number of devices the OS sees in the system ... pass
      Test 1: Check mic driver is loaded ... fail
        mic driver not loaded
    Status: FAIL
    Failure: mic driver not loaded
It is obviously a driver-related issue; however, we have not "touched" the driver in a
very long time, and the MPSS service reports that it is running. Isn't the SCIF driver part of the
MPSS service? There seems to be an issue with this driver in particular.
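A quick way to confirm what miccheck reports is to look for the module in /proc/modules, where each line starts with the name of a loaded module. The following is only a minimal sketch: the function name `module_loaded` and its optional second argument are made up for illustration and are not part of MPSS.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is listed as a loaded kernel module.
# MODULES_FILE defaults to /proc/modules; the override is a hypothetical
# convenience so the check can be tried against a saved copy.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    # Each line of /proc/modules begins with the module name and a space.
    grep -q "^${name} " "$file" 2>/dev/null
}

if module_loaded mic; then
    echo "mic module is loaded"
else
    echo "mic module is not loaded"
fi
```

If this prints "mic module is not loaded" while `service mpss status` still says the service is running, the service status alone is not evidence that the driver is present.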
We could definitely solve this issue by re-installing MPSS as a whole and updating
it as well (3.3 is quite old by now), however this is a time-consuming and potentially
volatile process, backwards-compatibility being an issue. Is there some other way
of solving this?
Thank you for your time reading this post.
I eagerly await your responses,
George Ch.
mpss 3.3 has been out of support for a very long time; my guess is that you got lucky that the mic driver kept working across the recent kernel updates. What happens when you do a
modprobe mic
? what do you see with 'dmesg' after that?
Other than that, upgrading the mpss stack to 3.7.2 is fairly painless - but it depends on your setup, of course. If you're installing k1om rpms on the mic then remember to upgrade them as well.
Executing the following command:
modprobe mic
results in the following output:
    $ modprobe mic
    modprobe: FATAL: Module mic not found.
The dmesg command prints a lot of output, but a simple grep finds nothing in it
that is relevant to the Phi:
    $ dmesg | grep Phi
    $ dmesg | grep coprocessor
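The two greps above are case-sensitive and only look for the product name, so they could miss messages that a driver logs under its own name. As a rough sketch, a wider filter might look like the one below; the function name `filter_mic_messages` is hypothetical, and the pattern (mic, micN, scif, "Xeon Phi", "coprocessor") is an assumption about what such messages would contain, not a documented log format.

```shell
#!/bin/sh
# filter_mic_messages: read kernel log text on stdin and print the lines that
# mention "mic"/"micN"/"scif" as whole words, "xeon phi", or "co(-)processor",
# ignoring case. "microcode" and similar words are deliberately not matched.
filter_mic_messages() {
    grep -iE '(^|[^[:alnum:]_])(mic[0-9]*|scif)([^[:alnum:]_]|$)|xeon phi|co-?processor' || true
}

# Usage (may need root, depending on the kernel.dmesg_restrict setting):
#   dmesg | filter_mic_messages
```

With the module missing altogether, an empty result is still expected, since a driver that was never loaded cannot have logged anything.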
Upgrading to a newer version of mpss seems unavoidable.
I apologize if I sound ignorant, but how do I check for my k1om installation?
I do seem to have a relevant directory:
/opt/mpss/3.3/sysroots/x86_64-mpsssdk-linux/usr/libexec/k1om-mpss-linux
Can you give some tips concerning the update of k1om (and perhaps mpss as well)?
Thank you for your reply!
most likely your newer kernel does not have a link to the mic.ko module; you can try adding a weak-updates link for the mic.ko module using
    cd /lib/modules/`uname -r`/weak-updates
    ln -fs /lib/modules/3.10.0-123.el7.x86_64/extra/mic.ko
    depmod -ae
    modprobe mic
if that does not work then you'll have to upgrade to mpss 3.7.2 ; if you did not install any MIC-side RPMs (which is specified in the mic configuration files in /etc/mpss/....) then you do not need to worry about the k1om rpms.
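Before creating the link it can help to see which installed kernel trees actually contain a mic.ko, since the 3.10.0-123 path above is only a guess at where the old module lives. A minimal sketch, assuming the usual /lib/modules layout; the function name `list_mic_modules` is made up for illustration:

```shell
#!/bin/sh
# list_mic_modules [ROOT]
# Print every mic.ko (regular file or symlink) found under ROOT, one per line.
# ROOT defaults to /lib/modules.
list_mic_modules() {
    root=${1:-/lib/modules}
    [ -d "$root" ] || return 0
    find "$root" -name mic.ko \( -type f -o -type l \) 2>/dev/null | sort
}

echo "running kernel: $(uname -r)"
list_mic_modules
```

If no path under the running kernel's directory shows up in the list, modprobe has nothing to find for that kernel, which matches the "Module mic not found" error.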
Dear JJK,
I am very thankful for your help. Establishing the weak link, as you proposed, did not help much. Thus, I moved on to the long-term solution of upgrading my mpss software. I tried to follow this guide by the book: http://registrationcenter-download.intel.com/akdlm/irc_nas/9669/readme.txt
Now, this process did not go exactly as planned. I was completely unable to locate several important files that should have been included in the installed mpss version, such as the uninstall script and some module files that needed recompiling. I attributed this to having an even older mpss version than 3.3.5, seemingly 3.3.1. My solution to this problem was to download the online 3.3.5 mpss distribution, re-compile those mpss modules and run that uninstall script. This seemingly worked.
I then continued with the installation of the new 3.7.2 mpss software. I followed every step of the algorithm applicable for Red Hat 7.2. The software installed without any issues, following the guide that I mentioned religiously. I arrived at the step of actually checking the mic card before updating its flash files. The problem, unfortunately, persists:
    $ modprobe mic
    modprobe: FATAL: Module mic not found.
    $ sudo micctrl --initdefaults
    $ micctrl -s
    [Error] mic0: State failed - non existent MIC device
    $ sudo micctrl -rw
    [sudo] password for georgec:
    [Error] mic0: Reset aborted - non existent MIC device
    [Error] mic0: Wait failed - non existent MIC device
Thus, updating the driver, at least host-side, did not solve my issues. Furthermore, I cannot proceed with the completion of the upgrade on the side of Phi.
did you install the mpss-modules rpm for the RHEL 7.2 kernel? is there a module present in /lib/modules/3.10.0-327.el7/extra ?
what happens if you try to insert that module directly:
insmod /lib/modules/3.10.0-327.el7/extra/mic.ko ? what does
modinfo mic
return?
I removed the previous installation of MPSS (3.3) via the uninstall.sh script that I had to download from https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss#lx37rel
I then downloaded, extracted, copied and installed MPSS 3.7 modules:
    $ tar xvf mpss-3.7.2-linux.tar
    $ cd mpss-3.7.2
    $ uname -r
    3.10.0-327.28.3.el7.x86_64
    $ cp modules/mpss-modules-3.10.0-327.el7.x86_64-3.7.2-1.x86_64.rpm .
    $ cp modules/mpss-modules-dev-3.10.0-327.el7.x86_64-3.7.2-1.x86_64.rpm .
    $ sudo yum install *.rpm
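Note that the running kernel (3.10.0-327.28.3.el7.x86_64) and the kernel named in the rpm (3.10.0-327.el7.x86_64) differ only in the errata part of the release string. As a rough illustration of that relationship, the sketch below strips the errata digits from a release string; the function name `kernel_base_release` is hypothetical, and the assumption that the result matches the mpss-modules rpm name is taken from this one example only.

```shell
#!/bin/sh
# kernel_base_release RELEASE
# Drop the errata component from a RHEL/CentOS style kernel release, e.g.
# 3.10.0-327.28.3.el7.x86_64 becomes 3.10.0-327.el7.x86_64.
# Strings that do not follow that pattern are printed unchanged.
kernel_base_release() {
    printf '%s\n' "$1" |
        sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+)(\.[0-9]+)*(\.el[0-9].*)$/\1\3/'
}

kernel_base_release "$(uname -r)"
```

The module from such an rpm is installed under the base release's directory, so the running kernel's own /lib/modules tree does not contain it unless a link is added.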
As for the module you asked about:
    $ ls /lib/modules/3.10.0-327.el7.x86_64/extra/
    mic.ko
Now for the good news: trying your insertion command fixed everything!
    $ sudo insmod /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko
    [sudo] password for georgec:
    $
The command worked and I was able to continue the installation (updating flash and SMC).
Everything is working as it should be now and I am able to ssh into the card.
There is only one small detail that maybe you can shed some light onto:
I need to run the insertion command every time I boot the host, otherwise the mic card remains undetected.
Any ideas on how to fix this last aspect of the issue in a permanent fashion? I could just include the command in
a boot script, but is there some other way to fix it?
Thank you JJK, you have been immensely helpful.
Good, now we're getting somewhere.
The next step is to ensure that the mic.ko module is found at boot time. Try the following
    cd /lib/modules/`uname -r`/weak-updates
    ln -fs /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko
    depmod -ae
    modinfo mic
This will create the right "weak-updates" link for the mic.ko module. Afterwards, the mic.ko module should be loadable using
    rmmod mic
    modprobe mic
and the mpss stack will automatically insert this module at startup.
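The linking step above could also be wrapped in a small helper that refuses to create a dangling link. This is only a sketch: the function name `link_weak_update` and its arguments are made up for illustration, and it does not replace running depmod afterwards.

```shell
#!/bin/sh
# link_weak_update SRC_KO KERNEL_DIR
# Create (or replace) KERNEL_DIR/weak-updates/<module name> as a symlink to
# SRC_KO. Fails without touching anything if SRC_KO does not exist.
link_weak_update() {
    src=$1
    target_dir=$2/weak-updates
    [ -f "$src" ] || { echo "no such module: $src" >&2; return 1; }
    mkdir -p "$target_dir" || return 1
    ln -fs "$src" "$target_dir/$(basename "$src")"
}

# Usage on the host from this thread (as root), followed by the depmod step:
#   link_weak_update /lib/modules/3.10.0-327.el7.x86_64/extra/mic.ko \
#       "/lib/modules/$(uname -r)"
#   depmod -a
```

Because the link lives under one specific kernel's directory, the same step would presumably have to be repeated after booting into a different kernel release.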
Thank you JJK, the process you described removed the need to manually call insmod at server booting.
    $ modinfo mic
    filename: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/mic.ko
    license: GPL
    build_scmver: 2659671e21e0814014e442998fbdc8ff37d1c68e
    build_ondate: 2016-08-09 15:39:20 -0400
    build_bywhom: qb_user@sid-bld06.pdx.intel.com
    build_number: 0
    license: GPL
    license: GPL
    rhelversion: 7.2
    srcversion: 35F362554621AC4E6F74424
    depends:
    vermagic: 3.10.0-327.el7.x86_64 SMP mod_unload modversions
    parm: vnet:Vnet operating mode, one of: poll intr dma (vnetmode)
    parm: vnet_num_buffers:Number of buffers used by the VNET driver (int)
    parm: vnet_addr:Vnet driver host ring address (ulong)
    parm: ulimit:SCIF ulimit check (bool)
    parm: reg_cache:SCIF registration caching (bool)
    parm: huge_page:SCIF Huge Page Support (bool)
    parm: p2p:SCIF peer-to-peer (bool)
    parm: p2p_proxy:SCIF peer-to-peer proxy DMA support (bool)
    parm: watchdog:SCIF Watchdog (bool)
    parm: watchdog_auto_reboot:SCIF Watchdog auto reboot (bool)
    parm: msi:bool
    parm: mic_msi_enable:To enable MSIx in the driver.
    parm: pm_qos_cpu_dma_lat:int
    parm: mic_pm_qos_cpu_dma_lat:PM QoS CPU DMA latency in usecs.
    parm: ramoops_count:Maximum frame count for the ramoops driver. (int)
    parm: crash_dump:bool
    parm: mic_crash_dump_enabled:MIC Crash Dump enabled.
    parm: psmi:Enable/disable mic psmi (bool)
Thank you for your help :)
