Hello,
I've been trying to install MPSS on my system for the past week and I end up with the same error every time. I installed MPSS 3.4.2 without recompiling the packages, following the readme strictly. I'm also in contact with ASUS, who sent me a custom BIOS that lets me enable Above 4G decoding; unfortunately the error remains.
System:
- OS: RHEL 7.0
- Motherboard: ASUS Z97-WS with Above 4G decoding enabled
- CPU: Intel i7-4790K
Some output:
# micctrl -s
[Warning] No Mic cards found or specified on command line
# lspci -s 03:00 -vv
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
Subsystem: Intel Corporation Device 2500
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
Region 4: Memory at <unassigned> (64-bit, non-prefetchable) [disabled]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [4c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [98] MSI-X: Enable- Count=16 Masked-
Vector table: BAR=4 offset=00017000
PBA: BAR=4 offset=00018000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: mic
# dmesg |less |grep mic
[ 3.142417] mic: module verification failed: signature and/or required key missing - tainting kernel
[ 3.143673] mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
[ 3.143679] mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
[ 3.143680] mic 0: failed to reserve mmio space
[ 3.143689] mic: No MIC boards present. SCIF available in loopback mode
Can someone help me out? Am I doing something wrong?
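For anyone checking for the same symptom: the telling lines in the lspci output above are the regions reported as `<unassigned>`. A minimal sketch for filtering them out of `lspci -vv` text follows; the function name `unassigned_regions` is made up for illustration, and the bus address in the usage comment is the one from this post.

```shell
# Print every PCI memory region that has no address assigned.
# Reads `lspci -vv` text on stdin. Exit status is 0 if at least one
# unassigned region was found and 1 otherwise (grep's own convention).
unassigned_regions() {
    grep -F 'Memory at <unassigned>'
}

# Typical use:
#   lspci -s 03:00 -vv | unassigned_regions && echo "card memory is not mapped"
```

If this prints the 8G prefetchable region, the driver has nothing to reserve, which matches the "failed to reserve mmio space" message in dmesg.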
I'm not sure what is going on. It is acting as if Above 4G decoding is not enabled, but you have already gone to the effort of getting a custom BIOS from ASUS. Did you then enable Above 4G decoding yourself, or did they set it by default in this custom BIOS? Have they given you any more advice?
If you run the /usr/bin/micdebug.sh script and then attach the tar file to a private message to me (the 'Send Author A Message' button), I will pass it on to some more hardware savvy people here. If I had to make a guess by myself, I would wonder if it has anything to do with enabling MSI-X. On your output you have:
Capabilities: [98] MSI-X: Enable- Count=16 Masked-
but on my systems, I have:
Capabilities: [98] MSI-X: Enable+ Count=16 Masked-
Note the Enable+. And while in dmesg you have:
mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
I have:
mic 0000:84:00.0: irq 112 for MSI/MSI-X
I thought MSI-X was enabled by default when you put the coprocessor card into a PCIe slot and I believe it needs to be enabled for the card to work, but then I am dealing in things here where I am decidedly ignorant. As I say, your best bet is probably to run micdebug and send me the output to pass on to the experts.
Dear Frances,
Thanks for your answer!
Yes, they sent me a custom BIOS and then I enabled Above 4G decoding (before that, the setting used to disable itself after every save & exit). Unfortunately they didn't give me any advice besides "try it with this one". I told them that it is still not working, and they replied that they are going to check whether it works for them (I'm still waiting for an answer to that).
I'm afraid I can't run micdebug.sh as it throws an error:
# ./usr/bin/micdebug.sh
Error: MPSS config not found. Run "micctrl --initdefaults" first.
The problem is I can't run micctrl --initdefaults:
# micctrl --initdefaults
[Warning] No Mic cards found or specified on command line
As for MSI-X, any advice on how to enable it? I tried adding the parameter mic_msi_enable=true to /etc/modprobe.d/mic.conf, but that doesn't work because the module doesn't recognize this parameter (I unloaded the module before changing the config file):
# modprobe mic
modprobe: ERROR: could not insert 'mic': Unknown symbol in module, or unknown parameter (see dmesg)
although modinfo says the parameter is there:
# modinfo mic -p
vnet:Vnet operating mode, one of: poll intr dma (vnetmode)
vnet_num_buffers:Number of buffers used by the VNET driver (int)
vnet_addr:Vnet driver host ring address (ulong)
ulimit:SCIF ulimit check (bool)
reg_cache:SCIF registration caching (bool)
huge_page:SCIF Huge Page Support (bool)
p2p:SCIF peer-to-peer (bool)
p2p_proxy:SCIF peer-to-peer proxy DMA support (bool)
watchdog:SCIF Watchdog (bool)
watchdog_auto_reboot:SCIF Watchdog auto reboot (bool)
msi: (bool)
mic_msi_enable:To enable MSIx in the driver.
pm_qos_cpu_dma_lat: (int)
mic_pm_qos_cpu_dma_lat:PM QoS CPU DMA latency in usecs.
ramoops_count:Maximum frame count for the ramoops driver. (int)
crash_dump: (bool)
mic_crash_dump_enabled:MIC Crash Dump enabled.
psmi:Enable/disable mic psmi (bool)
As you said, it seems that although the decoding is enabled it still doesn't work properly. I guess all I can do now is wait for ASUS, since you couldn't spot a mistake here either.
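One detail in the `modinfo -p` listing above may explain the "unknown parameter" error: `msi` is listed with a type, `(bool)`, while `mic_msi_enable` appears only as a description with no type. That pattern suggests (an assumption, not confirmed anywhere in this thread) that `msi` is the name the module accepts and `mic_msi_enable` is an internal variable name. A minimal sketch that filters the listing down to the typed entries, with `typed_params` being a made-up helper name:

```shell
# Print only the names that `modinfo -p` reports with a trailing "(type)",
# e.g. "msi: (bool)" -> "msi". Entries without a type annotation are
# skipped, on the assumption that they are descriptions attached to an
# internal variable name rather than parameters the module accepts.
typed_params() {
    sed -n 's/^\([A-Za-z0-9_]*\):.*(\([a-z]*\))[[:space:]]*$/\1/p'
}

# Usage:
#   modinfo -p mic | typed_params
```

If the assumption holds, the modprobe.d entry would presumably be spelled `options mic msi=1`; this is untested here, and it would not by itself fix the unassigned memory region.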
Hello,
This article suggests that adding a kernel boot parameter solves this issue.
http://www.pugetsystems.com/blog/2013/09/19/More-on-motherboards-even-mATX-for-Xeon-Phi-503/
I am facing the same issue, but I have not gotten a custom BIOS from ASUS yet.
Hiromichi I. wrote:
Hello,
This article suggests that adding a kernel boot parameter solves this issue.
http://www.pugetsystems.com/blog/2013/09/19/More-on-motherboards-even-mATX-for-Xeon-Phi-503/
I am facing the same issue, but I have not gotten a custom BIOS from ASUS yet.
Hello,
I've already tried this; no success, unfortunately. I'm going to send you my custom BIOS later (if that's possible through this forum); maybe you'll have more luck than me.
Since micdebug.sh won't work if the MPSS is not fully installed (e.g. if you can't run micctrl --initdefaults), try to collect the following information, tar it up and upload it in a private message to me. I will pass it on to one of our experts to see what they have to say.
dmesg > host_dmesg.txt
cp /var/log/messages messages.txt
cp /etc/modprobe.d/mic.conf modprobe.d.mic.conf.txt
cat /etc/os-release /etc/redhat-release /etc/system-release > os_release_info.txt
cp /etc/selinux/config selinux_config.txt
uname -a > uname.txt
rpm -qa > rpm_packages.txt
lsmod > modules.txt
# this should give you the BIOS settings
dmidecode > dmidecode_info.txt
lspci -vvvvv -s 03:00 > coprocessor_only_lspci.txt
lspci -Dtv > tree_lspci.txt
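The collection step above could be wrapped in a small script so that one failing command does not abort the rest. This is only a sketch: the helper names `collect` and `gather_mic_info` and the directory name `mic_report` are invented here, and the bus address 03:00 is the one from the original post.

```shell
# Save the output of one command to a file. A failing or missing command
# is reported on stderr but does not stop the rest of the collection.
collect() {
    outfile=$1
    shift
    "$@" > "$outfile" 2>&1 || echo "warning: '$*' failed (see $outfile)" >&2
}

# Gather the requested host information into a directory and pack it.
# Run as root so that dmidecode and /var/log/messages are readable.
gather_mic_info() {
    dir=${1:-mic_report}
    mkdir -p "$dir" || return 1
    collect "$dir/host_dmesg.txt" dmesg
    collect "$dir/messages.txt" cat /var/log/messages
    collect "$dir/modprobe.d.mic.conf.txt" cat /etc/modprobe.d/mic.conf
    collect "$dir/os_release_info.txt" cat /etc/os-release /etc/redhat-release /etc/system-release
    collect "$dir/selinux_config.txt" cat /etc/selinux/config
    collect "$dir/uname.txt" uname -a
    collect "$dir/rpm_packages.txt" rpm -qa
    collect "$dir/modules.txt" lsmod
    collect "$dir/dmidecode_info.txt" dmidecode
    collect "$dir/coprocessor_only_lspci.txt" lspci -vvvvv -s 03:00
    collect "$dir/tree_lspci.txt" lspci -Dtv
    tar -czf "$dir.tar.gz" "$dir"
}

# Usage:
#   gather_mic_info            # writes mic_report/ and mic_report.tar.gz
```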
I played around with it a little more, this time using Fedora. It turns out that the noapic parameter is not needed; only pci=realloc is necessary. One thing I have noticed is that when I add the parameter and boot in UEFI mode, the system freezes (seemingly even before the init process starts, and I cannot convince the kernel to produce debug output). I can only boot with CSM enabled and with Grub installed as if it were a BIOS system. But this may also be related to my graphics adapter.
I can also reproduce the problems Oliver reported on CentOS.
Regarding Xeon Phi on Arch Linux/Manjaro I found this, but I have not tried it either: http://research.colfaxinternational.com/post/2014/08/20/Arch.aspx
I would be interested in the Windows equivalent of pci=realloc, if it exists. My Windows 8.1 64 system fails to map the memory, too. But I am already happy with a working Xeon Phi on a Z97 Motherboard using Linux.
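For reference, making pci=realloc persistent on a grub2-based distribution usually means adding it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerating the grub configuration. The helper below is a minimal sketch of the first half; `add_kernel_param` is a made-up name, and it assumes the usual single-line, double-quoted form of the variable.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX in a grub defaults file
# unless the line already contains it (simple substring check).
# Arguments: parameter, file (defaults to /etc/default/grub).
# The file is rewritten through a temporary copy next to it.
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$param"; then
        return 0   # already present, nothing to do
    fi
    sed "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $param\"|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage (as root, after backing up the file):
#   add_kernel_param pci=realloc
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # output path differs on UEFI installs
#   reboot, then confirm with: cat /proc/cmdline
```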
I'm happy to announce that I finally am able to ssh into the Phi on Fedora 21 (noapic and pci=realloc as boot parameters) with mpss 3.4.2!
Here's what I did:
- Rebuilding all packages in the main directory and the mpss-modules-* with the newest kernel version (think it was 3.12 or so?) with rpmrebuild -ep <package> (another post suggested removing all the lines with usr/bin and usr/lib, but I couldn't find them in this version of the mpss tools, so I didn't change anything)
- Then installing them with yum install *.rpm
- Installing mpss modules on top of this from here: https://github.com/xdsopl/mpss-modules (make sure you checkout the devel-3.4 branch)
- modprobe mic
- rebooting
- micctrl --initdefaults
After that my Phi was online and I could start the mpss service. I had already flashed the card earlier, which caused it to not boot correctly (my guess is that mismatched module and driver versions were the cause), so I didn't need to flash it again.
Hope that also works for you, Morris.
Hello,
I am experiencing the same issue. My exact hardware setup is as follows:
- OS: RHEL 7.1
- Motherboard: ASUS Sabertooth Z97 Mark 1 with Above 4G decoding enabled
- CPU: Intel i7-4790K
- MPSS 3.5.1
Everything works fantastic using Windows Server 2012 R2 with MPSS 3.5.1, but I cannot use bridged networking.
Implementation details:
- Redhat 7.1 base installation
- VNC installation
- Ran security updates for 6/25/2015
- Modified grub command line to include "noapic pci=realloc"
- yum install kernel-headers kernel-devel
- rpmbuild --rebuild mpss-modules*.src.rpm
- cp $HOME/rpmbuild/RPMS/x86_64/mpss-modules*`uname -r`*.rpm ../modules
- cp ./modules/*`uname -r`*.rpm .
- yum install *.rpm
- modprobe mic
- lspci -s 02:00 -v
- micctrl -s
[root@coastn mpss-3.5.1]# lspci -s 02:00 -v
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
Subsystem: Intel Corporation Device 2500
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at <unassigned> (64-bit, prefetchable) [size=8G]
Memory at da000000 (64-bit, non-prefetchable) [size=128K]
Capabilities: [44] Power Management version 3
Capabilities: [4c] Express Endpoint, MSI 00
Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
Capabilities: [98] MSI-X: Enable- Count=16 Masked-
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: mic
[root@coastn mpss-3.5.1]# micctrl -s
micctrl(segv_handler+0x18) [0x407818]
/lib64/libc.so.6(+0x35650) [0x7fd0b61ae650]
/lib64/libmpssconfig.so.0.0.1(_add_miclist_not_present+0xe0) [0x7fd0b6740e60]
/lib64/libmpssconfig.so.0.0.1(mpss_get_miclist+0x4d) [0x7fd0b674118d]
micctrl(create_miclist+0x1dd) [0x424c3d]
micctrl(parse_status_args+0xd6) [0x417206]
micctrl(main+0x6ea) [0x407f3a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd0b619aaf5]
micctrl() [0x4076f9]
The screenshot and micdebug output are attached.
Thanks,
Rob
It's the following line in the 'lspci' output that worries me:
Memory at <unassigned> (64-bit, prefetchable) [size=8G]
This means the card's 8G memory region has not been assigned an address, so the kernel cannot map it and the card is inaccessible. On my Phi host, which is running fine, I see
00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Memory behind bridge: cb900000-cb9fffff
Prefetchable memory behind bridge: 0000021c00000000-0000021dffffffff
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
Subsystem: Intel Corporation Device 2500
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 32
Region 0: Memory at 21c00000000 (64-bit, prefetchable) [size=8G]
Region 4: Memory at cb900000 (64-bit, non-prefetchable) [size=128K]
but the micdebug output for your host shows that the prefetchable memory behind the bridge is:
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: da000000-da0fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
which looks very odd to me (the second address is lower than the first).
It would also be interesting to see the output of
cat /proc/iomem
The PCI adapter 02:00 (the Phi) should be listed in that output somewhere.
I'd recommend downloading a Fedora 22 live CD/USB image, booting from it, and then rerunning the lspci -vv command. This could very well be a problem with the PCI host module in the RHEL kernel.
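A note on the odd-looking range: in a PCI-to-PCI bridge, a limit address below the base address is the conventional encoding for a closed (disabled) window, so the bridge is forwarding no prefetchable range at all, which is consistent with the 8G region being unassigned. The sketch below flags such windows in `lspci -vv` text; `check_prefetch_windows` is a made-up helper name, and it assumes the "START-END" hex format shown above.

```shell
# Read `lspci -vv` text on stdin and report every "Prefetchable memory
# behind bridge" window whose end address is below its start address.
# Returns 0 if no inverted window was seen, 1 otherwise.
# Addresses at or above 0x8000000000000000 would overflow shell arithmetic.
check_prefetch_windows() {
    status=0
    while IFS= read -r line; do
        case $line in
            *"Prefetchable memory behind bridge: "*)
                range=${line##*bridge: }   # text after the label
                range=${range%% *}         # drop anything after the range
                case $range in
                    *[!0-9a-fA-F-]*|*-*-*|-*|*-) continue ;;  # not a plain hex range
                    *-*) ;;                                  # looks like START-END
                    *) continue ;;
                esac
                start=${range%-*}
                end=${range#*-}
                if [ $((0x$end)) -lt $((0x$start)) ]; then
                    echo "inverted window: $start-$end"
                    status=1
                fi
                ;;
        esac
    done
    return $status
}

# Usage:
#   lspci -vv | check_prefetch_windows || echo "a bridge window is closed"
```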
I'm not sure what is going on. The people who were dealing with this earlier tried different Linux distributions and boot parameters until they found one that worked. I suspect that the problem is somehow related to MSI-X being disabled (see the lspci output), but what I know about such things is far exceeded by what I don't know.
I have asked the MPSS developers to look at this and they have agreed to try. However, because this problem seems to be specific to this board and only some Linux distributions, I don't know what will come of that. It would be interesting to hear from others who have systems like this but have had no problems. That might help track down just where the issue is.
I successfully started my Intel Xeon Phi 5110P today.
System:
- OS: Ubuntu 14.04 (I tested CentOS 7 and it works as well)
- Motherboard: ASRock X99 WS-E
- CPU: Intel Xeon E5 1620-v3
The first thing I noticed in Ubuntu was the output from lspci,
Region 0: Memory at c000000000 (64-bit, prefetchable) [size=8G]
in contrast to
Region 0: Memory at <unassigned> (64-bit, prefetchable)
I added pci=realloc to the boot parameters, and then I was able to continue with the install from the readme with no additional problems.
