Installing MPSS - no MIC Cards found or specified in command line

Oliver_P_
Beginner

Hello,

I've been trying to install MPSS on my system for the past week, and I end up with the same error every time. I've installed MPSS 3.4.2 without recompiling the packages, following the ReadMe strictly of course. I'm also in contact with ASUS, who sent me a custom BIOS that lets me enable Above 4G decoding. Unfortunately, the error remains.

System:

  • OS: RHEL 7.0
  • Motherboard: Asus Z97-WS with Above 4G decoding enabled
  • CPU: Intel i7-4790K

Some output:

# micctrl -s
[Warning] No Mic cards found or specified on command line

# lspci -s 03:00 -vv
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
    Region 4: Memory at <unassigned> (64-bit, non-prefetchable) [disabled]
    Capabilities: [44] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [4c] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
        Vector table: BAR=4 offset=00017000
        PBA: BAR=4 offset=00018000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Kernel driver in use: mic

# dmesg | grep mic

[    3.142417] mic: module verification failed: signature and/or required key missing - tainting kernel
[    3.143673] mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
[    3.143679] mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
[    3.143680] mic 0: failed to reserve mmio space
[    3.143689] mic: No MIC boards present.  SCIF available in loopback mode

Can someone help me out? Am I doing something wrong?

 

Frances_R_Intel
Employee

I'm not sure what is going on. It is acting as if Above 4G decoding is not enabled, but you have already gone to the effort of getting a custom BIOS from ASUS. Did you then enable Above 4G decoding yourself, or did they set it for you by default in this custom BIOS? Have they given you any more advice?

If you run the /usr/bin/micdebug.sh script and then attach the tar file to a private message to me (the 'Send Author A Message' button), I will pass it on to some more hardware savvy people here. If I had to make a guess by myself, I would wonder if it has anything to do with enabling MSI-X. On your output you have:

Capabilities: [98] MSI-X: Enable- Count=16 Masked-

but on my systems, I have:

Capabilities: [98] MSI-X: Enable+ Count=16 Masked-

Note the Enable+. And while in dmesg you have:

mic 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16

I have:

mic 0000:84:00.0: irq 112 for MSI/MSI-X

I thought MSI-X was enabled by default when you put the coprocessor card into a PCIe slot and I believe it needs to be enabled for the card to work, but then I am dealing in things here where I am decidedly ignorant. As I say, your best bet is probably to run micdebug and send me the output to pass on to the experts.

Oliver_P_
Beginner

Dear Frances,

Thanks for your answer!

Yes, they sent me a custom BIOS and then I enabled Above 4G decoding (before that, the setting used to disable itself after every save & exit). Unfortunately, they didn't give me any advice besides "try it with this one". I replied that it is still not working, and they said they are going to check whether it works for them (I'm still waiting for an answer on that).

I'm afraid I can't run micdebug.sh as it throws an error:

# ./usr/bin/micdebug.sh
Error:  MPSS config not found.  Run "micctrl --initdefaults" first.

The problem is I can't run micctrl --initdefaults:

# micctrl --initdefaults
[Warning] No Mic cards found or specified on command line

As for MSI-X, any advice on how to enable it? I tried to change /etc/modprobe.d/mic.conf by adding the parameter mic_msi_enable=true, but that doesn't work because the module doesn't recognize this parameter (I unloaded the module before changing the config file):

# modprobe mic
modprobe: ERROR: could not insert 'mic': Unknown symbol in module, or unknown parameter (see dmesg)

although modinfo says the parameter is there:

# modinfo mic -p
vnet:Vnet operating mode, one of: poll intr dma (vnetmode)
vnet_num_buffers:Number of buffers used by the VNET driver (int)
vnet_addr:Vnet driver host ring address (ulong)
ulimit:SCIF ulimit check (bool)
reg_cache:SCIF registration caching (bool)
huge_page:SCIF Huge Page Support (bool)
p2p:SCIF peer-to-peer (bool)
p2p_proxy:SCIF peer-to-peer proxy DMA support (bool)
watchdog:SCIF Watchdog (bool)
watchdog_auto_reboot:SCIF Watchdog auto reboot (bool)
msi: (bool)
mic_msi_enable:To enable MSIx in the driver.
pm_qos_cpu_dma_lat: (int)
mic_pm_qos_cpu_dma_lat:PM QoS CPU DMA latency in usecs.
ramoops_count:Maximum frame count for the ramoops driver. (int)
crash_dump: (bool)
mic_crash_dump_enabled:MIC Crash Dump enabled.
psmi:Enable/disable mic psmi (bool)

As you said, it seems that although Above 4G decoding is enabled, it still doesn't work properly. I guess all I can do now is wait for ASUS, since you couldn't spot a mistake here either.
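
One possible reading of the modinfo listing above: entries that end in a type such as (bool) or (int) are the names the module actually registers, while mic_msi_enable, mic_pm_qos_cpu_dma_lat and mic_crash_dump_enabled carry only a description. If that reading is right (it is an assumption, not something confirmed in this thread), the option modprobe accepts would be msi rather than mic_msi_enable. A minimal shell sketch that filters the listing down to the typed names; the function name typed_params is made up for illustration:

```shell
#!/bin/sh
# Print "name type" for every `modinfo -p` entry that ends in a "(type)" suffix.
# Assumption: only these typed names are accepted as module options;
# untyped entries such as "mic_msi_enable" appear to be descriptions only.
# typed_params is a hypothetical helper, not part of kmod or MPSS.
typed_params() {
    # Input on stdin: one "name:description (type)" entry per line.
    sed -n 's/^\([A-Za-z0-9_]*\):.*(\([a-z_]*\))[[:space:]]*$/\1 \2/p'
}

# Example (not executed here):
#   modinfo -p mic | typed_params
```

For the listing above this would include a line reading "msi bool", which would make "options mic msi=1" the entry to try in /etc/modprobe.d/mic.conf. Treat that as a guess to check against dmesg, not as a confirmed fix.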

 

 

Hiromichi_I_
Beginner

 

Hello,

This article suggests that adding a kernel boot parameter solves this issue.

http://www.pugetsystems.com/blog/2013/09/19/More-on-motherboards-even-mATX-for-Xeon-Phi-503/

I am facing the same issue, but I have not gotten a custom BIOS from ASUS yet.

 

Oliver_P_
Beginner

Hello,

I've already tried this; no success, unfortunately. I'm going to send you my custom BIOS later (if that's possible through this forum); maybe you'll have more luck than me.

Frances_R_Intel
Employee

Since micdebug.sh won't work if the MPSS is not fully installed (e.g. if you can't run micctrl --initdefaults), try to collect the following information, tar it up, and upload it in a private message to me. I will pass it on to one of our experts to see what they have to say.

dmesg > host_dmesg.txt
cp /var/log/messages messages.txt
cp /etc/modprobe.d/mic.conf modprobe.d.mic.conf.txt
cat /etc/os-release /etc/redhat-release /etc/system-release > os_release_info.txt
cp /etc/selinux/config selinux_config.txt
uname -a > uname.txt
rpm -qa > rpm_packages.txt
lsmod > modules.txt
# this should give you the BIOS settings
dmidecode > dmidecode_info.txt
lspci -vvvvv -s 03:00 > coprocessor_only_lspci.txt
lspci -Dtv > tree_lspci.txt
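
The commands above leave a set of .txt files in the current directory. A small sketch for the "tar it up" step follows; the function name bundle_diag and the archive name mic_diag.tar.gz are placeholders chosen here, not anything MPSS provides:

```shell
#!/bin/sh
# Pack every .txt file from a directory into one gzip-compressed tar archive.
# bundle_diag is a hypothetical helper; both arguments are optional.
bundle_diag() {
    dir=${1:-.}
    out=${2:-mic_diag.tar.gz}
    # Refuse to create an empty archive if nothing was collected.
    set -- "$dir"/*.txt
    [ -e "$1" ] || { echo "no .txt files in $dir" >&2; return 1; }
    # Change into the directory so the archive holds bare file names.
    (cd "$dir" && tar -czf - ./*.txt) > "$out"
}

# Example (not executed here):
#   bundle_diag . mic_diag.tar.gz
```

The archive can be inspected with tar -tzf mic_diag.tar.gz before attaching it.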

 

Morris_H_
Beginner

I played around with it a little more, this time using Fedora. It turns out that the noapic parameter is not needed; only pci=realloc is necessary. One thing I have noticed is that when I add the parameter and boot in UEFI mode, the system freezes (seemingly even before the init process starts, and I cannot convince the kernel to produce debug output). I can only boot with CSM enabled and with GRUB installed as if it were a BIOS system. But this may also be related to my graphics adapter.

I can also reproduce the problems Oliver reported on CentOS.

Regarding Xeon Phi on Arch Linux/Manjaro, I found this, but I have not tried it: http://research.colfaxinternational.com/post/2014/08/20/Arch.aspx

I would be interested in the Windows equivalent of pci=realloc, if it exists. My Windows 8.1 64 system fails to map the memory, too. But I am already happy with a working Xeon Phi on a Z97 Motherboard using Linux.

Oliver_P_
Beginner

I'm happy to announce that I am finally able to ssh into the Phi on Fedora 21 (noapic and pci=realloc as boot parameters) with MPSS 3.4.2!

Here's what I did:

  • Rebuilding all packages in the main directory and the mpss-modules-* packages against the newest kernel version (I think it was 3.12 or so) with rpmrebuild -ep <package> (another post suggested removing all the lines with usr/bin and usr/lib, but I couldn't find them in this version of the MPSS tools, so I didn't change anything)
  • Then installing them with yum install *.rpm
  • Installing mpss modules on top of this from here: https://github.com/xdsopl/mpss-modules (make sure you check out the devel-3.4 branch)
  • modprobe mic
  • rebooting
  • micctrl --initdefaults

After that my Phi was online and I could start the mpss service. I had already flashed the card earlier, which at the time caused it to not boot correctly (my guess is that the cause was mismatched module and driver versions), so I didn't need to flash it again.

Hope that also works for you, Morris.

P__Robert
Beginner

Hello,

I am experiencing the same issue.  My exact hardware setup is as follows:

  • OS: RHEL 7.1
  • Motherboard: Asus Sabertooth Z97 Mark 1 with Above 4G decoding enabled
  • CPU: Intel i7-4790K
  • MPSS 3.5.1

Everything works fantastic using Windows Server 2012 R2 with MPSS 3.5.1, but I cannot use bridged networking.

Implementation details:

  • Redhat 7.1 base installation
  • VNC installation
  • Ran security updates for 6/25/2015
  • Modified grub command line to include "noapic pci=realloc"
  • yum install kernel-headers kernel-devel
  • rpmbuild --rebuild mpss-modules*.src.rpm
  • cp $HOME/rpmbuild/RPMS/x86_64/mpss-modules*`uname -r`*.rpm ../modules
  • cp ./modules/*`uname -r`*.rpm .
  • yum install *.rpm
  • modprobe mic
  • lspci -s 02:00 -v
  • micctrl -s

[root@coastn mpss-3.5.1]# lspci -s 02:00 -v
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at <unassigned> (64-bit, prefetchable) [size=8G]
    Memory at da000000 (64-bit, non-prefetchable) [size=128K]
    Capabilities: [44] Power Management version 3
    Capabilities: [4c] Express Endpoint, MSI 00
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: mic

[root@coastn mpss-3.5.1]# micctrl -s
micctrl(segv_handler+0x18) [0x407818]
/lib64/libc.so.6(+0x35650) [0x7fd0b61ae650]
/lib64/libmpssconfig.so.0.0.1(_add_miclist_not_present+0xe0) [0x7fd0b6740e60]
/lib64/libmpssconfig.so.0.0.1(mpss_get_miclist+0x4d) [0x7fd0b674118d]
micctrl(create_miclist+0x1dd) [0x424c3d]
micctrl(parse_status_args+0xd6) [0x417206]
micctrl(main+0x6ea) [0x407f3a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd0b619aaf5]
micctrl() [0x4076f9]

 

The screenshot and micdebug output are attached.

 

Thanks,

Rob

JJK
New Contributor III

It's the following line in the 'lspci' output that worries me:

Memory at <unassigned> (64-bit, prefetchable) [size=8G]

This means the memory of the card itself has not been assigned an address range, so it is not mapped into kernel space and the card is inaccessible. On my Phi host, which is running fine, I see:

 

00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: cb900000-cb9fffff
        Prefetchable memory behind bridge: 0000021c00000000-0000021dffffffff

02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
        Subsystem: Intel Corporation Device 2500
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 32
        Region 0: Memory at 21c00000000 (64-bit, prefetchable) [size=8G]
        Region 4: Memory at cb900000 (64-bit, non-prefetchable) [size=128K]

 

 

but the micdebug output for your host shows that the prefetchable memory behind the bridge is:

00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    I/O behind bridge: 0000f000-00000fff
    Memory behind bridge: da000000-da0fffff
    Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff

which looks very odd to me (the second address is lower than the first).
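
Both symptoms discussed above (a memory region shown as <unassigned>, and a bridge window whose end address is below its start) can be spotted mechanically. On a PCI bridge, an end below the start is normally how a closed window is reported, which would fit the 8G region having nowhere to go. The following is only a rough shell sketch; check_lspci and is_hex are hypothetical helper names, and the parsing assumes the lspci text layout shown in this thread:

```shell
#!/bin/sh
# Read `lspci -v` or `lspci -vv` output on stdin and flag two symptoms:
#   UNASSIGNED: a memory region (BAR) that has no address
#   INVERTED:   a bridge memory window whose end lies below its start
# check_lspci and is_hex are hypothetical helpers, not part of pciutils or MPSS.
is_hex() {
    case $1 in
        ''|*[!0-9a-fA-F]*) return 1 ;;
    esac
}

check_lspci() {
    while IFS= read -r line; do
        case $line in
            *"Memory at <unassigned>"*)
                # Strip leading whitespace before echoing the offending line.
                echo "UNASSIGNED: ${line#"${line%%[![:space:]]*}"}"
                ;;
            *"emory behind bridge: "*)
                range=${line##*: }   # e.g. "da000000-da0fffff [size=1M]"
                range=${range%% *}   # keep only "base-limit"
                base=${range%%-*}
                limit=${range##*-}
                # Skip anything that is not plain hex, such as "[disabled]".
                is_hex "$base" && is_hex "$limit" || continue
                if [ $((0x$base)) -gt $((0x$limit)) ]; then
                    echo "INVERTED: $base-$limit"
                fi
                ;;
        esac
    done
}

# Example (not executed here):
#   lspci -vv | check_lspci
```

An empty result only means neither pattern was found; it does not prove the card is mapped correctly.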

It would also be interesting to see the output of

cat /proc/iomem

The PCI adapter 02:00 (the Phi) should be listed in that output somewhere.
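
To pull just the relevant entries out of /proc/iomem, something like the sketch below should do. It assumes the device is listed under its full PCI address (for example 0000:02:00.0); the function name iomem_for_device is invented here. On newer kernels the addresses read as zeros unless the command is run as root.

```shell
#!/bin/sh
# Print the /proc/iomem lines that mention a given PCI address.
# iomem_for_device is a hypothetical helper; the second argument lets you
# point it at a saved copy of /proc/iomem instead of the live file.
iomem_for_device() {
    dev=$1
    file=${2:-/proc/iomem}
    # -F treats the address as a fixed string, so the dots are not wildcards.
    grep -F -- "$dev" "$file"
}

# Example (not executed here):
#   iomem_for_device 0000:02:00.0
```

No output (and a non-zero exit status) would mean the kernel has reserved no address range for that device.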

I'd recommend downloading a Fedora 22 live CD/USB image, booting from it, and then rerunning the lspci -vv command. This could very well be a problem with the PCI host module in the RHEL kernel.

Frances_R_Intel
Employee

I'm not sure what is going on. The people who were dealing with this earlier tried different Linux distributions and boot parameters until they found a combination that worked. I suspect that the problem is somehow related to MSI-X being disabled (see the lspci output), but what I know about such things is far exceeded by what I don't know.

I have asked the MPSS developers to look at this and they have agreed to try. However, because this problem seems to be specific to this board and only some Linux distributions, I don't know what will come of that. It would be interesting to hear from others who have systems like this but have had no problems. That might help track down just where the issue is.

Everett_R_
Beginner

I successfully started my Intel Xeon Phi 5110P today.

System:

  • OS: Ubuntu 14.04 (I tested CentOS 7 and it works as well)
  • Motherboard: ASRock X99 WS-E
  • CPU: Intel Xeon E5-1620 v3

The first thing I noticed in Ubuntu was the output from lspci,

Region 0: Memory at c000000000 (64-bit, prefetchable) [size=8G]

in contrast to

Region 0: Memory at <unassigned> (64-bit, prefetchable)

I added pci=realloc to the boot parameters, and then I was able to continue with the install from the readme with no additional problems.
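
After changing the boot parameters it is worth confirming that the running kernel actually received them. A minimal sketch, assuming the usual /proc/cmdline interface; has_boot_param is a name made up for this example:

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel command line.
# has_boot_param is a hypothetical helper; the second argument allows testing
# against a saved copy instead of the live /proc/cmdline.
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then require an exact full-line match.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$param"
}

# Example (not executed here):
#   has_boot_param pci=realloc && echo "pci=realloc is active"
```

How the parameter gets onto the command line depends on the distribution and is not spelled out in this thread. On RHEL and Fedora, grubby --update-kernel=ALL --args="pci=realloc" is one common way; on Ubuntu the usual route is adding it to GRUB_CMDLINE_LINUX in /etc/default/grub and running update-grub.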
