31S1P problems (MSI-X Enable-, or 4G Decoding, probably)

Jared_H_1
Beginner
719 Views

Hello, everyone.  I've been lurking on the forums for a few days now while I schemed up a cooling solution for my shiny new 31S1P. 

I'm pretty sure I've conquered the cooling requirements.  Check!

However, I cannot get the card to work correctly.  I'm using a Z97-WS motherboard with "4G Decoding" enabled in the BIOS settings.  The CPU is a Celeron G1820, a cheap little LGA1150 CPU that seemed to be enough for this rig.  I'm running the latest BIOS (2403, I believe from 2015-06-18 or thereabouts) and the latest CentOS 7.1 release, 7.1.1503 (Core).

I've followed all of the advice and forums I could find online about this issue, to little avail.  Here is a piece of my console log showing the relevant information I am likely to be asked to provide if I don't do it here:

----------------------------------------------------------------------------------------

[root@x mpss-3.5.2]# dmesg | grep MSI
[    0.102438] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.408378] pcieport 0000:00:01.0: irq 40 for MSI/MSI-X
[    0.408786] pcieport 0000:01:00.0: irq 41 for MSI/MSI-X
[    0.408881] pcieport 0000:02:08.0: irq 42 for MSI/MSI-X
[    0.408972] pcieport 0000:02:10.0: irq 43 for MSI/MSI-X
[    0.409070] pcieport 0000:06:00.0: irq 44 for MSI/MSI-X
[    0.409184] pcieport 0000:07:01.0: irq 45 for MSI/MSI-X
[    0.409349] pcieport 0000:07:02.0: irq 46 for MSI/MSI-X
[    0.409465] pcieport 0000:07:03.0: irq 47 for MSI/MSI-X
[    0.409579] pcieport 0000:07:04.0: irq 48 for MSI/MSI-X
[    0.409692] pcieport 0000:07:05.0: irq 49 for MSI/MSI-X
[    0.409808] pcieport 0000:07:06.0: irq 50 for MSI/MSI-X
[    0.409920] pcieport 0000:07:07.0: irq 51 for MSI/MSI-X
[    0.452551] xhci_hcd 0000:00:14.0: irq 52 for MSI/MSI-X
[    0.518593] xhci_hcd 0000:10:00.0: irq 53 for MSI/MSI-X
[    0.518597] xhci_hcd 0000:10:00.0: irq 54 for MSI/MSI-X
[    0.518600] xhci_hcd 0000:10:00.0: irq 55 for MSI/MSI-X
[    0.710232] e1000e 0000:00:19.0: irq 56 for MSI/MSI-X
[    0.825566] igb 0000:0d:00.0: irq 57 for MSI/MSI-X
[    0.825570] igb 0000:0d:00.0: irq 58 for MSI/MSI-X
[    0.825573] igb 0000:0d:00.0: irq 59 for MSI/MSI-X
[    0.825577] igb 0000:0d:00.0: irq 60 for MSI/MSI-X
[    0.825581] igb 0000:0d:00.0: irq 61 for MSI/MSI-X
[    0.855040] igb 0000:0d:00.0: Using MSI-X interrupts. 2 rx queue(s), 2 tx queue(s)
[    0.984604] i915 0000:00:02.0: irq 62 for MSI/MSI-X
[    1.187177] ahci 0000:00:1f.2: irq 63 for MSI/MSI-X
[    1.189283] ahci 0000:0a:00.0: irq 64 for MSI/MSI-X
[    1.190251] ahci 0000:0f:00.0: irq 65 for MSI/MSI-X
[   12.487762] mei_me 0000:00:16.0: irq 66 for MSI/MSI-X
[   12.702815] snd_hda_intel 0000:00:03.0: irq 67 for MSI/MSI-X
[   12.702983] snd_hda_intel 0000:00:1b.0: irq 68 for MSI/MSI-X
[root@x mpss-3.5.2]# lspci | grep -i coproc
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
[root@x mpss-3.5.2]# lspci -s 03:00.0 -vv
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 255
    Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
    Region 4: Memory at bf200000 (64-bit, non-prefetchable) [disabled] [size=128K]
    Capabilities: [44] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [4c] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
        Vector table: BAR=4 offset=00017000
        PBA: BAR=4 offset=00018000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-

[root@x mpss-3.5.2]# dmesg | grep mic
[    0.000000] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.061159] CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.068898] atomic64 test passed for x86-64 platform with CX8 and with SSE
[    0.089803] ACPI: Dynamic OEM Table Load:
[    0.091965] ACPI: Dynamic OEM Table Load:
[    0.093790] ACPI: Dynamic OEM Table Load:
[    0.387895] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1c
[    0.387899] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1c
[    0.387920] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    0.526732] mousedev: PS/2 mouse device common for all mice
[    0.710216] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    3.071226] usb 5-2: ep 0x81 - rounding interval to 1024 microframes, ep desc says 2040 microframes
[    3.439111] usb 5-2.1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
[    3.439113] usb 5-2.1: ep 0x82 - rounding interval to 1024 microframes, ep desc says 2040 microframes
[root@x mpss-3.5.2]# micinfo
MicInfo Utility Log
Created Mon Aug 17 04:01:04 2015


    System Info
        HOST OS            : Linux
        OS Version        : 3.10.0-229.el7.x86_64
        Driver Version        : NotAvailable
        MPSS Version        : 3.5.2

        Host Physical Memory    : 16141 MB
micinfo: No devices found : host driver is not loaded: No such file or directory

[root@x j]# depmod
[root@x j]# modprobe mic
modprobe: FATAL: Module mic not found.
[root@x j]# service mpss start
Starting mpss (via systemctl):                             [  OK  ]
[root@x j]# micctrl -s
  [Error] micrasrelmond: State failed - non existent MIC device
[root@x j]#

----------------------------------------------------------------------------------------

 

If I can get the card to show up in the dmesg | grep mic output again, I'll post it here.  I've seen that output vary a little.

I don't have a special BIOS from ASUS, but as I said above, it is the latest available and it only came out a few weeks ago.  Could this be an instance of what Frances was talking about here? https://software.intel.com/en-us/forums/topic/538897#comment-1811230

 

In other words, could the problem be that MSI-X doesn't appear to be enabled for my 31S1P?  I cannot for the life of me figure out how to force it to be enabled.  Is this going to require recompiling my kernel?
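
For anyone comparing runs, the MSI and MSI-X enable flags can be pulled out of saved `lspci -vv` text rather than read by eye. The following is a minimal Python sketch, not part of MPSS or any Intel tool; the function name `msi_state` is made up for illustration, and the sample text is the two capability lines from the listing above. Keep in mind that the Enable bit is normally set only after a driver binds to the device and requests interrupts, so `Enable-` with no driver loaded is not by itself proof of a fault.

```python
import re

def msi_state(lspci_text):
    """Map 'MSI' / 'MSI-X' to True (Enable+) or False (Enable-)
    for each capability line found in `lspci -vv` text."""
    state = {}
    for line in lspci_text.splitlines():
        # Matches e.g. "Capabilities: [98] MSI-X: Enable- Count=16 Masked-"
        m = re.search(r"Capabilities: \[[0-9a-f]+\] (MSI-X|MSI): Enable([+-])", line)
        if m:
            state[m.group(1)] = (m.group(2) == "+")
    return state

# The two capability lines from the lspci listing in this post.
sample = """\
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
"""

print(msi_state(sample))  # {'MSI': False, 'MSI-X': False}
```

Saving the output of `lspci -s 03:00.0 -vv` (run as root so the capabilities are readable) to a file and passing its contents to the function makes it easy to compare distributions or kernels.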

 

If anyone has any ideas, I'm all ears/eyes.

Thanks!

7 Replies
Jared_H_1
Beginner

Here's my latest dmesg output after I made sure to boot with the correct option in grub:

 

[   15.812291] mic: module verification failed: signature and/or required key missing - tainting kernel
[   15.814365] mic 0000:03:00.0: enabling device (0000 -> 0002)
[   15.814465] mic 0: failed to reserve aperture space
[   15.814480] mic: No MIC boards present.  SCIF available in loopback mode
[j@x ~]$

 

Jared_H_1
Beginner

...and the latest lspci -s 03:00.0 -vv

 

[j@x ~]$ lspci -s 03:00.0 -vv
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at <unassigned> (64-bit, prefetchable) [size=8G]
    Region 4: Memory at bf200000 (64-bit, non-prefetchable) [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: mic

[j@x ~]$

 

Frances_R_Intel
Employee

I suspect that you may have a number of problems. As the people using the Z97-WS board in https://software.intel.com/en-us/forums/topic/538897 found, they needed a custom BIOS from Asus, and even then, they had some issues to deal with. In addition, you are using an Intel Celeron G1820 processor. I haven't checked the specs of that processor against the requirements of the coprocessor but I would expect you to have problems with that. The coprocessor and the MPSS releases are only checked against Intel Xeon processors. I know some people have succeeded with other processors but not that processor.

Jared_H_1
Beginner

Frances,

Thank you for your prompt response!

I thought that the G1820 might be a problem, so I just installed an i7-4790K.  Other people have reported success with that one, so it seemed prudent to eliminate that variable.  

I also just booted into a Fedora 22 Live image and it SEEMED to be mapping the 8G of RAM on the 31S1P according to lspci -s 03:00.0 -vv ... but it still says MSI-X: Enable-.  =(

I'm now using the same motherboard, CPU, OS, and Xeon Phi model that others have had success with.  What do you think is the next logical step?

Thanks,

-Jared

Jared_H_1
Beginner

I'm now on Fedora 22 with an i7-4790K as the host CPU and a Z97-WS with 4G Decoding enabled.

Here's what lspci is showing now:

 

[root@x j]# lspci -s 03:00.0 -vv
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at 800000000 (64-bit, prefetchable) [size=8G]
    Region 4: Memory at <unassigned> (64-bit, non-prefetchable)
    Capabilities: [44] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [4c] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
        Vector table: BAR=4 offset=00017000
        PBA: BAR=4 offset=00018000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Kernel modules: mic_host

 

Jared_H_
Beginner

Frances Roth (Intel) wrote:

I suspect that you may have a number of problems. As the people using the Z97-WS board in https://software.intel.com/en-us/forums/topic/538897 found, they needed a custom BIOS from Asus, and even then, they had some issues to deal with. In addition, you are using an Intel Celeron G1820 processor. I haven't checked the specs of that processor against the requirements of the coprocessor but I would expect you to have problems with that. The coprocessor and the MPSS releases are only checked against Intel Xeon processors. I know some people have succeeded with other processors but not that processor.

Frances,

Hopefully this thread isn't dead already...

I decided to try Windows 8.1 as a host OS for my 31S1P.  It worked absolutely perfectly on the first try.  The Windows app that shows the temperature of the Phi worked fine (and the temperature was good -- 45-65°C).  I was able to get the SSH keys transferred, log in to the card, install a toolchain and compile some of my C code on the card.  It ran fine, and I did it both with the G1820 and the i7-4790K (which required a CPU swap, of course; I only have one of these Z97-WS motherboards so far).  I also did NOT require a BIOS image from ASUS.  It worked just fine with their latest BIOS available from asus.com.

Therefore, my problems with the 31S1P have nothing to do with my CPU, my motherboard, or my motherboard's BIOS.  This is purely a Linux / Linux distribution issue.  What I fail to understand is why CentOS 7.1 does not work with this setup when it is supposedly just RHEL 7.1 without the corporate branding, trademarks, logos, etc.  I looked up pricing for RHEL and it looks like I might be able to get an individual developer copy for $100.  I'd pay that if RHEL really is what I have to use for this, but I just don't understand why CentOS is not working for me.  Other people have reported success with identical hardware and operating system (Z97-WS + i7-4790K + CentOS 7.1), and mine works in Windows.

This is the current problematic dmesg output:

[  794.525797] mic: module verification failed: signature and/or required key missing - tainting kernel
[  794.534037] mic 0: failed to reserve mmio space
[  794.534066] mic: No MIC boards present.  SCIF available in loopback mode
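
The two driver messages in this thread ("failed to reserve aperture space" and "failed to reserve mmio space") each show up alongside an lspci listing in which one memory Region is `<unassigned>`. A minimal Python sketch for spotting unassigned regions in saved `lspci -vv` text follows. The function name `unassigned_regions` is made up for illustration, and the mapping of "aperture" to the 8G Region 0 and "mmio" to the 128K Region 4 is an assumption based on the region sizes in the listings, not something confirmed in this thread.

```python
import re

def unassigned_regions(lspci_text):
    """Return the numbers of memory Regions (BARs) that `lspci -vv`
    reports as <unassigned>."""
    regions = []
    for line in lspci_text.splitlines():
        m = re.match(r"\s*Region (\d+): Memory at (\S+)", line)
        if m and m.group(2) == "<unassigned>":
            regions.append(int(m.group(1)))
    return regions

# Region lines from the CentOS 7.1 listing earlier in the thread.
centos = """\
    Region 0: Memory at <unassigned> (64-bit, prefetchable) [size=8G]
    Region 4: Memory at bf200000 (64-bit, non-prefetchable) [size=128K]
"""

# Region lines from the Fedora 22 listing earlier in the thread.
fedora22 = """\
    Region 0: Memory at 800000000 (64-bit, prefetchable) [size=8G]
    Region 4: Memory at <unassigned> (64-bit, non-prefetchable)
"""

print(unassigned_regions(centos))    # [0]
print(unassigned_regions(fedora22))  # [4]
```

If the assumption about the mapping holds, an unassigned Region 0 would line up with the aperture error and an unassigned Region 4 with the mmio error, which points at address assignment for the BARs rather than at MSI-X itself.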

 

 

Now, under Fedora 22 I was able to get MSI-X: Enable+ to show up in the lspci output for the card, which was pretty exciting.  It also showed the correct memory mappings related to >4G Decoding in the lspci -s 03:00.0 -v output.  However, the problem with Fedora 22 is that I could not get micctrl or mpss in general to work.  The kernel version that came with Fedora 22 was too new to get anything to compile correctly.  So I reverted to Fedora 21 and installed a 3.19(?) kernel, which made the OS fail to recognize the card properly again.  Specifically, Linux failed to enable MSI/MSI-X for my 31S1P.

 

You're clearly one of the best people to get help from concerning these types of issues, and I've learned quite a bit from observing your forum threads with others.  Any help you can offer for my specific case will be greatly appreciated.  =)

Thanks,

-Jared

 

Frances_R_Intel
Employee

After digging around some more, this is what I know - or think I know:

The issue is not an either/or situation. In most cases, it is the combination of a particular motherboard and a particular Linux distribution that causes the problem. Some Linux distributions take a conservative approach and do not enable MSI-X on a system if they believe some part of the MSI-X implementation on the motherboard might be wrong or incomplete. This does not mean there is actually anything wrong with the motherboard, only that the Linux distribution is not convinced that there is not a problem. In this case, kernel boot parameters are usually used to configure the system so MSI-X stays disabled - to change the behavior it is only necessary to change the boot parameters. If there is actually a problem detected, that particular interface can be blacklisted at boot time, but in some Linux distributions, if one interface using MSI-X is blacklisted, MSI-X is disabled across the entire system. That behavior is sometimes hard coded into the kernel.

Since you know the card is successfully detected with Fedora 22 and not with Fedora 21, could you check to see what, if any, differences existed between the default boot parameters for those releases when you installed them?  
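
One way to make that comparison is to save the contents of `/proc/cmdline` on each install and diff the two strings. A minimal Python sketch follows; the function name `diff_cmdlines` and both sample command lines are hypothetical and do not come from the systems in this thread. `pci=nomsi` is used in the sample only because it is a standard kernel parameter that disables MSI system-wide.

```python
def diff_cmdlines(a, b):
    """Return (only_in_a, only_in_b): boot parameters that appear in
    one kernel command line but not the other, each sorted."""
    params_a, params_b = set(a.split()), set(b.split())
    return sorted(params_a - params_b), sorted(params_b - params_a)

# Hypothetical command lines, for illustration only.
failing = "ro quiet rhgb pci=nomsi"
working = "ro quiet rhgb"

only_failing, only_working = diff_cmdlines(failing, working)
print(only_failing)  # ['pci=nomsi']
print(only_working)  # []
```

On a real system, each string would come from reading `/proc/cmdline` after booting the release in question.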
