lspci -vv -s 05:00.0
05:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
Subsystem: Intel Corporation Device 2500
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 0
Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
Region 4: Memory at dff00000 (64-bit, non-prefetchable) [disabled] [size=128K]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [4c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [98] MSI-X: Enable- Count=16 Masked-
Vector table: BAR=4 offset=00017000
PBA: BAR=4 offset=00018000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Some time after system boot, the device quietly disappears from the OS.
It also disappears when the kernel module is not loaded at boot time but is loaded manually afterwards, with this output:
[ 155.158118] mic: module verification failed: signature and/or required key missing - tainting kernel
[ 155.170604] vnet: mode: dma, buffers: 62
[ 155.170651] mic 0000:05:00.0: enabling device (0140 -> 0142)
[ 155.170797] mic 0: failed to reserve aperture space
[ 155.170838] mic: No MIC boards present. SCIF available in loopback mode
Any help is greatly appreciated.
And then:
lspci -vv -s 05:00.0
05:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev ff) (prog-if ff)
!!! Unknown header type 7f
There are two types of problems people have been having with their new 31S1 cards - finding a suitable system board and overheating. I think you may be experiencing both.
The lines:
Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
Region 4: Memory at dff00000 (64-bit, non-prefetchable) [disabled]
generally mean that the BIOS settings are wrong. The first thing to check is that the BIOS is set to allow large (>4G) BAR addresses. If you can't change the BAR size in the BIOS you need a new BIOS (or a different board). It is possible there are other problems as well. You might want to follow the discussion in https://software.intel.com/en-us/forums/topic/538897 in which another user is trying to solve this issue.
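For anyone who wants to check this on their own system, here is a minimal sketch that scans `lspci -vv` output for memory regions with no address assigned. The function name `parse_regions` and the dictionary keys are made up for illustration, and the sample text is simply the two region lines from the original post.

```python
import re

# Matches lines such as:
#   Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
REGION_RE = re.compile(
    r"Region (\d+): Memory at (\S+) \(([^)]*)\)((?: \[[^\]]*\])*)"
)


def parse_regions(lspci_text):
    """Return one dict per memory region found in lspci -vv output."""
    regions = []
    for m in REGION_RE.finditer(lspci_text):
        # Bracketed flags after the type, e.g. ["disabled", "size=8G"]
        flags = re.findall(r"\[([^\]]*)\]", m.group(4))
        size = next(
            (f.split("=", 1)[1] for f in flags if f.startswith("size=")), None
        )
        regions.append({
            "index": int(m.group(1)),
            "address": m.group(2),
            "assigned": m.group(2) != "<unassigned>",
            "enabled": "disabled" not in flags,
            "size": size,
        })
    return regions


# Sample input: the region lines from the lspci output posted above.
sample = """\
Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
Region 4: Memory at dff00000 (64-bit, non-prefetchable) [disabled] [size=128K]
"""

for r in parse_regions(sample):
    if not r["assigned"]:
        print(f"BAR {r['index']} ({r['size']}) has no address assigned")
```

On the output posted above this flags only the 8G region, which is the one that needs address space above 4G.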
The card disappearing after the system is on for a while often means that the card is overheating. The Intel Xeon Phi coprocessor 31S1P is passively cooled. Some smaller systems do not have sufficient cooling for these cards. You might want to follow the discussion in https://software.intel.com/en-us/forums/topic/537661 where the cooling issues are being discussed.
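The `rev ff` and `Unknown header type 7f` in the second lspci output are what you typically see when reads from the card's PCI configuration space come back as all ones, i.e. the card has stopped responding. A rough sketch for checking this from a script follows, assuming a Linux host with sysfs; the address `0000:05:00.0` is taken from the dmesg output above, and the helper names `looks_absent` and `device_status` are hypothetical.

```python
from pathlib import Path


def looks_absent(config_bytes):
    """True if the vendor/device ID (first 4 config bytes) reads as all ones."""
    return len(config_bytes) >= 4 and config_bytes[:4] == b"\xff\xff\xff\xff"


def device_status(bdf="0000:05:00.0"):
    """Classify a PCI device as 'missing', 'unresponsive' or 'responding'.

    Assumes the standard Linux sysfs layout; bdf is domain:bus:device.function.
    """
    path = Path("/sys/bus/pci/devices") / bdf / "config"
    if not path.exists():
        return "missing"  # the kernel has no entry for this address
    # The first 64 bytes are the standard configuration header.
    header = path.read_bytes()[:64]
    return "unresponsive" if looks_absent(header) else "responding"


if __name__ == "__main__":
    print(device_status())
```

Running this periodically alongside temperature monitoring may help correlate the moment the card drops off the bus with a temperature rise, though it cannot by itself prove that overheating is the cause.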
Let us know how it goes.
Many thanks, Frances.
I'm not an engineer, but from what I gather LBAR is intrinsic to x86_64, a native feature/mechanism. Before I knock on AMD's door, would you comment on the claims that this is not the case with AMD's latest x86 CPUs, that it simply does not work there? Just a comment; I would not expect a statement.
On the thermal subject: even though it all sits in a Supermicro server case, I'll try to give the Phis more air and will share my findings.
Pawel,
As for the possibility of overheating, there is a program called micsmc that can run on the host processor either as a GUI or from the command line. It will let you monitor the temperature. I should have mentioned this before. If you think your system has enough cooling, you might want to bring the GUI up and just watch the temperature for a while. There is a man page that will show you all the options.
"LBAR" is a PCI feature and is not Intel-specific: it means that 64-bit PCI registers are allowed (and available). Some motherboards only support 32-bit PCI registers (which is OK from a PCI point of view).
I've got a Supermicro server (X9DRG-HF motherboard) with 2 Xeon Phis in it, and I had to turn up the fans (using a BIOS setting) to stop the Phis from overheating. The fans in the Supermicro mobo are now continually blowing at high speed, and the Phis (5110Ps) stay at a nice cool 40 degrees C.
HTH,
JJK
