A hardware error occurs on the SMBus Controller PCIe device on C3000

CharlieChenEC

Hi,

 

The AS4630-54TE uses a C3000-family CPU. We intermittently see the PCIe fatal error shown below, reported against the SMBus Controller PCIe device on the C3000.

 

Could you please shed some light on the possible cause of this error?

 

[2022-11-25 00:07:56] [26941.245614] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[2022-11-26 00:07:56] [26941.245616] {1}[Hardware Error]: event severity: fatal
[2022-11-26 00:07:56] [26941.245616] {1}[Hardware Error]: Error 0, type: fatal
[2022-11-26 00:07:56] [26941.245617] {1}[Hardware Error]: section_type: PCIe error
[2022-11-26 00:07:56] [26941.245617] {1}[Hardware Error]: port_type: 4, root port
[2022-11-26 00:07:56] [26941.245618] {1}[Hardware Error]: version: 1.16
[2022-11-26 00:07:56] [26941.245619] {1}[Hardware Error]: command: 0x4010, status: 0x0546
[2022-11-26 00:07:56] [26941.245619] {1}[Hardware Error]: device_id: 0000:00:12.0
[2022-11-26 00:07:56] [26941.245620] {1}[Hardware Error]: slot: 0
[2022-11-26 00:07:56] [26941.245621] {1}[Hardware Error]: secondary_bus: 0x00
[2022-11-26 00:07:56] [26941.245621] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x19ac
[2022-11-26 00:07:56] [26941.245622] {1}[Hardware Error]: class_code: 000880
[2022-11-26 00:07:56] [26941.245623] {1}[Hardware Error]: aer_uncor_status: 0x00000000, aer_uncor_mask: 0x00000000
[2022-11-26 00:07:56] [26941.245623] {1}[Hardware Error]: aer_uncor_severity: 0x00000000
[2022-11-26 00:07:56] [26941.245624] {1}[Hardware Error]: TLP Header: 00000000 00000000 00000000 00000000
[2022-11-26 00:07:56] [26941.245625] Kernel panic - not syncing: Fatal hardware error!
[2022-11-26 00:07:56] [26941.245626] CPU: 0 PID: 12884 Comm: bash Tainted: G IOE 5.10.0-8-2-amd64 #1 Debian 5.10.46-4
[2022-11-26 00:07:56] [26941.245626] Hardware name: Accton AS4630-54TE/AS4630-54TE, BIOS v47.01.01.00 03/14/2021
[2022-11-26 00:07:56] [26941.245627] Call Trace:
[2022-11-26 00:07:56] [26941.245627] <NMI>
[2022-11-26 00:07:56] [26941.245628] dump_stack+0x6b/0x83
[2022-11-26 00:07:56] [26941.245628] panic+0x123/0x2f9
[2022-11-26 00:07:56] [26941.245629] __ghes_panic.cold+0x21/0x21
[2022-11-26 00:07:56] [26941.245629] ghes_notify_nmi+0x1b0/0x350
[2022-11-26 00:07:56] [26941.245630] nmi_handle+0x58/0x100
[2022-11-26 00:07:56] [26941.245630] default_do_nmi+0x42/0x130
[2022-11-26 00:07:56] [26941.245631] exc_nmi+0x12f/0x150
[2022-11-26 00:07:56] [26941.245631] end_repeat_nmi+0x16/0x55
[2022-11-26 00:07:56] [26941.245632] RIP: 0010:irq_entries_start+0x38/0x660
[2022-11-26 00:07:56] [26941.245633] Code: 00 90 6a 22 e9 b9 0a 00 00 90 6a 23 e9 b1 0a 00 00 90 6a 24 e9 a9 0a 00 00 90 6a 25 e9 a1 0a 00 00 90 6a 26 e9 99 0a 00 00 90 <6a> 27 e9 91 0a 00 00 90 6a 28 e9 89 0a 00 00 90 6a 29 e9 81 0a 00
[2022-11-26 00:07:56] [26941.245634] RSP: 0000:fffffe0000002fd8 EFLAGS: 00000046
[2022-11-26 00:07:56] [26941.245636] RAX: 0000562c11162600 RBX: 00000000000042c9 RCX: 0000000000000001
[2022-11-26 00:07:56] [26941.245636] RDX: 000000000000f928 RSI: 00007ffe220ca184 RDI: 0000562c1094ccf0
[2022-11-26 00:07:56] [26941.245637] RBP: 0000000000000000 R08: 0000562c1094ccf0 R09: 00007ffe220ca184
[2022-11-26 00:07:56] [26941.245638] R10: 0000000000000000 R11: 00007efcedc1ff40 R12: 000000000000f925
[2022-11-26 00:07:56] [26941.245639] R13: 00007ffe220ca290 R14: 0000562c1099e920 R15: 0000000000000000
[2022-11-26 00:07:56] [26941.245639] ? irq_entries_start+0x38/0x660
[2022-11-26 00:07:56] [26941.245640] ? irq_entries_start+0x38/0x660
[2022-11-26 00:07:56] [26941.245640] </NMI>
[2022-11-26 00:07:56] [26941.245640] <ENTRY_TRAMPOLINE>
[2022-11-26 00:07:56] [26941.245641] RIP: 0033:0x562c0e982ab4
[2022-11-26 00:07:56] [26941.245642] RSP: 002b:00007ffe220ca1e0 EFLAGS: 00000246 </ENTRY_TRAMPOLINE>
[2022-11-26 00:07:56] [26941.245682] Kernel Offset: 0x4a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[2022-11-26 00:07:56] 989D9CB492A0A2Version 2.19.1266. Copyright (C) 2021 American Megatrends, Inc. BIOS Date: 03/14/2021 13:54:20 AS4630_54_KM_T Ver: v47.01.01.00
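
To compare occurrences of this panic across units, the fields of the APEI record can be pulled out of a console log mechanically. The following is a minimal sketch that assumes the log format shown above; the function name `parse_ghes_fields` and the regular expression are illustrative and not part of any kernel tooling. Values are collected in lists because `device_id` appears twice in the record (once as the bus address, once as the PCI device ID).

```python
import re

# Matches the "{N}[Hardware Error]: " prefix and captures the rest of the line.
# The pattern is an assumption based on the log format in this post.
HWERR_RE = re.compile(r"\{\d+\}\[Hardware Error\]:\s*(.*)$")


def parse_ghes_fields(log_text):
    """Return {key: [values...]} for 'key: value' pairs in [Hardware Error] lines."""
    fields = {}
    for line in log_text.splitlines():
        match = HWERR_RE.search(line)
        if not match:
            continue  # not part of the hardware error record
        # One line can hold several comma-separated pairs,
        # e.g. "command: 0x4010, status: 0x0546".
        for part in match.group(1).split(","):
            key, sep, value = part.partition(":")
            if sep:  # skip fragments without a colon, e.g. " root port"
                fields.setdefault(key.strip(), []).append(value.strip())
    return fields


sample = """\
[26941.245619] {1}[Hardware Error]: command: 0x4010, status: 0x0546
[26941.245619] {1}[Hardware Error]: device_id: 0000:00:12.0
[26941.245621] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x19ac
"""
fields = parse_ghes_fields(sample)
print(fields["device_id"])  # ['0000:00:12.0', '0x19ac']
```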

Here is the lspci output for that PCIe device.

root:~# lspci -vvv -s 00:12.0
00:12.0 System peripheral: Intel Corporation Atom Processor C3000 Series SMBus Contoller - Host (rev 11)
    Subsystem: Intel Corporation Atom Processor C3000 Series SMBus Contoller - Host
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 69
    Region 0: Memory at dff9f000 (64-bit, non-prefetchable) [size=1K]
    Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0
            ExtTag- RBE+ FLReset+
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
    Capabilities: [80] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Address: 00000000fee00858 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: ismt_smbus
    Kernel modules: i2c_ismt
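
For reference, the `command` and `status` values in the error record are 16-bit PCI configuration registers, so they can be decoded bit by bit and compared with the `Control:` and `Status:` lines that lspci prints. The sketch below uses the standard PCI Command and Status register bit positions; the helper name `decode_bits` and the label strings (borrowed from lspci's flag names) are illustrative. Decoded as a Command register, 0x0546 gives exactly the flags lspci shows as set on the `Control:` line, which may mean the two fields were filled in swapped order in the record. That is only an observation from the numbers, not a confirmed diagnosis.

```python
# Standard PCI Command register bits (labels follow lspci's flag names).
COMMAND_BITS = {
    0: "I/O",
    1: "Mem",
    2: "BusMaster",
    3: "SpecCycle",
    4: "MemWINV",
    5: "VGASnoop",
    6: "ParErr",
    7: "Stepping",
    8: "SERR",
    9: "FastB2B",
    10: "DisINTx",
}

# Standard PCI Status register flag bits. Bits 9-10 (DEVSEL timing) form a
# 2-bit field rather than a flag and are left out of this sketch.
STATUS_BITS = {
    3: "INTx",
    4: "Cap",
    5: "66MHz",
    6: "UDF",
    7: "FastB2B",
    8: "ParErr",
    11: ">TAbort",
    12: "<TAbort",
    13: "<MAbort",
    14: ">SERR",
    15: "<PERR",
}


def decode_bits(value, names):
    """Return the names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


# Values taken from the error record, decoded under the "swapped" reading.
print(decode_bits(0x0546, COMMAND_BITS))
# ['Mem', 'BusMaster', 'ParErr', 'SERR', 'DisINTx']
print(decode_bits(0x4010, STATUS_BITS))
# ['Cap', '>SERR']
```

Under that reading, the status value has the Signaled System Error bit (`>SERR`) set at the time the record was captured, whereas the lspci output above, taken at another time, shows `>SERR-`.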

 
