
PCIe Errors for Bad TLP

HSing52

Dear Sir/Madam,

 

I have seen the errors below in the logs on multiple occasions on my server, which has a Solarflare card (SFN 8522 Plus, Flareon Ultra 8000 Series 10G) and runs Fedora 29:

 

 

[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: AER: Multiple Corrected error received: 0000:00:03.2

[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)

[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000

[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: [ 6] Bad TLP

[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: Error of this Agent is reported first

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.0: device [1924:0a03] error status/mask=00001000/00006000

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.0: [12] Replay Timer Timeout

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.1: device [1924:0a03] error status/mask=00001000/00006000

[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.1: [12] Replay Timer Timeout

 

------

 

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: AER: Multiple Corrected error received: 0000:00:03.2

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: [ 6] Bad TLP

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: Error of this Agent is reported first

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.0: device [1924:0a03] error status/mask=00001000/00006000

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.0: [12] Replay Timer Timeout

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.1: device [1924:0a03] error status/mask=00001000/00006000

[Wed Apr 15 18:41:30 2020] sfc 0000:01:00.1: [12] Replay Timer Timeout

[Thu Apr 16 16:40:26 2020] pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2

[Thu Apr 16 16:40:26 2020] pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

[Thu Apr 16 16:40:26 2020] pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00001000/00002000

[Thu Apr 16 16:40:26 2020] pcieport 0000:00:03.2: [12] Replay Timer Timeout

 

 

1. Errors for device [8086:2f0a] and device [1924:0a03] were logged at exactly the same time on two occasions:

a) Wed Apr 15 18:41:30 2020

b) Mon Mar 23 04:45:09 2020

 

device [8086:2f0a] --> 00:03.2 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f0a] (rev 02)

device [1924:0a03] --> 01:00.0 Ethernet controller [0200]: Solarflare Communications SFC9220 10/40G Ethernet Controller [1924:0a03] (rev 02)
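
To check this kind of co-occurrence across a longer log, something like the following might help. It is only a rough sketch: it assumes the log lines look like the `dmesg -T` output pasted above, and the function name `errors_by_timestamp` and both regular expressions are my own, not part of any tool.

```python
import re
from collections import defaultdict

# Assumed line shape: "[<timestamp>] <driver> <domain:bus:dev.fn>: <message>"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)
# Error-name lines look like "[ 6] Bad TLP" or "[12] Replay Timer Timeout".
ERR_RE = re.compile(r"^\[\s*(\d+)\]\s+(.+)$")


def errors_by_timestamp(lines):
    """Group (device address, error name) pairs by log timestamp."""
    grouped = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a per-device kernel log line
        e = ERR_RE.match(m.group("msg"))
        if e:
            grouped[m.group("ts")].add((m.group("bdf"), e.group(2).strip()))
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "[Mon Mar 23 04:45:09 2020] pcieport 0000:00:03.2: [ 6] Bad TLP",
        "[Mon Mar 23 04:45:09 2020] sfc 0000:01:00.0: [12] Replay Timer Timeout",
    ]
    for ts, errs in errors_by_timestamp(sample).items():
        # A timestamp with more than one device means both ends of the link reported.
        devices = {bdf for bdf, _ in errs}
        print(ts, "devices:", len(devices), sorted(errs))
```

Timestamps where both 0000:00:03.2 and 0000:01:00.x appear are the cases where the root port and the card reported errors together.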

 

However, in many cases we see only the following:

 

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: AER: Multiple Corrected error received: 0000:00:03.2

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000

[Wed Apr 15 18:41:30 2020] pcieport 0000:00:03.2: [ 6] Bad TLP

 

We are seeing these errors for Bad TLP, Bad DLLP, and Replay Timer Timeout.
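
For reference, the `error status/mask` values in the logs can be decoded bit by bit. Below is a minimal sketch, assuming the standard bit layout of the PCIe AER Correctable Error Status register (the bit numbers match the `[ 6]` and `[12]` indices the kernel prints). The helper names `decode_correctable` and `decode_status_mask` are hypothetical.

```python
# Bit positions in the PCIe AER Correctable Error Status / Mask registers.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of all bits set in a 32-bit correctable error register."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_CORRECTABLE_BITS.get(bit, f"reserved/unknown bit {bit}"))
    return names


def decode_status_mask(field):
    """Decode a 'status/mask' pair such as '00000040/00002000' from the log."""
    status_hex, mask_hex = field.split("/")
    return (
        decode_correctable(int(status_hex, 16)),
        decode_correctable(int(mask_hex, 16)),
    )


if __name__ == "__main__":
    for field in ("00000040/00002000", "00001000/00006000"):
        status, mask = decode_status_mask(field)
        print(field, "status:", status, "masked:", mask)
```

With the values from the logs, status 00000040 is bit 6 (Bad TLP) and status 00001000 is bit 12 (Replay Timer Timeout). The mask field lists error types that are masked from reporting, not errors that occurred.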

 

Could you please assist? Let me know if you need more information.

chengke

Has this issue been resolved?
