MPate48
Beginner
1,453 Views

XL710 40GE has increasing port.rx_dropped packets with time. How to tune the settings / OS to minimize or eliminate these drops?

We have multiple systems (Linux Servers) with these NICs. In each setup they are connected to a switch with a 40G port and we see no errors on the switch stats. We are using a 3m direct attach cable between the NIC and switch.

 

We also see that some systems are more prone to port.rx_dropped increasing over time, which eventually causes the higher-level application to complain about not receiving packets.

 

I have looked over the performance tuning guide and tried some of the IRQ suggestions, but I am not sure the settings are being applied correctly.
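One way to check whether IRQ affinity changes actually took effect is to read them back from the kernel. The sketch below is only an illustration: it assumes the interface's vectors appear in /proc/interrupts with the interface name in the last column (something like i40e-ens1-TxRx-0, which is an assumption about this driver version), and the sample text, IRQ numbers, and counts are made up. The helper names find_irqs and read_affinity are hypothetical.

```python
import re
from pathlib import Path


def find_irqs(interrupts_text, ifname):
    """Map IRQ number -> vector name for /proc/interrupts lines that mention ifname."""
    irqs = {}
    for line in interrupts_text.splitlines():
        match = re.match(r"\s*(\d+):", line)
        if match and ifname in line:
            # The vector name is the last whitespace-separated field on the line.
            irqs[int(match.group(1))] = line.split()[-1]
    return irqs


def read_affinity(irq, proc_root="/proc"):
    """Return the CPU list the kernel currently allows for this IRQ (Linux only)."""
    return Path(proc_root, "irq", str(irq), "smp_affinity_list").read_text().strip()


# Made-up sample in the /proc/interrupts layout; not taken from the affected host.
SAMPLE = """\
            CPU0       CPU1
  28:          0          0   PCI-MSI 524288-edge      i40e-0000:01:00.0:misc
  30:     120034          0   PCI-MSI 524289-edge      i40e-ens1-TxRx-0
  31:          0     118877   PCI-MSI 524290-edge      i40e-ens1-TxRx-1
  40:       5012       4977   PCI-MSI 1048576-edge     nvme0q0
"""

QUEUE_IRQS = find_irqs(SAMPLE, "ens1")
```

On a live system, the same function could be fed Path("/proc/interrupts").read_text(), and read_affinity(irq) called for each IRQ it returns. If the values keep reverting after being set, a running irqbalance daemon may be rewriting them.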

 

Are there any settings you would suggest trying?

 

Our versions/environment are below:

 

Linux host-xxx 4.4.0-146-generic #172-Ubuntu SMP Wed Apr 3 09:00:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

 

⇆ lyft@host-xxx:~$ ethtool -i ens1

driver: i40e

version: 1.4.25-k

firmware-version: 6.01 0x800035da 1.1747.0

expansion-rom-version: 

bus-info: 0000:01:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: yes

supports-register-dump: yes

supports-priv-flags: yes

 

⇆ lyft@host-xxx:~$ modinfo i40e

filename:    /lib/modules/4.4.0-146-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko

version:    1.4.25-k

license:    GPL

description:  Intel(R) Ethernet Connection XL710 Network Driver

author:     Intel Corporation, <e1000-devel@lists.sourceforge.net>

srcversion:   D0A81CD58EA01CB81831F3E

alias:     pci:v00008086d00001588sv*sd*bc*sc*i*

alias:     pci:v00008086d00001587sv*sd*bc*sc*i*

alias:     pci:v00008086d000037D2sv*sd*bc*sc*i*

alias:     pci:v00008086d000037D1sv*sd*bc*sc*i*

alias:     pci:v00008086d000037D0sv*sd*bc*sc*i*

alias:     pci:v00008086d000037CFsv*sd*bc*sc*i*

alias:     pci:v00008086d000037CEsv*sd*bc*sc*i*

alias:     pci:v00008086d00001587sv*sd*bc*sc*i*

alias:     pci:v00008086d00001589sv*sd*bc*sc*i*

alias:     pci:v00008086d00001586sv*sd*bc*sc*i*

alias:     pci:v00008086d00001585sv*sd*bc*sc*i*

alias:     pci:v00008086d00001584sv*sd*bc*sc*i*

alias:     pci:v00008086d00001583sv*sd*bc*sc*i*

alias:     pci:v00008086d00001581sv*sd*bc*sc*i*

alias:     pci:v00008086d00001580sv*sd*bc*sc*i*

alias:     pci:v00008086d00001574sv*sd*bc*sc*i*

alias:     pci:v00008086d00001572sv*sd*bc*sc*i*

depends:    ptp,vxlan

retpoline:   Y

intree:     Y

vermagic:    4.4.0-146-generic SMP mod_unload modversions retpoline 

parm:      debug:Debug level (0=none,...,16=all) (int)

 

⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"

   rx_errors: 0

   tx_errors: 0

   rx_dropped: 0

   tx_dropped: 0

   rx_length_errors: 0

   rx_crc_errors: 0

   tx_lost_interrupt: 0

   fcoe_bad_fccrc: 0

   rx_fcoe_dropped: 0

   fcoe_last_error: 0

   port.tx_errors: 0

   port.rx_dropped: 0

   port.tx_dropped_link_down: 0

   port.rx_crc_errors: 0

   port.mac_local_faults: 0

   port.mac_remote_faults: 0

   port.rx_length_errors: 0

 

After running our software on it for some time, roughly 5 minutes later:

 

⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"

   rx_errors: 0

   tx_errors: 0

   rx_dropped: 0

   tx_dropped: 0

   rx_length_errors: 0

   rx_crc_errors: 0

   tx_lost_interrupt: 1

   fcoe_bad_fccrc: 0

   rx_fcoe_dropped: 0

   fcoe_last_error: 0

   port.tx_errors: 0

   port.rx_dropped: 148

   port.tx_dropped_link_down: 0

   port.rx_crc_errors: 0

   port.mac_local_faults: 0

   port.mac_remote_faults: 0

   port.rx_length_errors: 0

 

Around 10 minutes later:

⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"

   rx_errors: 0

   tx_errors: 0

   rx_dropped: 0

   tx_dropped: 0

   rx_length_errors: 0

   rx_crc_errors: 0

   tx_lost_interrupt: 5

   fcoe_bad_fccrc: 0

   rx_fcoe_dropped: 0

   fcoe_last_error: 0

   port.tx_errors: 0

   port.rx_dropped: 610

   port.tx_dropped_link_down: 0

   port.rx_crc_errors: 0

   port.mac_local_faults: 0

   port.mac_remote_faults: 0

   port.rx_length_errors: 0
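To put a number on how fast the counters grow, two snapshots can be diffed. The sketch below parses "name: value" lines as printed by ethtool -S and computes a per-second rate. The counter values come from the last two outputs above; the 600-second interval is an assumption, since the timings in the post are approximate, and the function names are hypothetical.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def rate_per_second(before, after, counter, seconds):
    """Average increase of one counter per second between two snapshots."""
    return (after[counter] - before[counter]) / seconds


# Excerpts of the second and third snapshots from the post.
SNAP_A = """\
   tx_lost_interrupt: 1
   port.rx_dropped: 148
"""
SNAP_B = """\
   tx_lost_interrupt: 5
   port.rx_dropped: 610
"""

before = parse_ethtool_stats(SNAP_A)
after = parse_ethtool_stats(SNAP_B)

# Assumed interval: about 10 minutes between the two snapshots.
INTERVAL_S = 600
drop_rate = rate_per_second(before, after, "port.rx_dropped", INTERVAL_S)
print(f"port.rx_dropped grew by {after['port.rx_dropped'] - before['port.rx_dropped']} "
      f"({drop_rate:.2f}/s over an assumed {INTERVAL_S} s)")
```

Sampling at fixed intervals, for example from a cron job, and correlating the rate with application load would show whether the drops are steady or arrive in bursts.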

 

I can send the lspci output in a separate message.

 

 

5 Replies
CrisselleF_C_Intel
Moderator
896 Views

Hello MPate48,

 

Thank you for posting in Intel Ethernet Communities. Kindly provide the following details for us to check on your query. 

1.) Kindly share the lspci output you mentioned in your post.

2.) Exact model of the switch.

3.) Model of the cable used.

4.) Please also share the link where you downloaded the driver. 

 

Looking forward to your reply.

 

Best regards,

Crisselle C

Intel Customer Support

A Contingent Worker at Intel

MPate48
Beginner
896 Views

  1.  

⇆ lyft@host-xxx:~$ sudo lspci -vvvs "01:00.0"

01:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2

Physical Slot: 1

Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

Latency: 0, Cache Line Size: 32 bytes

Interrupt: pin A routed to IRQ 28

Region 0: Memory at 20000800000 (64-bit, prefetchable) [size=8M]

Region 3: Memory at 20000400000 (64-bit, prefetchable) [size=32K]

Expansion ROM at c7200000 [disabled] [size=512K]

Capabilities: [40] Power Management version 3

Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)

Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-

Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+

Address: 0000000000000000 Data: 0000

Masking: 00000000 Pending: 00000000

Capabilities: [70] MSI-X: Enable+ Count=129 Masked-

Vector table: BAR=3 offset=00000000

PBA: BAR=3 offset=00001000

Capabilities: [a0] Express (v2) Endpoint, MSI 00

DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+

RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

MaxPayload 256 bytes, MaxReadReq 512 bytes

DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us

ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+

LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+

ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled

LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-

Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

Compliance De-emphasis: -6dB

LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

Capabilities: [e0] Vital Product Data

Product Name: XL710 40GbE Controller

Read-only fields:

[PN] Part number: 

[EC] Engineering changes: 

[FG] Unknown: 

[LC] Unknown: 

[MN] Manufacture ID: 

[PG] Unknown: 

[SN] Serial number: 

[V0] Vendor specific: 

[RV] Reserved: checksum good, 0 byte(s) reserved

Read/write fields:

[V1] Vendor specific: 

End

Capabilities: [100 v2] Advanced Error Reporting

UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-

CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

Capabilities: [140 v1] Device Serial Number 60-38-c3-ff-ff-fe-fd-3c

Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)

ARICap: MFVC- ACS-, Next Function: 0

ARICtl: MFVC- ACS-, Function Group: 0

Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)

IOVCap: Migration-, Interrupt Message Number: 000

IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+

IOVSta: Migration-

Initial VFs: 128, Total VFs: 128, Number of VFs: 0, Function Dependency Link: 00

VF offset: 16, stride: 1, Device ID: 154c

Supported Page Size: 00000553, System Page Size: 00000001

Region 0: Memory at 0000020001000000 (64-bit, prefetchable)

Region 3: Memory at 0000020000408000 (64-bit, prefetchable)

VF Migration: offset: 00000000, BIR: 0

Capabilities: [1a0 v1] Transaction Processing Hints

Device specific mode supported

No steering table available

Capabilities: [1b0 v1] Access Control Services

ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

Capabilities: [1d0 v1] #19

Kernel driver in use: i40e

Kernel modules: i40e

2.

Arista 7050-TX-48 (https://www.arista.com/assets/data/pdf/Datasheets/7050TX-128_48_Datasheet_S.pdf)

 

3.

CAB-QSFP-P3M (https://www.10gtek.com/QSFP+-to-QSFP+-DAC-152.html)

4.

The driver is the one already included in the Ubuntu 16.04 LTS kernel; otherwise, we downloaded it from the Intel website.
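For reference, the PCIe link lines in the lspci output above can be checked mechanically. The sketch below compares the negotiated link (LnkSta) with the device capability (LnkCap) and estimates raw link bandwidth from the line encoding. The three input lines are copied from the output above; the helper names are hypothetical, and the bandwidth figure ignores TLP and flow-control overhead.

```python
import re

LINK_RE = re.compile(r"Speed ([\d.]+)GT/s, Width x(\d+)")

# Fraction of transferred bits that carry data, per PCIe signalling rate (GT/s).
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def parse_link(line):
    """Extract (speed in GT/s, lane count) from an lspci LnkCap or LnkSta line."""
    match = LINK_RE.search(line)
    if match is None:
        raise ValueError("no link speed/width found in: " + line)
    return float(match.group(1)), int(match.group(2))


def link_gbps(speed, width):
    """Raw link bandwidth in Gbit/s after line encoding, before packet overhead."""
    return speed * width * ENCODING[speed]


def aspm_enabled(lnkctl_line):
    """True if the LnkCtl line reports an enabled ASPM state such as 'ASPM L1 Enabled'."""
    return re.search(r"ASPM \S+( \S+)? Enabled", lnkctl_line) is not None


LNKCAP = "LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us"
LNKSTA = "LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-"
LNKCTL = "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+"

cap = parse_link(LNKCAP)
sta = parse_link(LNKSTA)
print("link at full capability:", cap == sta)
print("raw bandwidth: %.1f Gbit/s" % link_gbps(*sta))
print("ASPM enabled:", aspm_enabled(LNKCTL))
```

In this output the link trained at the full 8GT/s x8, so the slot itself is not running at a reduced width or speed. The output also shows ASPM L1 enabled; whether link power management contributes to the drops here is not established by anything in this thread.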

CrisselleF_C_Intel
Moderator
896 Views

Hello MPate48,

 

Thank you for the prompt reply.

 

Upon checking, an increasing port.rx_dropped counter means that something in the slot, memory, or system is not fast enough to keep up with the incoming traffic. You can find this information on page 10 of the guide linked below.

https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance...
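One setting commonly checked when receive buffers are not being serviced fast enough is the RX descriptor ring size, which ethtool -g reports. The sketch below parses that output and shows how far the current RX ring is from the maximum. The sample values are illustrative only and were not taken from the affected host; the function name is hypothetical.

```python
def parse_ring_params(text):
    """Parse `ethtool -g` output into {'max': {...}, 'current': {...}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            current = sections.setdefault("max", {})
        elif line.startswith("Current hardware settings"):
            current = sections.setdefault("current", {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            # Skip non-numeric values such as 'n/a' printed by newer ethtool versions.
            if value.strip().isdigit():
                current[key.strip()] = int(value)
    return sections


# Illustrative sample in the `ethtool -g` layout; the numbers are assumptions.
SAMPLE = (
    "Ring parameters for ens1:\n"
    "Pre-set maximums:\n"
    "RX:\t\t4096\n"
    "RX Mini:\t0\n"
    "RX Jumbo:\t0\n"
    "TX:\t\t4096\n"
    "Current hardware settings:\n"
    "RX:\t\t512\n"
    "RX Mini:\t0\n"
    "RX Jumbo:\t0\n"
    "TX:\t\t512\n"
)

rings = parse_ring_params(SAMPLE)
headroom = rings["max"]["RX"] - rings["current"]["RX"]
print("RX ring:", rings["current"]["RX"], "of", rings["max"]["RX"],
      "descriptors, headroom", headroom)
```

If there is headroom, the ring can be enlarged with ethtool -G, for example ethtool -G ens1 rx 4096. Whether that reduces port.rx_dropped on these systems would need to be measured; it is a suggestion to try, not a confirmed fix.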

 

Before we proceed checking on your query, we'd like to double check the following details:

1.) Have you tried a NIC with similar specifications (from another vendor, if available) while keeping the system configuration the same, to isolate the issue?

2.) Have you tried to install it in a different slot?

3.) Have you tried updating the driver to its latest version to see whether that helps?

 

Looking forward to your response.

 

Best regards,

Crisselle C

Intel Customer Support

A Contingent Worker at Intel

CrisselleF_C_Intel
Moderator
896 Views

Hello MPate48,

 

Good day!

 

We'd like to follow up on the requested details so that we can check further on your query. If you have additional questions or need clarification, please do not hesitate to ask.

 

Looking forward to hearing from you.

 

Best regards,

Crisselle C

Intel Customer Support

A Contingent Worker at Intel

CrisselleF_C_Intel
Moderator
896 Views

Hello MPate48,

 

Good day!

 

Please be informed that we will now proceed with closing this request, since we haven't received a response to our previous follow-up. Should you have any other concerns or need assistance in the future, feel free to post a new question.

 

Best regards,

Crisselle C

Intel Customer Support

A Contingent Worker at Intel
