We have multiple systems (Linux servers) with these NICs. In each setup the NIC is connected to a 40G switch port, and we see no errors in the switch statistics. We use a 3 m direct-attach cable between the NIC and the switch.
Some systems are more prone than others to the port.rx_dropped counter increasing over time, which eventually causes the higher-level application to complain about not receiving packets.
I have looked over the performance tuning guide and tried some of the IRQ suggestions, but I am not sure the settings are being applied correctly.
Are there any settings you would suggest trying?
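One way to see whether IRQ settings actually took effect is to read back what the kernel reports. The sketch below assumes the i40e queue vectors appear in /proc/interrupts with the interface name in their label and that each IRQ exposes smp_affinity_list under /proc/irq. The function name show_irq_affinity and its optional path arguments are made up for illustration.

```shell
#!/bin/sh
# List each IRQ whose /proc/interrupts entry mentions the interface,
# together with the CPU list the kernel currently reports for it.
#   $1: interface name, e.g. ens1 (plain substring match, so "ens1"
#       would also match "ens10")
#   $2: interrupts table (assumed default: /proc/interrupts)
#   $3: per-IRQ directory root (assumed default: /proc/irq)
show_irq_affinity() {
    iface=$1
    table=${2:-/proc/interrupts}
    irq_root=${3:-/proc/irq}
    grep -- "$iface" "$table" | while read -r irq rest; do
        irq=${irq%:}        # "128:" -> "128"
        name=${rest##* }    # last field is the vector label
        cpus=$(cat "$irq_root/$irq/smp_affinity_list" 2>/dev/null || echo unknown)
        printf '%s %s %s\n' "$irq" "$name" "$cpus"
    done
}
```

Running `show_irq_affinity ens1` prints one line per vector with its IRQ number, label, and current CPU list. If irqbalance is running, it may rewrite manually set affinities, so that is worth ruling out as well.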
Our versions and environment are below:
Linux host-xxx 4.4.0-146-generic #172-Ubuntu SMP Wed Apr 3 09:00:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
⇆ lyft@host-xxx:~$ ethtool -i ens1
driver: i40e
version: 1.4.25-k
firmware-version: 6.01 0x800035da 1.1747.0
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
⇆ lyft@host-xxx:~$ modinfo i40e
filename: /lib/modules/4.4.0-146-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
version: 1.4.25-k
license: GPL
description: Intel(R) Ethernet Connection XL710 Network Driver
author: Intel Corporation, <e1000-devel@lists.sourceforge.net>
srcversion: D0A81CD58EA01CB81831F3E
alias: pci:v00008086d00001588sv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d000037D2sv*sd*bc*sc*i*
alias: pci:v00008086d000037D1sv*sd*bc*sc*i*
alias: pci:v00008086d000037D0sv*sd*bc*sc*i*
alias: pci:v00008086d000037CFsv*sd*bc*sc*i*
alias: pci:v00008086d000037CEsv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d00001589sv*sd*bc*sc*i*
alias: pci:v00008086d00001586sv*sd*bc*sc*i*
alias: pci:v00008086d00001585sv*sd*bc*sc*i*
alias: pci:v00008086d00001584sv*sd*bc*sc*i*
alias: pci:v00008086d00001583sv*sd*bc*sc*i*
alias: pci:v00008086d00001581sv*sd*bc*sc*i*
alias: pci:v00008086d00001580sv*sd*bc*sc*i*
alias: pci:v00008086d00001574sv*sd*bc*sc*i*
alias: pci:v00008086d00001572sv*sd*bc*sc*i*
depends: ptp,vxlan
retpoline: Y
intree: Y
vermagic: 4.4.0-146-generic SMP mod_unload modversions retpoline
parm: debug:Debug level (0=none,...,16=all) (int)
⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
rx_length_errors: 0
rx_crc_errors: 0
tx_lost_interrupt: 0
fcoe_bad_fccrc: 0
rx_fcoe_dropped: 0
fcoe_last_error: 0
port.tx_errors: 0
port.rx_dropped: 0
port.tx_dropped_link_down: 0
port.rx_crc_errors: 0
port.mac_local_faults: 0
port.mac_remote_faults: 0
port.rx_length_errors: 0
After running our software on it for a while, roughly 5 minutes later:
⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
rx_length_errors: 0
rx_crc_errors: 0
tx_lost_interrupt: 1
fcoe_bad_fccrc: 0
rx_fcoe_dropped: 0
fcoe_last_error: 0
port.tx_errors: 0
port.rx_dropped: 148
port.tx_dropped_link_down: 0
port.rx_crc_errors: 0
port.mac_local_faults: 0
port.mac_remote_faults: 0
port.rx_length_errors: 0
Around 10 minutes later:
⇆ lyft@host-xxx:~$ ethtool -S ens1 | egrep "drop|err|crc|fault"
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
rx_length_errors: 0
rx_crc_errors: 0
tx_lost_interrupt: 5
fcoe_bad_fccrc: 0
rx_fcoe_dropped: 0
fcoe_last_error: 0
port.tx_errors: 0
port.rx_dropped: 610
port.tx_dropped_link_down: 0
port.rx_crc_errors: 0
port.mac_local_faults: 0
port.mac_remote_faults: 0
port.rx_length_errors: 0
I can send the lspci output in a separate message.
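Comparing snapshots by eye gets tedious, so a small helper that diffs two saved `ethtool -S` outputs may help track how fast the counters grow. This is only a sketch: counter_deltas is a hypothetical name, and it assumes each snapshot was saved to a file in the usual `name: value` layout.

```shell
#!/bin/sh
# Print every counter whose value differs between two saved
# `ethtool -S` snapshots, with the old value, new value and delta.
#   $1: earlier snapshot file
#   $2: later snapshot file
# Example capture (interface name and interval are placeholders):
#   ethtool -S ens1 > before.txt; sleep 300; ethtool -S ens1 > after.txt
#   counter_deltas before.txt after.txt
counter_deltas() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }            # strip leading indentation
        NR == FNR { before[$1] = $2; next }    # first file: remember values
        ($1 in before) && ($2 + 0) != (before[$1] + 0) {
            printf "%s %s -> %s (delta %d)\n", $1, before[$1], $2, $2 - before[$1]
        }
    ' "$1" "$2"
}
```

Applied to the last two snapshots in this post, port.rx_dropped goes from 148 to 610, a delta of 462 over roughly five minutes, and tx_lost_interrupt goes from 1 to 5.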
Hello MPate48,
Thank you for posting in Intel Ethernet Communities. Please provide the following details so that we can look into your query.
1.) The lspci output you mentioned in your post.
2.) The exact model of the switch.
3.) The model of the cable used.
4.) The link from which you downloaded the driver.
Looking forward to your reply.
Best regards,
Crisselle C
Intel Customer Support
A Contingent Worker at Intel
⇆ lyft@host-xxx:~$ sudo lspci -vvvs "01:00.0"
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2
Physical Slot: 1
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 28
Region 0: Memory at 20000800000 (64-bit, prefetchable) [size=8M]
Region 3: Memory at 20000400000 (64-bit, prefetchable) [size=32K]
Expansion ROM at c7200000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00001000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [e0] Vital Product Data
Product Name: XL710 40GbE Controller
Read-only fields:
[PN] Part number:
[EC] Engineering changes:
[FG] Unknown:
[LC] Unknown:
[MN] Manufacture ID:
[PG] Unknown:
[SN] Serial number:
[V0] Vendor specific:
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific:
End
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number 60-38-c3-ff-ff-fe-fd-3c
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 128, Total VFs: 128, Number of VFs: 0, Function Dependency Link: 00
VF offset: 16, stride: 1, Device ID: 154c
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000020001000000 (64-bit, prefetchable)
Region 3: Memory at 0000020000408000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
No steering table available
Capabilities: [1b0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [1d0 v1] #19
Kernel driver in use: i40e
Kernel modules: i40e
2. Arista 7050-TX-48 (https://www.arista.com/assets/data/pdf/Datasheets/7050TX-128_48_Datasheet_S.pdf)
3. CAB-QSFP-P3M (https://www.10gtek.com/QSFP+-to-QSFP+-DAC-152.html)
4. The driver is either the one already included with the Ubuntu 16.04 LTS kernel or one we downloaded from the Intel website.
Hello MPate48,
Thank you for the prompt reply.
Upon checking, an increasing port.rx_dropped counter means that something in the slot, memory, or system is not fast enough. You can find this information on page 10 of the link below.
Before we proceed, we'd like to double-check the following:
1.) Have you tried another NIC with similar specifications (from another vendor, if available) while keeping the system configuration the same, to isolate the issue?
2.) Have you tried installing the NIC in a different slot?
3.) Have you tried updating the driver to the latest version to see whether that helps?
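The slot question can be partly checked from software by comparing the negotiated PCIe link (LnkSta) with what the device supports (LnkCap). The following is a rough sketch that reads `lspci -vv` output on stdin; check_pcie_link is an illustrative name, and the parsing assumes the "Speed ..., Width x..." layout seen in the lspci output posted earlier in this thread.

```shell
#!/bin/sh
# Read `lspci -vv` output for one device on stdin and report whether the
# negotiated link (LnkSta) matches what the device supports (LnkCap).
# Exit status: 0 = match, 1 = mismatch, 2 = link lines not found.
check_pcie_link() {
    awk '
        /LnkCap:/ { cap = linkinfo($0) }
        /LnkSta:/ { sta = linkinfo($0) }
        END {
            if (cap == "" || sta == "") { print "link info not found"; exit 2 }
            if (cap == sta) { print "ok: " sta; exit 0 }
            print "degraded: capable " cap ", running " sta
            exit 1
        }
        # Pull "<speed> <width>" out of a LnkCap or LnkSta line.
        function linkinfo(line,    speed, width) {
            if (!match(line, /Speed [^ ,]+/)) return ""
            speed = substr(line, RSTART + 6, RLENGTH - 6)
            if (!match(line, /Width x[0-9]+/)) return ""
            width = substr(line, RSTART + 6, RLENGTH - 6)
            return speed " " width
        }
    '
}
```

A typical call would be `sudo lspci -vvs 01:00.0 | check_pcie_link`. In the output posted above, both LnkCap and LnkSta show 8GT/s and x8, so this check would report the link as running at its full capability in the current slot; it says nothing about memory or CPU bottlenecks.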
Looking forward to your response.
Best regards,
Crisselle C
Intel Customer Support
A Contingent Worker at Intel
Hello MPate48,
Good day!
We'd like to follow up on the details we requested so that we can look further into your query. If you have additional questions or need clarification, please do not hesitate to ask.
Looking forward to hearing from you.
Best regards,
Crisselle C
Intel Customer Support
A Contingent Worker at Intel
Hello MPate48,
Good day!
Please be informed that we will now close this request, since we have not received a response to our previous follow-up. Should you have any other concerns or need assistance in the future, feel free to post a new question.
Best regards,
Crisselle C
Intel Customer Support
A Contingent Worker at Intel