AMari14
Beginner
3,008 Views

x710 SR-IOV problems

Hi all,

I have the following baseline:

Dell R630 (2x14 core Xeon, 128GB RAM, 800GB SSD)

x710 4-port NIC, in 10Gbit mode

SUSE12SP1

Latest NIC firmware, but default PF/VF drivers (shipped with the OS, v1.3.4)

VF driver blacklisted on the hypervisor

Set up according to the official Intel and SUSE documentation, with a KVM hypervisor

With the test setup (a single VM with a single VF and untagged traffic), I could achieve essentially line rate: with MTU 1500, about 770 Kpps and 9.4 Gbps, for both UDP and TCP traffic, with no packet drops. There is plenty of processing power, the setup is nice and tidy, and everything works as it should.

The production setup is a bit different: the VM uses 3 VFs, one per PF (the 4th PF is not used). All VFs except the first one carry untagged traffic. The first VF passes two types of traffic: one untagged (VLAN 119) and one tagged (VLAN 1108). Tagging is done inside the VM. The setup worked fine for some time, confirming the test setup numbers. However, after some time the following errors started to appear in the hypervisor logs:
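
For readers unfamiliar with in-guest tagging, here is a minimal sketch of how a tagged sub-interface for VLAN 1108 is commonly created inside a Linux VM with iproute2. The interface name `eth0` is a placeholder and is not taken from the post. The function only prints the commands, so nothing changes until the output is piped to a shell.

```shell
#!/bin/sh
# Print the iproute2 commands that would create a tagged VLAN sub-interface.
# $1 = parent interface (the VF as seen inside the guest), $2 = VLAN ID.
vlan_subif_cmds() {
    echo "ip link add link $1 name $1.$2 type vlan id $2"
    echo "ip link set dev $1.$2 up"
}

# "eth0" is a hypothetical guest interface name; 1108 is the VLAN from the post.
vlan_subif_cmds eth0 1108
# prints:
#   ip link add link eth0 name eth0.1108 type vlan id 1108
#   ip link set dev eth0.1108 up
```

To apply the commands on a real guest, run `vlan_subif_cmds eth0 1108 | sh` as root.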

Mar 11 14:32:52 test_machine1 kernel: [10423.889924] i40e 0000:01:00.1: TX driver issue detected on VF 0

Mar 11 14:32:52 test_machine1 kernel: [10423.889925] i40e 0000:01:00.1: Too many MDD events on VF 0, disabled
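
MDD in these messages stands for Malicious Driver Detection. As a rough aid for tracking how often it fires, the sketch below counts MDD messages per PF PCI address from kernel log text on stdin. The function name is made up for illustration, and the sample input is just the two log lines quoted above.

```shell
#!/bin/sh
# Count i40e MDD messages per PF PCI address.
# Reads kernel log text on stdin, e.g.:  dmesg | count_mdd_events
count_mdd_events() {
    grep -E 'i40e [0-9a-f:.]+: .*MDD' \
        | sed -E 's/.*i40e ([0-9a-f:.]+):.*/\1/' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Sample input: the two hypervisor log lines from the post.
sample='Mar 11 14:32:52 test_machine1 kernel: [10423.889924] i40e 0000:01:00.1: TX driver issue detected on VF 0
Mar 11 14:32:52 test_machine1 kernel: [10423.889925] i40e 0000:01:00.1: Too many MDD events on VF 0, disabled'

printf '%s\n' "$sample" | count_mdd_events
# prints: 0000:01:00.1 1
```

Only the second sample line mentions MDD explicitly, so the count for `0000:01:00.1` is 1.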

Performance became erratic: sometimes it worked perfectly, sometimes it did not. Most importantly, packet drops occurred.

So I reinstalled everything (hypervisor and VMs) and configured it exactly as before using automated tools, but upgraded the PF and VF drivers to the latest ones (v2.0.19/v2.0.16). The earlier errors disappeared from the logs, but the issue persists. Now I have this in the logs:

2017-03-12T11:33:43.356014+01:00 test_machine1 kernel: [ 420.439112] i40e 0000:01:00.1: Unable to add VLAN filter 0 for VF 0, error -22

2017-03-12T11:33:43.376009+01:00 test_machine1 kernel: [ 420.459168] i40e 0000:01:00.0: Unable to add VLAN filter 0 for VF 0, error -22

2017-03-12T11:33:44.352009+01:00 test_machine1 kernel: [ 421.435124] i40e 0000:01:00.2: Unable to add VLAN filter 0 for VF 0, error -22
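
On Linux, error -22 is `-EINVAL` (invalid argument). The sketch below pulls the PF address, VLAN ID, VF number, and error code out of such lines so they can be summarized across a long log. The function name and the output format are illustrative choices, not anything the driver provides.

```shell
#!/bin/sh
# Extract "PF vlan=<id> vf=<n> errno=<code>" from i40e VLAN filter errors.
# Reads kernel log text on stdin; lines that do not match are dropped.
vlan_filter_errors() {
    sed -nE 's/.*i40e ([0-9a-f:.]+): Unable to add VLAN filter ([0-9]+) for VF ([0-9]+), error (-?[0-9]+).*/\1 vlan=\2 vf=\3 errno=\4/p'
}

# Sample input: the first log line quoted above.
line='2017-03-12T11:33:43.356014+01:00 test_machine1 kernel: [ 420.439112] i40e 0000:01:00.1: Unable to add VLAN filter 0 for VF 0, error -22'

printf '%s\n' "$line" | vlan_filter_errors
# prints: 0000:01:00.1 vlan=0 vf=0 errno=-22
```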

I've increased the VM CPU count, the VF ring sizes, the VM's Linux software buffers, and the VM's net.core.netdev_budget kernel parameter (the packet budget for each NAPI polling cycle), and I've turned off VF spoofcheck on the hypervisor, but the situation remains the same. Sometimes it works perfectly, other times it does not.
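
For reference, the tuning steps listed above roughly correspond to the commands sketched below. The interface names and numeric values are placeholders, not the poster's actual settings. The "software buffers" step is left out because the post does not say which buffers were changed. The function only prints the commands.

```shell
#!/bin/sh
# Print commands matching the tuning steps described in the post.
# $1 = PF name on the hypervisor, $2 = VF index, $3 = VF interface in the guest.
tuning_cmds() {
    # Hypervisor side: disable spoof checking for the VF.
    echo "ip link set dev $1 vf $2 spoofchk off"
    # Guest side: enlarge the VF descriptor rings (check the maximum with ethtool -g).
    echo "ethtool -G $3 rx 4096 tx 4096"
    # Guest side: raise the packet budget per NAPI polling cycle.
    echo "sysctl -w net.core.netdev_budget=600"
}

# All three names and the values 4096 and 600 are illustrative assumptions.
tuning_cmds eth2 0 eth0
# prints:
#   ip link set dev eth2 vf 0 spoofchk off
#   ethtool -G eth0 rx 4096 tx 4096
#   sysctl -w net.core.netdev_budget=600
```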

Can you please provide some insight? Since rx_dropped counter is increasing in VM, I am suspecting driver/VF issue.
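
One simple way to confirm that rx_dropped is actually growing during a test run is to sample the counter twice and look at the difference. The sketch below reads the standard sysfs statistics file; the helper names and the interface name `eth0` are assumptions for illustration.

```shell
#!/bin/sh
# Read the rx_dropped counter of an interface from sysfs.
read_rx_dropped() {
    cat "/sys/class/net/$1/statistics/rx_dropped"
}

# Print how much the counter grew between two samples ($1 = before, $2 = after).
rx_dropped_delta() {
    echo $(( $2 - $1 ))
}

# Usage on a live guest (eth0 is a placeholder):
#   a=$(read_rx_dropped eth0); sleep 10; b=$(read_rx_dropped eth0)
#   rx_dropped_delta "$a" "$b"

# Stand-alone demonstration with made-up counter values.
rx_dropped_delta 1200 1350
# prints: 150
```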

Is there a way to handle this problem, without switching to untagged traffic?

Thank you in advance,

Ante

22 Replies
idata
Community Manager
394 Views

Hi Ante,

What is the exact model of the 4-port X710 NIC, and what is the exact driver version?

Thanks,

wb

AMari14
Beginner
394 Views

Hi,

here you go; for PF:

01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

Subsystem: Dell Ethernet 10G 4P X710 SFP+ rNDC

Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

Latency: 0, Cache Line Size: 32 bytes

Interrupt: pin A routed to IRQ 62

Region 0: Memory at 93000000 (64-bit, prefetchable) [size=16M]

Region 3: Memory at 94818000 (64-bit, prefetchable) [size=32K]

Expansion ROM at 94b00000 [disabled] [size=512K]

Capabilities: [40] Power Management version 3

Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)

Status: D0 NoSoftRst+ PME-Enable- DSel=8 DScale=1 PME-

Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+

Address: 0000000000000000 Data: 0000

Masking: 00000000 Pending: 00000000

Capabilities: [70] MSI-X: Enable+ Count=129 Masked-

Vector table: BAR=3 offset=00000000

PBA: BAR=3 offset=00001000

Capabilities: [a0] Express (v2) Endpoint, MSI 00

DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+

RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-

MaxPayload 256 bytes, MaxReadReq 4096 bytes

DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

LnkCap: Port # 0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us

ClockPM- Surprise- LLActRep- BwNot-

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled

LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-

Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

Compliance De-emphasis: -6dB

LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

Capabilities: [e0] Vital Product Data

Product Name: X710 10GbE Controller

Read-only fields:

[V0] Vendor specific: FFV17.5.12

[PN] Part number: 68M95

[MN] Manufacture ID: 31 30 32 38

[V1] Vendor specific: DSV1028VPDR.VER2.0

[V3] Vendor specific: DTINIC

[V4] Vendor specific: DCM10010395C521010395C532010395C543010395C514020395C525020395C536020395C547020395C518030395C529030395C53A030395C54B030395C51C040395C52D040395C53E040395C54F040395C5

[V5] Vendor specific: NPY4

[V6] Vendor specific: PMT7

[V7] Vendor specific: NMVIntel Corp

[V8] Vendor specific: L1D0

[RV] Reserved: checksum good, 4 byte(s) reserved

Read/write fields:

[Y1] System specific: CCF1\x00

End

Capabilities: [100 v2] Advanced Error Reporting

UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-

CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+

AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+

Capabilities: [140 v1] Device Serial Number bc-54-21-ff-ff-96-6e-24

Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)

ARICap: MFVC- ACS-, Next Function: 1

ARICtl: MFVC- ACS-, Function Group: 0

Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)

IOVCap: Migration-, Interrupt Message Number: 000

IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+

IOVSta: Migration-

Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00

VF offset: 16, stride: 1, Device ID: 154c

Supported Page Size: 00000553, System Page Size: 00000001

Region 0: Memory at 0000000094600000 (64-bit, prefetchable)

Region 3: Memory at 00000000949a0000 (64-bit, prefetchable)

VF Migration: offset: 00000000, BIR: 0

Capabilities: [1a0 v1] Transaction Processing Hints

Device specific mode supported

No steering table available

Capabilities: [1b0 v1] Access Control Services

ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

Capabilities: [1d0 v1] # 19

Kernel driver in use: i40e

Kernel modules: i40e

and for VF:

01:02.0 Ethernet controller: Intel Corporation XL710/X710 Virtual Function (rev 01)

Subsystem: Dell Device 0000

Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

Latency: 0

Region 0: [virtual] Memory at 94600000 (64-bit, prefetchable) [size=64K]

Region 3: [virtual] Memory at 949a0000 (64-bit, prefetchable) [size=16K]

Capabilities: [70] MSI-X: Enable+ Count=5 Masked-

Vector table: BAR=3 offset=00000000

PBA: BAR=3 offset=00002000

Capabilities: [a0] Express (v2) Endpoint, MSI 00

DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

MaxPayload 128 bytes, MaxReadReq 128 bytes

DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-

LnkCap: Port # 0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us

ClockPM- Surprise- LLActRep- BwNot-

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled-...

idata
Community Manager
394 Views

Hi Ante,

Thank you for the information. Just to double-check: is this an onboard NIC on the Dell system?

Thanks,

wb

AMari14
Beginner
394 Views

Hi,

as can be seen from the information provided previously, it is an rNDC, i.e. an add-on card attached to the motherboard:

http://i.dell.com/sites/doccontent/business/large-business/en/Documents/Intel-X710-Quad-Port-10-GbE-...

No other NICs are present, and there have been no swaps, upgrades, HW modifications, etc. This is a baseline R630 system.

Br,

Ante

AMari14
Beginner
394 Views

Hi,

do you have any update or need any additional information?

BR,

Ante

idata
Community Manager
394 Views

Hi Ante, we're still checking your issue. On the other hand, you may also report this issue to Dell for further assistance.

regards,

Vince
AMari14
Beginner
394 Views

Hi Vince,

Dell trails the latest driver version by a couple of revisions, so I am not sure how much support I can get there. If I downgrade the driver, I get the MDD events again and I am back at the beginning, so it is a loop.

Let me know if you figure something out.

BR,

Ante

idata
Community Manager
394 Views

Hi Ante,

After further checking, contacting Dell is the best route, as they sometimes adjust the hardware or firmware of the card, and that information is known only to them. Hope this clarifies.

Thanks,

wb

AMari14
Beginner
394 Views

Hi,

as mentioned previously, I am not using Dell drivers, but ones sourced from Intel.

I am not sure this ping-pong game is going to help. Isn't Intel the maker of the drivers? Who, then, is supposed to know the most about the issue I face?

Can you please explain what those error messages mean, and whether there is a workaround?

Thanks,

Ante

idata
Community Manager
394 Views

Hi Ante,

Thank you for the reply. This is a Dell OEM card, so it is recommended that you contact Dell for the customized driver.

Rgds,

wb

idata
Community Manager
394 Views

Hi Ante,

As this is a Dell OEM card, the only thing we have available for this device is our generic driver, and we don't guarantee that our driver will work.

Dell sometimes adjusts the hardware or firmware of the card, and that information is not known or tracked by us. It is therefore recommended that you contact Dell support directly. Please feel free to update me if other assistance is needed.

Thanks,

wb
SRao6
Beginner
394 Views

I am using an X710 on a UCS server and I see the same issue.

Is there a response from engineering on this?

idata
Community Manager
394 Views

Hi Shivrao,

Thank you for posting in Wired Communities. Just to double-check: is your X710 an OEM version from Dell? Can you share more information about your NIC?

Thanks,

Sharon

SRao6
Beginner
394 Views

No, this is not from Dell.

This is a Cisco UCS server and the NIC was purchased from Intel.

81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

81:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

81:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

# ethtool -i enp129s0f0

driver: i40e

version: 1.6.27-k

firmware-version: 5.04 0x80002542 0.385.7

expansion-rom-version:

bus-info: 0000:81:00.0

I am attaching the logs.

The trigger is enabling SR-IOV and putting all VFs into trust mode.
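
A sketch of what that trigger typically looks like on the command line, assuming the standard `sriov_numvfs` sysfs interface and the iproute2 `trust` flag. The PF name is the one from the `ethtool -i` command above; the VF count of 2 is an arbitrary example, and the function only prints the commands.

```shell
#!/bin/sh
# Print commands that enable SR-IOV on a PF and mark every VF as trusted.
# $1 = PF interface name, $2 = number of VFs to create.
sriov_trust_cmds() {
    echo "echo $2 > /sys/class/net/$1/device/sriov_numvfs"
    i=0
    while [ "$i" -lt "$2" ]; do
        echo "ip link set dev $1 vf $i trust on"
        i=$((i + 1))
    done
}

# The VF count (2) is an illustrative assumption.
sriov_trust_cmds enp129s0f0 2
# prints:
#   echo 2 > /sys/class/net/enp129s0f0/device/sriov_numvfs
#   ip link set dev enp129s0f0 vf 0 trust on
#   ip link set dev enp129s0f0 vf 1 trust on
```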

idata
Community Manager
394 Views

Hi Shivrao,

Thank you for the information. The log shows that the X710 firmware version is 5.04; you may try upgrading the firmware to the latest version, 6.01:

https://downloadcenter.intel.com/download/24769/Non-Volatile-Memory-NVM-Update-Utility-for-Intel-Eth...

Please feel free to update me.

Thanks,

Sharon

SRao6
Beginner
394 Views

nvmupdatetool says no update available:

# ./nvmupdate64e

Intel(R) Ethernet NVM Update Tool

NVMUpdate version 1.30.2.17

Copyright (C) 2013 - 2017 Intel Corporation.

WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.

Inventory in progress. Please wait [|.........]

Num Description                              Ver.  DevId S:B    Status
=== ======================================== ===== ===== ====== ===============
01) Intel(R) I350 Gigabit Network Connection 1.99  1521  00:001 Update not available
02) Intel(R) I350 Gigabit Network Connection 1.99  1521  00:004 Update not available
03) Intel(R) Ethernet Server Adapter X520-2  0.147 10FB  00:007 Update not available
04) Cisco(R) Ethernet Converged NIC X710-DA4 5.04  1572  00:129 Update not available

Tool execution completed with the following status: Device not found

Press any key to exit

idata
Community Manager
394 Views

Hi Shivrao,

Could you run the following command to get the vendor and device IDs?

Type "lspci -nn | grep -i 'Ethernet Controller'" at a command prompt, as stated in the Linux section of the page at https://www.intel.com/content/www/us/en/support/articles/000005612/network-and-i-o/ethernet-products...

Thanks,

Sharon

SRao6
Beginner
394 Views

# lspci -nn | grep -i 'Ethernet Controller'

01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

04:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

04:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

04:00.2 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

04:00.3 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

07:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

07:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

81:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)

81:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)

81:00.2 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)

81:00.3 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)
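
The `[vendor:device]` pair is the part of this output that identifies the adapter. As a small sketch, the filter below reduces `lspci -nn` output to the PCI address and ID pair of the X710 ports; the function name is made up for illustration.

```shell
#!/bin/sh
# Reduce `lspci -nn` output on stdin to "<pci-address> <vendor:device>" for X710 ports.
x710_ids() {
    grep -i 'X710' \
        | sed -E 's/^([0-9a-f:.]+) .*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1 \2/'
}

# Sample input: one line from the output above.
line='81:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)'

printf '%s\n' "$line" | x710_ids
# prints: 81:00.0 8086:1572
```

On a live system the equivalent would be `lspci -nn | x710_ids`.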

idata
Community Manager
394 Views

Hi Shivrao,

Thank you for the information. I sent a PM to you, please check.

Regards,

Sharon

idata
Community Manager
157 Views

Hi Shivrao,

Please feel free to update me with the result.

Thanks,

Sharon
