We have two NICs in our server, an 82599 and an X520. Each NIC has two ports (or interfaces? I'm not sure which word is appropriate).
Both ports of the 82599 NIC have SR-IOV enabled, and their VFs are attached to VMs.
Only one port of the X520 NIC (e.g. eth0) has SR-IOV enabled, with its VFs attached to VMs. The other port (e.g. eth3) has SR-IOV disabled and is used only by the host.
We occasionally see high latency between hosts (the path is host <--> TOR <--> host, and the TOR switch has been tested and works well), so we ran the following tests:
- Keep SR-IOV disabled on eth3
We pinged the hosts from one server for 12 hours to measure latency:
We also found that the CPU's soft interrupt load went up whenever latency was high.
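To line up the latency spikes with the soft interrupt load, something like the following could run alongside the ping test. This is only a sketch: the helper names `net_rx_total` and `slow_pings` are made up for illustration, the address is a placeholder, and the 1 ms threshold is an arbitrary example value.

```shell
#!/bin/sh
# Sum the NET_RX softirq counters over all CPUs.
# $1: file in /proc/softirqs format (defaults to /proc/softirqs).
net_rx_total() {
    awk '$1 == "NET_RX:" { for (i = 2; i <= NF; i++) sum += $i }
         END { print sum + 0 }' "${1:-/proc/softirqs}"
}

# Read ping output on stdin and print only the replies whose RTT
# is above the limit given in $1 (milliseconds).
slow_pings() {
    awk -v limit="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^time=/ && substr($i, 6) + 0 > limit + 0) print
    }'
}

# Usage (hypothetical address; run in two terminals during the test):
#   ping -D 192.0.2.10 | slow_pings 1
#   prev=$(net_rx_total); while sleep 1; do cur=$(net_rx_total); \
#       echo "$(date +%T) $((cur - prev))"; prev=$cur; done
```

Comparing the timestamps of the slow replies with the per-second NET_RX deltas should show whether the two really move together.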
- Keep SR-IOV enabled on eth3
We used the same environment to test latency for 2 days. We only enabled SR-IOV on eth3 and did not attach any VF to a VM:
As you can see, after enabling SR-IOV on eth3 the high-latency problem never occurred.
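For anyone reproducing the second test: one common way to enable SR-IOV on a port without attaching any VF is the kernel's `sriov_numvfs` sysfs attribute. The sketch below assumes that route (the out-of-tree ixgbe driver also has a `max_vfs` module parameter). The function name `set_numvfs`, its optional third argument, and the VF count in the usage line are illustrative, not taken from our setup.

```shell
#!/bin/sh
# Set the number of VFs on a PF through the PCI sysfs attribute.
# $1: interface name
# $2: VF count (0 disables SR-IOV)
# $3: sysfs net root, defaults to /sys/class/net (only here so the
#     function can be tried against a scratch directory)
set_numvfs() {
    dev_dir="${3:-/sys/class/net}/$1/device"
    if [ ! -e "$dev_dir/sriov_numvfs" ]; then
        echo "no sriov_numvfs attribute for $1" >&2
        return 1
    fi
    # The kernel rejects changing one non-zero count to another,
    # so go back to 0 first, then write the requested count.
    echo 0 > "$dev_dir/sriov_numvfs" &&
    echo "$2" > "$dev_dir/sriov_numvfs"
}

# Usage (needs root; the VF count 2 is just an example):
#   set_numvfs eth3 2    # SR-IOV on, VFs left unattached
#   set_numvfs eth3 0    # back to SR-IOV off
```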
I have the following questions:
- Why does enabling SR-IOV on eth3 solve this problem?
- The NIC has two ports, but we enabled SR-IOV on only one of them and left it disabled on the other. Is it possible that the NIC chip does not support this configuration? Is this a known issue? I can't find any answer from the community or in the driver's release notes.
- Is there any other reason that could cause this problem, such as the number of RX queues?
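On the RX queue question, one way to see how receive work is spread over the queue vectors is to total the per-vector interrupt counts in both configurations. A rough sketch: `irq_totals` is a made-up helper name, and the `eth3-TxRx-N` vector naming in the comment is what ixgbe normally registers, so treat it as an assumption for a given driver build.

```shell
#!/bin/sh
# Print each IRQ vector name belonging to an interface, followed by
# its interrupt count summed over all CPUs.
# $1: interface name
# $2: file in /proc/interrupts format (defaults to /proc/interrupts)
irq_totals() {
    awk -v ifc="$1" '
        NR == 1 { ncpu = NF; next }            # header: one column per CPU
        $NF ~ ("^" ifc "(-|$)") {              # e.g. eth3 or eth3-TxRx-0
            sum = 0
            for (i = 2; i <= ncpu + 1; i++) sum += $i
            print $NF, sum
        }' "${2:-/proc/interrupts}"
}

# Usage: run once with SR-IOV disabled and once with it enabled,
# then compare the vector count and the spread between vectors.
#   irq_totals eth3
```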
- Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
- driver: ixgbe
- version: 5.3.5
- firmware-version: 0x800008d3, 18.0.17
- driver: ixgbevf
- version: 4.3.3
- Kernel: 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64 GNU/Linux
ethtool -k eth3
Features for eth3:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]

ethtool -l eth3
Channel parameters for eth3:
Pre-set maximums:
RX: 0
TX: 0
Other: 1
Combined: 2
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 2
Looking forward to your reply!
Hi Michael L.,
Thanks for your reply.
We checked the host hardware:
1. What is your host OS?
Debian 8.10, Linux 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64 GNU/Linux
2. Are you using on-board NIC cards?
3. What is the model of your system? Or if your NIC is on-board, what is the model of your board?
One NIC is on-board (eth1 and eth2); the other is external (eth0 and eth3).
On-board: 1028 1f72 Ethernet 10G 4P X520/I350 Dell rNDC
External: 8086 000c Ethernet Server Adapter X520-2
Enabling SR-IOV on eth3 solved this issue.
4. Have you tried different driver versions?
Yes, we updated the ixgbe driver to the newest version, 5.5.5, and the ixgbevf driver to the newest version, 4.5.2, but the problem persisted.
In addition, we checked the following settings:
- power saving is set to performance
- C-states are disabled in both the OS and the BIOS
- rx-usecs of the interface is set to 1
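These settings can be read back roughly as follows. The sketch assumes that "power saving is performance" refers to the cpufreq governor and that C-states are limited through the intel_idle module; both are assumptions, and the helper names `governors` and `rx_usecs` are invented for illustration.

```shell
#!/bin/sh
# Print the distinct cpufreq governors currently in use.
# $1: sysfs CPU root, defaults to /sys/devices/system/cpu
governors() {
    cat "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor \
        2>/dev/null | sort -u
}

# Read `ethtool -c IFACE` output on stdin and print the rx-usecs value.
rx_usecs() {
    awk -F': *' '$1 == "rx-usecs" { print $2 }'
}

# Usage:
#   governors                      # expect a single line: performance
#   ethtool -c eth3 | rx_usecs     # expect: 1
#   cat /sys/module/intel_idle/parameters/max_cstate   # only if intel_idle is in use
```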
Hope this helps.