Fail to receive vlan traffic with X540 and SR-IOV

On a server with an Intel X540 NIC and SR-IOV enabled, I have the following network configuration:

eth0 pf ---\
            \
             --- bond0 --- bond0.3001
            /
eth1 pf ---/

I can ping from bond0 to a connected host, but not from bond0.3001.

Packets are leaving the server and, according to monitoring on the switch, reply packets are sent back, but the packets are not seen on the server (checked with tcpdump on eth0, bond0 and bond0.3001).

The strange thing is that the ping from bond0.3001 starts working when I add a VLAN interface on one of the virtual functions:

ip link set dev eth64 up

vconfig add eth64 3001

ip link set dev eth64.3001 up

This vlan interface can be created on the host system or on a virtual machine that uses one of the virtual functions of eth0. In both cases the ping starts to work and stops when the vlan interface is removed.
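For reference, the same workaround can probably be expressed with iproute2 alone instead of vconfig. This is a minimal sketch, assuming the VF netdev is eth64 and the VLAN ID is 3001 as in the commands above. The helper name `vlan_cmds` is made up for illustration, and it only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (not run) the iproute2 commands that create a VLAN interface on a device.
# vlan_cmds is a hypothetical helper; eth64 and 3001 are taken from the post above.
vlan_cmds() {
    dev=$1
    vid=$2
    echo "ip link set dev $dev up"
    echo "ip link add link $dev name $dev.$vid type vlan id $vid"
    echo "ip link set dev $dev.$vid up"
}

vlan_cmds eth64 3001
```

Piping the output to `sh` as root would apply it, and `ip link delete dev eth64.3001` should remove the VLAN interface again.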

Could this be a bug in the driver?

The issue does not appear when SR-IOV is disabled. Then bonding and vlan interfaces on the bond work as expected.

Some additional information:

Server:

HP ProLiant DL360p Gen8

NIC:

lspci

...

03:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

03:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

03:10.0 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)

03:10.1 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)

...

ethtool -i eth0

driver: ixgbe

version: 3.19.0.46

firmware-version: 0x80000435, 1.285.0

bus-info: 0000:03:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: yes

supports-register-dump: yes

supports-priv-flags: no

OS:

CentOS release 6.5 (Final)

0 Kudos
6 Replies

Thanx for posting your issue to our blog. I've passed this on to the lab folks and they are working on reproducing the issue now. I will provide updates as I get them.

thanx again,

Patrick

0 Kudos

Can you please provide the kernel log from the failing system? Also, what kernel version are you using?

thanx,

Patrick

0 Kudos

Boot server SRV-0-2.

A ping from SRV-0-1 to SRV-0-2 on interface br0.3001 is running. (br0 is an Open vSwitch bridge.)

[dia6-hm_7-root@SRV-0-2 ~]# dmesg -c > dmesg-boot.log

[dia6-hm_7-root@SRV-0-2 ~]# ifdown eth64.3001

#

# Ping stopped working

#

[dia6-hm_7-root@SRV-0-2 ~]# dmesg -c > dmesg-ethdown.log

[dia6-hm_7-root@SRV-0-2 ~]# ifup eth64.3001

#

# Ping resumes

#

[dia6-hm_7-root@SRV-0-2 ~]# dmesg -c > dmesg-ethup.log

[dia6-hm_7-root@SRV-0-2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth64

# Generated by puppet-network on 2014-06-03 13:48:16

DEVICE=eth64

VLAN=no

BOOTPROTO=none

NOZEROCONF=yes

USERCTL=no

ONBOOT=yes

[dia6-hm_7-root@SRV-0-2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth64.3001

# Generated by puppet-network on 2014-06-03 13:48:16

DEVICE=eth64.3001

VLAN=yes

BOOTPROTO=none

NOZEROCONF=yes

USERCTL=no

ONBOOT=yes

[dia6-hm_7-root@SRV-0-2 ~]# uname -a

Linux SRV-0-2 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[dia6-hm_7-root@SRV-0-2 ~]# rpm -qi kernel-2.6.32-431.17.1.el6.x86_64

Name : kernel Relocations: (not relocatable)

Version : 2.6.32 Vendor: CentOS

Release : 431.17.1.el6 Build Date: Thu 08 May 2014 12:48:49 AM UTC

Install Date: Tue 03 Jun 2014 01:22:29 PM UTC Build Host: c6b8.bsys.dev.centos.org

Group : System Environment/Kernel Source RPM: kernel-2.6.32-431.17.1.el6.src.rpm

Size : 126763163 License: GPLv2

Signature : RSA/SHA1, Thu 08 May 2014 05:52:54 PM UTC, Key ID 0946fca2c105b9de

Packager : CentOS BuildSystem <http://bugs.centos.org>

URL : http://www.kernel.org/

Summary : The Linux kernel

Description :

The kernel package contains the Linux kernel (vmlinuz), the core of any

Linux operating system. The kernel handles the basic functions

of the operating system: memory allocation, process allocation, device

input and output, etc.

0 Kudos

That was helpful, thanx!

It looks like you are encountering a spoofed packet situation. Please do the following:

  • Upgrade the iproute2 utility to version 3.3.0 or newer
  • Upgrade the Linux kernel to version 3.0.0 or newer
  • Run the command "ip link set interfacename vf number spoofchk off"
    • Where: interfacename is the physical NIC name that has VFs, and number is the VF that is participating in the Linux bond.

The above will disable the spoof checking feature of the Intel NIC, and traffic will flow as expected.
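As a sketch of that last step, the loop below prints one command per VF on each physical function; note the `vf` keyword that iproute2 expects before the VF number. It assumes the PFs are eth0 and eth1 (the two bonded ports from the original post) and that each has 4 VFs, which is a placeholder to adjust. The helper name `spoofchk_off_cmds` is hypothetical, and it only echoes commands rather than changing anything.

```shell
#!/bin/sh
# Print the commands that disable spoof checking on VFs 0..(nvfs-1) of each PF.
# spoofchk_off_cmds is a hypothetical helper; it only echoes, it changes nothing.
spoofchk_off_cmds() {
    nvfs=$1
    shift
    for pf in "$@"; do
        i=0
        while [ "$i" -lt "$nvfs" ]; do
            echo "ip link set dev $pf vf $i spoofchk off"
            i=$((i + 1))
        done
    done
}

# Assumed values: 4 VFs on each of the bonded PFs eth0 and eth1.
spoofchk_off_cmds 4 eth0 eth1
```

After reviewing the output, it can be piped to `sh` as root. With a recent enough iproute2, `ip link show dev eth0` should then list the spoof checking state for each VF.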

0 Kudos

That doesn't seem to fix our problem.

We installed kernel 3.10.42

# uname -a

Linux SRV-0-2 3.10.42-1.el6.elrepo.x86_64 #1 SMP Sat Jun 7 20:16:58 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

And ip version 3.10 (iproute2-ss130716)

It doesn't seem to be possible to disable the spoofchk parameter for the physical interface. The bond0 interface is a bond over the physical functions of eth0 and eth1.

I attempted to disable spoofchk on all virtual functions, but that didn't have any effect.

for ((i=0;$i<64;i++)); do ./ip link set dev eth0 vf $i spoofchk off; done

[dia6-hm_7-root@SRV-0-2 ~]# ./ip link show dev eth0

2: eth0: mtu 2000 qdisc mq master bond0 state UP mode DEFAULT qlen 1000

link/ether 38:ea:a7:92:76:30 brd ff:ff:ff:ff:ff:ff

vf 0 MAC 00:00:00:00:00:00, spoof checking off

vf 1 MAC 00:00:00:00:00:00, spoof checking off

vf 2 MAC 00:00:00:00:00:00, spoof checking off

vf 3 MAC 00:00:00:00:00:00, spoof checking off.

It's still possible to make the ping work by creating a VLAN interface on one of the VFs of eth0 (ifup eth64.3001 makes the ping work, ifdown eth64.3001 makes the ping stop).

0 Kudos

Did you take a look in dmesg for the new setup? The old one clearly was catching spoofed packets:

ixgbe 0000:03:00.0: eth0: 1 Spoofed packets detected
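A quick way to check the new setup for the same message is to filter the kernel log. This is a minimal sketch, assuming the message format matches the line above; `spoof_lines` is a hypothetical helper name, and the sample input is just that quoted line.

```shell
#!/bin/sh
# Filter kernel log text on stdin for ixgbe spoof-detection messages.
# spoof_lines is a hypothetical helper; the pattern is taken from the line above.
spoof_lines() {
    grep -i 'spoofed packets detected'
}

# Sample input copied from the log line quoted above; on a live system use:
#   dmesg | spoof_lines
printf '%s\n' 'ixgbe 0000:03:00.0: eth0: 1 Spoofed packets detected' | spoof_lines
```

No output from `dmesg | spoof_lines` after reproducing the failed ping would suggest that spoof checking is not what is dropping the replies.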

A couple of things: are you doing this for eth1 as well? Since you are bonding, you need to do it for both devices. Also, I see you are setting this for VF0 through VF63; however, the 1G devices only have VF0 through VF7.

thanx,

Patrick

0 Kudos