SKenn2
Beginner
2,729 Views

Cannot pass ipv6 packets into a VM using VFs (82599)

Hello,

Short story: cannot receive ipv6 packets in a KVM guest using SR-IOV interfaces (VFs), but ipv4 works fine.

Long Story:

We have 2 KVM VMs set up on 2 separate hosts. The OS on both host and guest is RHEL6.4. The hosts use 82599 NICs. We assigned VFs to each guest (eth4/eth5) using unique MAC addresses in the XML guest definition (the other interfaces in the VM are bridged interfaces), and assigned a unique VLAN to each VF per VM (eth4 is 610, eth5 is 611) using 'ip link set' commands. We are using the Intel-provided ixgbe driver (3.15.1) on the host and ixgbevf (2.8.7) in the guest. We plumbed ipv4 and ipv6 addresses on eth4 in each guest. Ping (ipv4) works fine: we can see the packet enter the NIC on the receiving host (tcpdump) and can see it in the guest too. When we repeat the process for ipv6, it fails. The packet leaves the guest fine and we see it hit the NIC on the receiving host, but it is NOT received by the guest across the VF. If we pull out the VFs and replace them with bridges, ipv6 works fine, so it's not a routing/network issue.
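For reference, the per-VF MAC/VLAN assignment mentioned above takes this general form on the host (a sketch: the MAC and VLAN values match our setup, and the commands assume an iproute2 build with SR-IOV support):

```shell
# Sketch (run on the host, against the PF): pin VF 0 of eth0 to a fixed MAC
# and port VLAN 610, matching the 'ip link show' output later in this thread.
ip link set eth0 vf 0 mac 42:4c:43:50:01:64
ip link set eth0 vf 0 vlan 610

# Verify: the PF listing should now show 'vf 0 MAC 42:4c:43:50:01:64, vlan 610'
ip link show eth0
```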

Is there something we need to set up/configure to get ipv6 to show up in the guest?? Note we can also ping out successfully to the upstream L3 switch (subnet gateway), but not to the other VM.

Thanks,

Shawn

29 Replies
Patrick_K_Intel1
Employee
583 Views

Hi Shawn,

We will do a quick test to see if we can reproduce your issue and get back to you.

thanx,

Patrick

Patrick_K_Intel1
Employee
583 Views

Shawn,

Can you provide the following information to help us in our investigation:

  1. "ip link show" output from both RHEL6.4 KVM hosts.
  2. "ip link show" output from both RHEL6.4 VMs.

Thanx,

Patrick

SKenn2
Beginner
583 Views

Do you want to see *everything* (lots of data per interface with lots of interfaces) or only the interfaces in question??

Patrick_K_Intel1
Employee
583 Views

Just the interfaces in question. The PFs and VFs.

Thanx!

Patrick

SKenn2
Beginner
583 Views

Here you go. I hope it helps!!

Shawn

Host04:

[root@hplcp051-host04 ~]# ip link show eth0 | head

2: eth0: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:03:4c brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:01:64, vlan 610

vf 1 MAC 42:4c:43:50:01:6e, vlan 660

vf 2 MAC 72:c2:a2:12:00:64, vlan 610

vf 3 MAC 72:c2:a2:12:00:6e, vlan 660

vf 4 MAC 72:c2:a2:12:00:84, vlan 610

vf 5 MAC 72:c2:a2:12:00:8e, vlan 660

[root@hplcp051-host04 ~]# ip link show eth1 | head

3: eth1: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:03:4d brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:01:65, vlan 611

vf 1 MAC 42:4c:43:50:01:6f, vlan 661

vf 2 MAC 72:c2:a2:12:00:65, vlan 611

vf 3 MAC 72:c2:a2:12:00:6f, vlan 661

vf 4 MAC 72:c2:a2:12:00:85, vlan 611

vf 5 MAC 72:c2:a2:12:00:8f, vlan 661

host12:

[root@hplcp051-host12 ~]# ip link show eth0 | head

2: eth0: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:05:00 brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:11:64, vlan 610

vf 1 MAC 42:4c:43:50:11:6e, vlan 660

vf 2 MAC 72:c2:a2:12:00:74, vlan 610

vf 3 MAC 72:c2:a2:12:00:7e, vlan 660

vf 4 MAC 72:c2:a2:12:00:94, vlan 610

vf 5 MAC 72:c2:a2:12:00:9e, vlan 660

[root@hplcp051-host12 ~]# ip link show eth1 | head

3: eth1: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:05:01 brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:11:65, vlan 611

vf 1 MAC 42:4c:43:50:11:6f, vlan 661

vf 2 MAC 72:c2:a2:12:00:75, vlan 611

vf 3 MAC 72:c2:a2:12:00:7f, vlan 661

vf 4 MAC 72:c2:a2:12:00:95, vlan 611

vf 5 MAC 72:c2:a2:12:00:9f, vlan 661

Guest VM s00c10:

# ip link show eth4

2: eth4: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:01:64 brd ff:ff:ff:ff:ff:ff

# ip link show eth5

3: eth5: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:01:65 brd ff:ff:ff:ff:ff:ff

/root:

Guest VM s01c10:

# ip link show eth4

2: eth4: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:11:64 brd ff:ff:ff:ff:ff:ff

# ip link show eth5

3: eth5: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:11:65 brd ff:ff:ff:ff:ff:ff

SKenn2
Beginner
583 Views

More information to help align the data nicely ...

First Host interfaces

[host04 ~]# ip link show eth0 | head

2: eth0: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:03:4c brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:01:64, vlan 610

vf 1 MAC 42:4c:43:50:01:6e, vlan 660

vf 2 MAC 72:c2:a2:12:00:64, vlan 610

vf 3 MAC 72:c2:a2:12:00:6e, vlan 660

vf 4 MAC 72:c2:a2:12:00:84, vlan 610

vf 5 MAC 72:c2:a2:12:00:8e, vlan 660

[host04 ~]# ethtool -k eth0

Features for eth0:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp-segmentation-offload: on

udp-fragmentation-offload: off

generic-segmentation-offload: on

generic-receive-offload: off

large-receive-offload: off

rx-vlan-offload: on

tx-vlan-offload: on

ntuple-filters: off

receive-hashing: on

Guest on Host04

/root:

ip -6 addr add 2500:0:0:339::43/64 dev eth4

/root:

ip addr add 1.1.1.1/24 dev eth4 broadcast 1.1.1.255 label eth4:1

<00c10h0:root>/root:

# ip addr ls eth4

2: eth4: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:01:64 brd ff:ff:ff:ff:ff:ff

inet 1.1.1.1/24 brd 1.1.1.255 scope global eth4:1

inet6 2500:0:0:339::43/64 scope global

valid_lft forever preferred_lft forever

inet6 fe80::404c:43ff:fe50:164/64 scope link

valid_lft forever preferred_lft forever

<00c10h0:root>/root:

# ethtool -k eth4

Features for eth4:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp-segmentation-offload: on

udp-fragmentation-offload: off

generic-segmentation-offload: on

generic-receive-offload: off

large-receive-offload: off

rx-vlan-offload: on

tx-vlan-offload: on

ntuple-filters: off

receive-hashing: off

2nd Host: Host12

[host12 ~]# ip link show eth0 | head

2: eth0: mtu 1508 qdisc mq state UP qlen 10000

link/ether e4:11:5b:95:05:00 brd ff:ff:ff:ff:ff:ff

vf 0 MAC 42:4c:43:50:11:64, vlan 610

vf 1 MAC 42:4c:43:50:11:6e, vlan 660

vf 2 MAC 72:c2:a2:12:00:74, vlan 610

vf 3 MAC 72:c2:a2:12:00:7e, vlan 660

vf 4 MAC 72:c2:a2:12:00:94, vlan 610

vf 5 MAC 72:c2:a2:12:00:9e, vlan 660

vf 6 MAC 00:00:00:00:00:00

vf 7 MAC 00:00:00:00:00:00

[host12 ~]# ethtool -k eth0

Features for eth0:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp-segmentation-offload: on

udp-fragmentation-offload: off

generic-segmentation-offload: on

generic-receive-offload: off

large-receive-offload: off

rx-vlan-offload: on

tx-vlan-offload: on

ntuple-filters: off

receive-hashing: on

Guest on Host12

/root:

ip -6 addr add 2500:0:0:339::42/64 dev eth4

/root:

ip addr add 1.1.1.2/24 dev eth4 broadcast 1.1.1.255 label eth4:2

/root:

# ip addr ls eth4

2: eth4: mtu 1500 qdisc pfifo_fast state UP qlen 10000

link/ether 42:4c:43:50:11:64 brd ff:ff:ff:ff:ff:ff

inet 1.1.1.2/24 brd 1.1.1.255 scope global eth4:2

inet6 2500:0:0:339::42/64 scope global

valid_lft forever preferred_lft forever

inet6 fe80::404c:43ff:fe50:1164/64 scope link

valid_lft forever preferred_lft forever

# ethtool -k eth4

Features for eth4:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp-segmentation-offload: on

udp-fragmentation-offload: off

generic-segmentation-offload: on

generic-receive-offload: off

large-receive-offload: off

rx-vlan-offload: on

tx-vlan-offload: on

ntuple-filters: off

receive-hashing: off

ping between 1.1.1.1 and 1.1.1.2 worked without issue.

ping6 between 2500:0:0:339::42 and 2500:0:0:339:ffff:ffff:ffff:fffe (subnet gateway on external Omniswitch) worked without issue.

ping6 between 2500:0:0:339::43 and 2500:0:0:339:ffff:ffff:ffff:fffe (subnet gateway on external Omniswitch) worked without issue.

ping6 between 2500:0:0:339::42 and 2500:0:0:339::43 did not work...

tcpdump on the receiving eth4 interface did not show ANY traffic at the guest level. A tcpdump at the host level on eth0 showed the following packets...

[host04 ~]# tcpdump -i eth0 -SNevvl ether host 42:4c:43:50:01:64

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

19:11:39.459090 42:4c:43:50:01:64 (oui Unknown) > 33:33:ff:00:00:42 (oui Unknown), et2
source link-address option (1), length 8 (1): 42:4c:43:50:01:64
0x0000: 424c 4350 0164

19:11:40.459104 42:4c:43:50:01:64 (oui Unknown) > 33:33:ff:00:00:42 (oui Unknown), et2
source link-address option (1), length 8 (1): 42:4c:43:50:01:64
0x0000: 424c 4350 0164

19:11:42.460093 42:4c:43:50:01:64 (oui Unknown) > 33:33:ff:00:00:42 (oui Unknown), et2
source link-address option (1), length 8 (1): 42:4c:43:50:01:64
0x0000: 424c 4350 0164

Patrick_K_Intel1
Employee
583 Views

Thanx! That is very helpful.

We are setting up a test environment now and hope to have a response by the end of the week.

Thanks for posting to our blog.

- Patrick

Patrick_K_Intel1
Employee
583 Views

We configured two hosts with RHEL 6.4 KVM with ixgbe driver version 3.15.1, created two VFs each.

 

We also created RHEL 6.4 VMs and assigned a VF to each. The VMs have ixgbevf driver version 2.8.7.

We can successfully perform ipv6 ping to and from each VF on both hosts. Note: At this point, no VLANs were setup.

Next, we set port VLAN 610 for VF1 on each host and port VLAN 611 for VF2 on each host. At this point ipv6 ping started to fail. On further investigation, we found out that the switch ports our host servers were connected to needed to be in trunk mode, and both VLAN 610 and 611 needed to be allowed explicitly on the switch ports. Once the changes to the switch were made, ipv6 ping traffic started to flow normally.

Please verify your switch configuration and make sure the switch ports are in trunk mode and, if necessary, explicitly allow the VLAN IDs on the switch ports.
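For illustration, on a Cisco IOS-style switch the relevant port configuration would look something like this (hypothetical: the interface name is made up and the exact syntax varies by vendor, so consult your switch's documentation):

```
interface TenGigabitEthernet1/0/1
 switchport mode trunk
 switchport trunk allowed vlan 610,611
```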

thanx,

Patrick

SKenn2
Beginner
583 Views

Hi Patrick,

Thank you for taking the time to set this up and test it. I am sorry I did not get back to you sooner; it took a while for the lab to revert back to this config so we could continue to test.

In general, we see the following (so we know that the VLANs and our switch are set up correctly):

1) ipv4 from a VM across an SR-IOV interface to a different VM on a different host using SR-IOV interfaces works.

2) ipv4 from a VM across an SR-IOV interface to a different VM on the same host using SR-IOV interfaces works.

3) ipv6 from a VM across an SR-IOV interface to a different VM on a different host using SR-IOV interfaces fails.

4) ipv6 from a VM across an SR-IOV interface to a different VM on the same host using SR-IOV interfaces fails.

5) ipv6 from a VM across an SR-IOV interface to a different VM on a different host using bridged interfaces works.

6) ipv6 from a VM across a bridged interface to a different VM on a different host using SR-IOV interfaces works (this is the opposite direction of test 5, and only works because neighbor discovery was successfully completed by #5 - otherwise this fails).

We mirrored the port on the switch and can see the neighbor discovery message enter from the sending side destined for the receive side. tcpdump at the host level on the receive side clearly shows the neighbor discovery hit the NIC (likely because it's a multicast packet), but those packets never made it to the VF and were never seen by the interface inside the VM. I can get you a pcap file to look at separately if you want.

Since you did get it to work, I want to make sure our environments are comparable.

1) I am using RHEL6.4 ( 2.6.32-358.6.1.el6.x86_64).

2) I am using a 82559-based NIC.

3) This is my version of iproute on the host (iproute-2.6.32-23.el6.x86_64).

If you have newer or different equipment, it might explain our different results.

Thank you!!

Shawn

Patrick_K_Intel1
Employee
583 Views

We are using the same exact RHEL as you - with no updates applied - have you done any updates via RHN maybe?

We are also using the same iproute package.

Are you compiling a custom Linux Kernel by chance?

Who manufactured this NIC? You said it was an 82559 - I assume you meant 82599. Did the NIC come bundled with the server or a separate purchase? Can you provide the MM and PBA numbers from the NIC board?

SKenn2
Beginner
583 Views

Sorry for the typo. Yep, it's an 82599 NIC. Specifically, we are using HP BL460C Gen8 blades and HP 560FLB NICs. We have the latest HP FW (1.3.6) for the NIC applied. Our switches are 6102XG ProCurves and the whole thing is in a C7000 chassis. I am away from the HW so won't be able to pull the NIC and get any details. Any info that could be gathered via SW tools??

No, we have not updated RHEL6.4 (although we tried an update to the Jun 1st snap of RH from our satellite server and that made no difference).

Here is a bit more info on tcpdumps if that helps.

v6 tcpdump on guest source -i eth4:

11:54:16.313338 IP6 2500::339:0:0:42:2 > ff02::1:ff42:1: ICMP6, neighbor solicitation, who has 2500::339:0:0:42:1, length 32

11:54:17.313275 IP6 2500::339:0:0:42:2 > ff02::1:ff42:1: ICMP6, neighbor solicitation, who has 2500::339:0:0:42:1, length 32

11:54:18.313353 IP6 2500::339:0:0:42:2 > ff02::1:ff42:1: ICMP6, neighbor solicitation, who has 2500::339:0:0:42:1, length 32

v6 tcpdump on host dest -i eth0:

16:55:45.295626 IP6 2500::339:0:0:42:2 > ff02::1:ff42:1: ICMP6, neighbor solicitation, who has 2500::339:0:0:42:1, length 32

16:55:47.296691 IP6 2500::339:0:0:42:2 > ff02::1:ff42:1: ICMP6, neighbor solicitation, who has 2500::339:0:0:42:1, length 32

v6 tcpdump on guest dest -i eth4:

... (nothing)

It's like the multicast packet (neighbor solicitation) isn't getting passed onto the VF on the receive side.

Even more data!

If we ping from this VM to a different VM on a different host that is using bridged interfaces, that works; and if we immediately ping back the other way (bridged to SR-IOV), that works too, because the neighbor solicitation is not required (it was just set up by the previous ping) and so the v6 ping works.

It sure smells like a FW or driver issue, but we have the latest of both.

I hope this helps!!

Shawn

SKenn2
Beginner
583 Views

A few other things to note.

1) The HP documentation for that version of the FW states: "This package contains firmware version 1.289.0."

2) The latest HP provided Gen8 BIOS is also loaded (2013.03.01)

3) In the RBSU (HP's BIOS), SR-IOV is enabled.

Thanks!

Shawn

Patrick_K_Intel1
Employee
583 Views

Afraid that firmware version doesn't mean much to us; it looks like an HP numbering scheme.

All I can suggest is that you try this without the switch, back to back. We are using the same OS, driver and tools that you are and can't reproduce your issue.

We are using an Intel Generic rather than an HP specific device. If removing the switch from the equation doesn't help, I suggest you work with HP. Please let us know what you find.

thanx,

Patrick

SKenn2
Beginner
583 Views

Unfortunately, with a C7000 enclosure, it is impossible to remove the switch and replace it with something else. All Ethernet goes through a backplane into the ProCurves.

It's not the ProCurves/switches, since a) v4 works and b) the multicast neighbor discovery packet makes it to the terminating host-side NIC. It's something specific to multicast v6 and that NIC.

Thank you for your (and Intel's) time!!

Shawn

Patrick_K_Intel1
Employee
583 Views

Well, that is unfortunate. I can tell you that there is nothing in the hardware of the NIC that cares at all about IPv6 in any way. VFs filter on MAC and/or VLAN. If a packet passes those requirements or is a broadcast packet (such as Neighbor Discovery), it gets sent to the VF driver.

If you find anything else out, please let us know.

thanx,

Patrick

SKenn2
Beginner
583 Views

Hi Patrick,

I have gone back to basics and have removed the VLANs from the equation. I have also gone down to 1 VF per interface (max_vfs=1,1 in /etc/modprobe.d/ixgbevf.conf). Still, the ipv6 multicast does not make it into the guest while ipv4 works fine. As such, I am wondering if you could share with me some of your test configurations.

1) How many VFs did you configure for your host?

2) What was your MAC addresses for the SR-IOV interfaces?

3) Did you use a MAC based upon the link local MAC? BTW, we cannot ping6 those either.

4) What was your ipv6 addresses you used?

5) Can you share the 'ethtool -k' on the host and guest interfaces??

6) How did you start up your VM (by command line or by XML file)?

7) Can you share your VM config XML (either the file itself or by issuing a 'virsh dumpxml' to a file)?

We are going to contact HP to secure a passthru connection in place of the ProCurves. Even though the mirror port shows everything looking fine, we want to help rule that out.

Would dumping out the registers help (ethregs)?? Which registers should we pay attention to, and how should we interpret them??

BTW, here is some more information from our systems that may or may not be helpful:

Host:

[root@hplcp051-host04 qemu]# ethtool -i eth0

driver: ixgbe

version: 3.13.10

firmware-version: 0x8000038b, 1.289.0

bus-info: 0000:04:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: yes

supports-register-dump: yes

supports-priv-flags: no

Guest:

# ethtool -i eth4

driver: ixgbevf

version: 2.8.7

firmware-version: N/A

bus-info: 0000:00:0b.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: yes

supports-priv-flags: no

Thanks!

Shawn

SKenn2
Beginner
583 Views

HI Patrick,

We went ahead and used 'ethregs' to pull the NIC registers in hopes of finding an issue. One of our engineers then looked at the code and the registers and came up with a write-up. In his mind, something is amiss with the multicast registers. See below for his write-up. I don't see a way to upload the dumps to this forum, but if you need them, we have them ready.

Many thanks!

Shawn

====================================================================

We dumped the registers for the ixgbe and ixgbevf.

When an IPv6 address is configured in the VM, 2500::339:0:0:42:110 in our case, that triggers assigning an IPv6 multicast interface address, ff02::1:ff42:110, to the interface, to react to Neighbor Discovery requests. That IP multicast address is also translated to a multicast MAC address, as are all multicasts, 33:33:ff:42:01:10 in this case. When we try to ping6 that address, the trace looks like this:

42:4c:43:50:01:64 (oui Unknown) > 33:33:ff:42:01:10 (oui Unknown), ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32)

2500::339:0:0:42:10 > ff02::1:ff42:110: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has 2500::339:0:0:42:110

source link-address option (1), length 8 (1): 42:4c:43:50:01:64

0x0000: 424c 4350 0164
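The address arithmetic described above can be double-checked with a short stdlib-only Python sketch (illustrative, not part of any driver): the solicited-node group is ff02::1:ff00:0/104 plus the low 24 bits of the unicast address, and the Ethernet multicast MAC is 33:33 plus the low 32 bits of the group address (per RFC 4291 and RFC 2464):

```python
import ipaddress

def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group: ff02::1:ff00:0/104 + low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet MAC for an IPv6 multicast group: 33:33 + low 32 bits of group."""
    low32 = int(group) & 0xFFFFFFFF
    return "33:33:" + ":".join(f"{(low32 >> s) & 0xFF:02x}" for s in (24, 16, 8, 0))

group = solicited_node("2500::339:0:0:42:110")
print(group)                 # ff02::1:ff42:110
print(multicast_mac(group))  # 33:33:ff:42:01:10
```

Both values match the trace above.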

When the IPv6 address is assigned on the VM, I expect that the VM will send a request to the host, asking for the multicast address to be configured. Since only the multicast overflow is used, that should cause the relevant PFVML2FLT to have ROPE set, and that the MTA will have data for the multicast MAC, and that MCSTCTRL will have bit 2 set to enable the multicast table array filter.

What we see, is that the MCSTCTRL has bit 2 set to 0, which should only happen when the MTA array is all zero. The only thing which indirectly causes the MCSTCTRL bit 2 to be reset after the MTA array is populated is

Additionally, PFVML2FLT[00] has receive multicast overflow set, but [01] does not:

PFVML2FLT[00]=0x0b000000: No Multicast Promiscuous, Broadcast Accept, not receive MAC overflow, receive overflow multicast, accept untagged packets without VLAN tag.

PFVML2FLT[01]=0x19000000: Multicast Promiscuous, Broadcast Accept, not receive MAC overflow, not receive overflow multicast, accept untagged packets without VLAN tag

The only way MCSTCTRL bit 2 (IXGBE_MCSTCTRL_MFE) should be reset, according to the software, is if there are no multicast MACs in the MTA table, but there are such MACs.

MCSTCTRL = 0: bits 1:0 select the multicast MAC offset -> 47:36. BUT bit 2 = 0 -> the multicast table array filter is disabled!

MTA: Multicast Table Array, selects 4096 multicast addresses which can be forwarded, bits selected based on MCSTCTRL register:

MTA[000]=0x00010000

MTA[046]=0x10000000

MTA[050]=0x00010000

MTA[113]=0x20000000

For second port:

MTA[000]: 0x00010000

MTA[008]: 0x00000003

MTA[050]: 0x00000001

MTA[070]: 0x00002000
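To relate MTA bits to specific multicast MACs, the driver's hash can be reproduced (a sketch based on our reading of ixgbe_mta_vector() in the ixgbe source; treat the exact bit selection as an assumption to verify against your driver version). With filter type 0, MAC bits 47:36 form a 12-bit vector: the upper 7 bits pick the 32-bit MTA word and the lower 5 bits pick the bit within it.

```python
def mta_vector(mac: str, filter_type: int = 0):
    """Return (MTA word index, bit index) for a multicast MAC, mirroring the
    ixgbe driver's hash. filter_type corresponds to MCSTCTRL bits 1:0."""
    b = bytes(int(x, 16) for x in mac.split(":"))
    # (right shift of byte 4, left shift of byte 5) per mc_filter_type
    shifts = {0: (4, 4), 1: (3, 5), 2: (2, 6), 3: (0, 8)}
    r, l = shifts[filter_type]
    vector = ((b[4] >> r) | (b[5] << l)) & 0xFFF
    return vector >> 5, vector & 0x1F

# Solicited-node MAC from the trace above:
print(mta_vector("33:33:ff:42:01:10"))  # (8, 0) -> MTA[008], bit 0
```

With filter type 0, 33:33:ff:42:01:10 lands in MTA[008] bit 0, so a set bit there would be consistent with that group being registered.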

 

JHall6
Beginner
583 Views

We found a driver bug, although not sure about the exact fix. The bug was reported to the e1000 mailing list:

http://permalink.gmane.org/gmane.linux.drivers.e1000.devel/12231 ixgbe: BUG changing MTU or LRO setting disables VF multicast reception

A workaround for our specific case is in the link, but it's not a general fix. The underlying problem is that the main part of the driver and the SR-IOV part of the driver share the MCAST table and enable flag, but they do not work cooperatively.
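Purely to illustrate the failure mode (a toy Python model, not driver code; the register names are borrowed from the write-up earlier in the thread): two non-cooperating writers share one table and one enable flag, so a PF-side rebuild that only knows the PF's own list drops the VF's entry.

```python
class SharedMta:
    """Toy model of the shared multicast table array + enable flag."""
    def __init__(self):
        self.table = set()   # set of (word, bit) positions
        self.mfe = False     # multicast filter enable (MCSTCTRL.MFE)

    def pf_rebuild(self, pf_entries):
        # PF path (e.g. triggered by an MTU or LRO change): rewrites the whole
        # table from the PF's own multicast list, ignoring VF entries.
        self.table = set(pf_entries)
        self.mfe = bool(self.table)

    def vf_add(self, entry):
        # VF mailbox path: adds the VF's solicited-node entry.
        self.table.add(entry)
        self.mfe = True

mta = SharedMta()
mta.vf_add((8, 0))      # VF registers its IPv6 solicited-node group
mta.pf_rebuild([])      # PF-side event rewrites the shared state
print((8, 0) in mta.table, mta.mfe)  # False False -> VF multicast is lost
```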

John Haller

DMari8
Beginner
583 Views

We are hitting the same issue. Is anybody working on the fix? This issue is preventing the use of IPv6 on the VF.

Thanks,

Damjan

KEgef
Beginner
344 Views

Hi,

IPv4 multicast reception on VFs does not work either, same cause it seems, making it impossible to run OSPF on VFs.

Any news on this issue?

Thanks

Kristoffer
