Greetings all,
We are using stock Ubuntu Natty, kernel 2.6.38-8, with KVM 0.14.0 and libvirt 0.8.8. We are attaching VFs of an Intel 82599 NIC to instances created by KVM.
Usually, when the KVM process starts with VFs attached, we see the following messages in kern.log on the physical machine (for each VF):
Mar 24 14:30:52 ashzadapp08p kernel: [158360.469082] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
Mar 24 14:30:52 ashzadapp08p kernel: [158360.469093] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
Mar 24 14:30:52 ashzadapp08p kernel: [158360.469102] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709013] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709024] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709036] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X
Note that there is a double print for each IRQ.
If later we peek into /proc/interrupts, we see the appropriate interrupts: "PCI-MSI-edge kvm:0000:07:12.3" with appropriate IRQs.
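A minimal sketch of automating that /proc/interrupts check, assuming the entries are named "kvm:<PCI address>" as quoted above. The function names are hypothetical, and the expectation of three matching lines per healthy VF is taken only from the three IRQs per VF in the log excerpt, not from any documented guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if one /proc/interrupts line names the KVM-assigned device
 * (for example "kvm:0000:07:12.3"), 0 otherwise. */
int line_mentions_vf(const char *line, const char *bdf)
{
    char needle[64];

    snprintf(needle, sizeof(needle), "kvm:%s", bdf);
    return strstr(line, needle) != NULL;
}

/* Count matching lines in a file such as /proc/interrupts.
 * Returns -1 if the file cannot be opened. Lines longer than the
 * buffer are read in pieces, which is acceptable for a rough check. */
int count_vf_irq_lines(const char *path, const char *bdf)
{
    char line[8192];
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp))
        if (line_mentions_vf(line, bdf))
            count++;
    fclose(fp);
    return count;
}
```

Calling count_vf_irq_lines("/proc/interrupts", "0000:07:12.3") after the guest starts and comparing the result with the number of IRQs printed in kern.log would flag a VF that ended up without interrupts.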
In some cases, however, we see only a single print for each IRQ, and later we don't find the appropriate interrupt assigned to the VF:
Mar 24 14:30:53 ashzadapp08p kernel: [158360.948116] pci-stub 0000:04:10.6: irq 262 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.948127] pci-stub 0000:04:10.6: irq 263 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.948136] pci-stub 0000:04:10.6: irq 264 for MSI/MSI-X
As a result, the VF does not function at all within the VM. The only way to fix this is to stop the KVM process and restart it; the VFs then get re-attached properly.
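The single-versus-double print pattern can be checked mechanically. The following is a rough sketch under the assumption, drawn only from the excerpts above, that a properly attached VF shows each "irq N for MSI/MSI-X" line twice; this is an observed heuristic, not documented kernel behavior. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IRQS 64

struct msi_prints {
    int total;    /* number of matching "irq N" log lines */
    int distinct; /* number of distinct IRQ numbers among them */
};

/* Scan kern.log text for "pci-stub <bdf>: irq N" lines of one device. */
struct msi_prints count_msi_prints(const char *log, const char *bdf)
{
    struct msi_prints r = {0, 0};
    long seen[MAX_IRQS];
    char needle[64];
    size_t nlen;
    const char *p;

    snprintf(needle, sizeof(needle), "pci-stub %s: irq ", bdf);
    nlen = strlen(needle);

    for (p = strstr(log, needle); p; p = strstr(p + nlen, needle)) {
        long irq = strtol(p + nlen, NULL, 10);
        int known = 0;
        int i;

        r.total++;
        for (i = 0; i < r.distinct; i++) {
            if (seen[i] == irq) {
                known = 1;
                break;
            }
        }
        if (!known && r.distinct < MAX_IRQS)
            seen[r.distinct++] = irq;
    }
    return r;
}

/* Heuristic from this thread only: every IRQ was printed exactly twice. */
int looks_fully_attached(struct msi_prints r)
{
    return r.distinct > 0 && r.total == 2 * r.distinct;
}
```

Fed the first excerpt (device 0000:07:12.3) this reports 6 lines for 3 IRQs; fed the second (device 0000:04:10.6) it reports 3 lines for 3 IRQs, which matches the failing case.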
Can anybody advise why this could be happening, or how we can debug this issue further?
The ixgbe version is: 3.7.17-NAPI
The ixgbevf version is: 1.0.19-k0
Thanks!
Hi Alex.
One of the interesting challenges in doing a feature such as SR-IOV in open source is that all the pieces must work together. The distros pick and choose components, and sometimes not all the correct pieces make it into a distro release.
I would suggest you try the 3.8.21 SourceForge release and see how that works for you.
Please let us know how it goes.
- Patrick
Thanks, Patrick.
I already looked at the later drivers and saw that 3.7.21 fixes an "SR-IOV critical bug". Can you please give us more detail on what this bug is?
Judging by the code, 3.7.21 mostly adds some kind of "VLAN Pool Filter" functionality, while the latest 3.8.21 seems to have major code changes.
We will try it and let you know.
Thanks!
Alex.
I look forward to hearing your results.
As to what was in that release, those details were left out on purpose. It seems it was some kind of security update, and exposing the details could lead to a security problem.
Hello Patrick,
I now have a scenario that always reproduces the issue. It happens when I spawn 8 VMs, each with 4 VFs attached. At least one VF ends up without an interrupt and, as a result, is non-functional.
I tested the scenario with the following driver versions:
3.2.9-k2 (packaged together with ubuntu-natty) - issue does not happen
3.7.17 - issue happens
3.7.21 - issue happens
3.8.21 - issue happens
When compiling the driver, I used only the CFLAGS_EXTRA="-DIXGBE_NO_LRO" option (to make the driver compatible with bridging/routing, as the README advises). Should I use any additional flags?
Thanks,
Alex.
Thanks for the detailed information. Can you provide a bit more?
Details about the server configuration: model, memory, CPU, etc.
We will try to reproduce the issue and investigate. I will provide an update when I have more information.
- Patrick
Thanks, Patrick.
Here is some info about the system. Please let me know whether you need anything else to debug this (such as enabling some debug prints).
Below are dmidecode details:
root@ubuntu-sata-51:/# dmidecode -t processor
# dmidecode 2.9
SMBIOS 2.6 present.
Handle 0x0400, DMI type 4, 40 bytes
Processor Information
Socket Designation: CPU1
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: C2 06 02 00 FF FB EB BF
Signature: Type 0, Family 6, Model 44, Stepping 2
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Hyper-threading technology)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Voltage: 1.2 V
External Clock: 5860 MHz
Max Speed: 3600 MHz
Current Speed: 2400 MHz
Status: Populated, Enabled
Upgrade:
L1 Cache Handle: 0x0700
L2 Cache Handle: 0x0701
L3 Cache Handle: 0x0702
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 6
Core Enabled: 6
Thread Count: 12
Characteristics:
64-bit capable
Handle 0x0401, DMI type 4, 40 bytes
Processor Information
Socket Designation: CPU2
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: C2 06 02 00 FF FB EB BF
Signature: Type 0, Family 6, Model 44, Stepping 2
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Hyper-threading technology)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Voltage: 1.2 V
External Clock: 5860 MHz
Max Speed: 3600 MHz
Current Speed: 2400 MHz
Status: Populated, Idle
Upgrade:
L1 Cache Handle: 0x0703
L2 Cache Handle: 0x0704
L3 Cache Handle: 0x0705
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 6
Core Enabled: 6
Thread Count: 12
Characteristics:
64-bit capable
root@ubuntu-sata-51:/# dmidecode -t system
# dmidecode 2.9
SMBIOS 2.6 p...
We tried to reproduce this on a Dell R710 and could not make it fail.
How many Intel 82599 devices do you have in the system? If you have a few, all of them will get VFs created on them, in which case you could potentially run out of available interrupts.
We have one dual-port 82599 device, so we have 2 PFs in total, each spawning 22 VFs, for 44 VFs overall. Four of the VFs are reserved for the physical machine itself; the rest are available for VMs. In my test I create 8 (sometimes 9) VMs, each receiving 4 VFs.
Patrick, it doesn't seem that I run out of interrupts. This also happened when very few VMs were running on the node; with more VMs it is just easier to reproduce. Also, as I mentioned, I usually see a single print of the IRQ assignment for the "bad" VF, but it later ends up without the IRQ.
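For reference, the VF and IRQ counts implied by this setup can be worked out as below. This is only back-of-the-envelope arithmetic: the figure of 3 IRQs per VF is an assumption taken from the three "irq N for MSI/MSI-X" lines per VF in the first post, and the struct and function names are hypothetical.

```c
#include <stdio.h>

struct vf_budget {
    int total_vfs;   /* VFs created across all PFs */
    int guest_vfs;   /* VFs left for VMs after the host keeps its share */
    int vfs_in_use;  /* VFs handed to running VMs */
    int irqs_in_use; /* MSI-X vectors consumed by those VFs */
};

struct vf_budget compute_vf_budget(int pfs, int vfs_per_pf, int host_vfs,
                                   int vms, int vfs_per_vm, int irqs_per_vf)
{
    struct vf_budget b;

    b.total_vfs = pfs * vfs_per_pf;
    b.guest_vfs = b.total_vfs - host_vfs;
    b.vfs_in_use = vms * vfs_per_vm;
    b.irqs_in_use = b.vfs_in_use * irqs_per_vf;
    return b;
}

void print_vf_budget(const struct vf_budget *b)
{
    printf("total VFs: %d, for guests: %d, in use: %d, IRQs in use: %d\n",
           b->total_vfs, b->guest_vfs, b->vfs_in_use, b->irqs_in_use);
}
```

With 2 PFs, 22 VFs per PF, 4 host VFs, 8 VMs, and 4 VFs per VM, this gives 44 VFs in total, 40 for guests, 32 in use, and 96 guest IRQs; with 9 VMs it is 36 VFs and 108 IRQs.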
Thanks for your assistance, Patrick. If you are willing to help us debug this further, please let me know what additional info is needed.
Alex.
Well, it was just a shot in the dark :-)
Do you have this issue with a different OS (say Red Hat)? You might also, as a test, download the latest kernel and give it a try.
You may also try getting the latest BIOS for your system. As you are aware, SR-IOV requires many ingredients to function properly: BIOS, platform HW, OS and NIC.
We know that the NIC and OS work for us on a different server with the same configuration, so I'd suggest good old-fashioned debugging by process of elimination.
Hopefully somebody out there has experienced something like this as well and will post here.
If you do find something, please report back. Likewise if I stumble across anything I will post it here.
Hi Patrick,
We don't use any other OS. As you probably understand, we don't have the bandwidth to test all the different combinations of kernel, BIOS and drivers. For now, we have downgraded to version 3.2.9 of the driver, and the issue does not reproduce.
Thanks for your efforts to repro the issue.
Alex.
Hello Patrick,
We have moved to Ubuntu Precise, kernel 3.2.0-29-generic #46, and we are seeing this issue again with PF driver versions 3.7.21, 3.8.21, 3.9.15, 3.9.17 and 3.11.33. (Versions 3.10.17 and 3.10.16 have not been tested yet.) However, this time I debugged deeper, and the problem might not be related to the PF drivers. It seems to happen in the path where KVM asks to allocate IRQs to VFs. Specifically, I see that the pci_enable_msix() kernel function fails. (It is called from KVM's assigned_device_enable_host_msix() function.) Once it failed with EINVAL, and another time with ENOMEM. Can you perhaps check with your devs what might be the cause and how to debug this further?
Thanks,
Alex.
Hi Alex,
My guru says that there have been some recent changes to the kernel and that some of the distros may not have picked up all of them. I suggest you try applying this patch:
http://us.generation-nt.com/patch-kvm-fix-device-assignment-threaded-irq-handler-help-208140031.html
Hi Patrick,
We have debugged the issue deeper, and it looks like the different errors we are seeing all stem from the fact that the VF PCI device sometimes (not always) does not report the PCI_CAP_ID_MSIX capability.
We see that the Linux kernel code calls pci_find_capability(dev, PCI_CAP_ID_MSIX), which in turn calls __pci_find_next_cap_ttl(). This function reads the PCI configuration space of the VF PCI device and looks for the required ID. Sometimes it does not find the PCI_CAP_ID_MSIX ID, and then it returns 0. This is the root cause of the errors we are seeing: the different code paths all call pci_find_capability(dev, PCI_CAP_ID_MSIX).
We tried sleeping for 100 ms after such a failure and then checking for the PCI_CAP_ID_MSIX capability again, and it was reported properly. So it looks like a transient HW issue of not reporting the capability.
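The retry experiment can be sketched in user space roughly as follows. This is not the kernel change we tested; it is a generic illustration in which the probe callback, the flaky_dev stand-in, and all names are hypothetical, and only the idea (probe, wait, probe again) comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* A probe returns the capability offset, or 0 if it was not found. */
typedef int (*cap_probe_fn)(void *ctx);

/* Call the probe up to `attempts` times, sleeping `delay_ms` between tries. */
int probe_with_retry(cap_probe_fn probe, void *ctx, int attempts, long delay_ms)
{
    int i;

    for (i = 0; i < attempts; i++) {
        int pos = probe(ctx);

        if (pos != 0)
            return pos;
        if (i + 1 < attempts) {
            struct timespec ts;

            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
        }
    }
    return 0;
}

/* Hypothetical stand-in for a device that misses the capability on its
 * first `fail_first` reads and reports it at offset `pos` afterwards. */
struct flaky_dev {
    int calls;
    int fail_first;
    int pos;
};

int flaky_probe(void *ctx)
{
    struct flaky_dev *d = ctx;

    d->calls++;
    return d->calls > d->fail_first ? d->pos : 0;
}
```

With a flaky_dev that fails once, probe_with_retry(flaky_probe, &dev, 3, 100) succeeds on the second call, which mirrors what we observed with the 100 ms sleep.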
Can you please check with your HW/SW engineers what could cause the HW not to report this capability?
FYI, the detailed analysis is here: http://www.spinics.net/lists/linux-pci/msg19014.html
Thanks!
The code of __pci_find_next_cap_ttl():
static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
				   u8 pos, int cap, int *ttl)
{
	u8 id;

	while ((*ttl)--) {
		pci_bus_read_config_byte(bus, devfn, pos, &pos);
		if (pos < 0x40)
			break;
		pos &= ~3;
		pci_bus_read_config_byte(bus, devfn, pos + PCI_CAP_LIST_ID,
					 &id);
		if (id == 0xff)
			break;
		if (id == cap)
			return pos;
		pos += PCI_CAP_LIST_NEXT;
	}
	return 0;
}
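For checking a VF from user space, the same walk can be approximated over a snapshot of its configuration space, for example one read from /sys/bus/pci/devices/<address>/config as root (unprivileged reads return only the first 64 bytes, which excludes the capability list). This is a sketch, not kernel code: find_cap_in_snapshot and read_config_snapshot are hypothetical names, and a snapshot taken later may well miss a transient failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_STATUS          0x06 /* status register, low byte */
#define PCI_STATUS_CAP_LIST 0x10 /* device implements a capability list */
#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_MSIX     0x11
#define PCI_FIND_CAP_TTL    48   /* same loop bound the kernel uses */

/* Walk the capability list in a config-space snapshot.
 * Returns the offset of capability `cap`, or 0 if it is not found. */
int find_cap_in_snapshot(const uint8_t *cfg, size_t len, uint8_t cap)
{
    int ttl = PCI_FIND_CAP_TTL;
    uint8_t pos;

    if (len < 0x40 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    pos = cfg[PCI_CAPABILITY_LIST];
    while (ttl--) {
        uint8_t id;

        if (pos < 0x40)
            break;
        pos &= 0xFC; /* capability structures are dword aligned */
        if ((size_t)pos + 1 >= len)
            break; /* snapshot too short to hold this entry */
        id = cfg[pos];
        if (id == 0xff)
            break;
        if (id == cap)
            return pos;
        pos = cfg[pos + 1]; /* follow the "next" pointer */
    }
    return 0;
}

/* Read up to `max` bytes of config space from a sysfs "config" file.
 * Returns the number of bytes read, or 0 on failure. */
size_t read_config_snapshot(const char *path, uint8_t *buf, size_t max)
{
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (!fp)
        return 0;
    n = fread(buf, 1, max, fp);
    fclose(fp);
    return n;
}
```

A result of 0 from find_cap_in_snapshot(buf, n, PCI_CAP_ID_MSIX) on a VF that normally reports MSI-X would correspond to the failure described above.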
Hi Patrick,
Some additional reproductions showed two cases of failure to find the MSI-X capability:
Case 1: HW reports PCI_CAP_ID_SLOTID (0x04) and PCI_CAP_ID_EXP (0x10), but not PCI_CAP_ID_MSIX (0x11).
Case 2: HW reports no capabilities at all.
In both cases, a retry (after a short delay) reports the MSI-X capability correctly.
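To tell the two cases apart when logging, one option is to record every capability ID seen during a walk and classify the result. The sketch below works on a config-space snapshot held in a byte buffer; the enum, the function names, and the layout of the test buffer are hypothetical, while the capability IDs are the ones listed above.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CAP_ID_SLOTID 0x04
#define PCI_CAP_ID_EXP    0x10
#define PCI_CAP_ID_MSIX   0x11

enum msix_scan_result {
    MSIX_PRESENT,                /* normal case */
    MSIX_MISSING_OTHERS_PRESENT, /* "Case 1": some capabilities, no MSI-X */
    NO_CAPS_AT_ALL               /* "Case 2": empty capability list */
};

/* Collect the IDs of all capabilities in a config-space snapshot.
 * Returns how many IDs were stored in `ids` (at most `max_ids`). */
size_t list_cap_ids(const uint8_t *cfg, size_t len, uint8_t *ids, size_t max_ids)
{
    size_t n = 0;
    int ttl = 48; /* same loop bound the kernel walk uses */
    uint8_t pos;

    /* 0x06 bit 0x10: capability list present; 0x34: first pointer. */
    if (len < 0x40 || !(cfg[0x06] & 0x10))
        return 0;

    pos = cfg[0x34];
    while (ttl-- && n < max_ids) {
        if (pos < 0x40)
            break;
        pos &= 0xFC;
        if ((size_t)pos + 1 >= len)
            break;
        if (cfg[pos] == 0xff)
            break;
        ids[n++] = cfg[pos];
        pos = cfg[pos + 1];
    }
    return n;
}

enum msix_scan_result classify_cap_ids(const uint8_t *ids, size_t n)
{
    size_t i;

    if (n == 0)
        return NO_CAPS_AT_ALL;
    for (i = 0; i < n; i++)
        if (ids[i] == PCI_CAP_ID_MSIX)
            return MSIX_PRESENT;
    return MSIX_MISSING_OTHERS_PRESENT;
}
```

Logging the classification together with a timestamp on each failure would show whether Case 1 and Case 2 cluster around particular VFs or particular moments during VM startup.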
Thanks,
Alex.
I've sent your info to our experts and they are investigating. Will let you know what we find.
- Patrick
Alex,
Have you applied the patch http://us.generation-nt.com/patch-kvm-fix-device-assignment-threaded-irq-handler-help-208140031.html yet?
We believe this is not an issue in the driver but one at a lower (kernel) level.
- Patrick
Hi Patrick,
No, I haven't tried to reproduce with this patch yet. The patch changes the request_threaded_irq() call to pass a callback function instead of NULL. However, our problem happens even before request_threaded_irq() is called: we hit it in pci_enable_msix(), which does not find the required MSI-X capability on the VF PCI device. In addition, the patch is against kernel v3.5, while we are running kernel 3.2.
I will try to retest again with this patch applied, though.
Thanks,
Alex.