Interrupt not assigned to a VF attached to a KVM instance

ALyak
Beginner

Greetings all,

We are using stock Ubuntu Natty, kernel 2.6.38-8, with KVM 0.14.0 and libvirt 0.8.8. We are attaching VFs of an Intel 82599 NIC to instances created by KVM.

Usually, when the KVM process starts with VFs attached, we see the following messages in kern.log on the physical machine (for each VF):

Mar 24 14:30:52 ashzadapp08p kernel: [158360.469082] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
Mar 24 14:30:52 ashzadapp08p kernel: [158360.469093] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
Mar 24 14:30:52 ashzadapp08p kernel: [158360.469102] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709013] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709024] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.709036] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X

Note that there is a double print for each IRQ.

If later we peek into /proc/interrupts, we see the appropriate interrupts: "PCI-MSI-edge kvm:0000:07:12.3" with appropriate IRQs.

In some cases, however, we see only a single print for each IRQ, and later we don't find the appropriate interrupt assigned to the VF:

Mar 24 14:30:53 ashzadapp08p kernel: [158360.948116] pci-stub 0000:04:10.6: irq 262 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.948127] pci-stub 0000:04:10.6: irq 263 for MSI/MSI-X
Mar 24 14:30:53 ashzadapp08p kernel: [158360.948136] pci-stub 0000:04:10.6: irq 264 for MSI/MSI-X

As a result, the VF does not function at all within the VM. The only way to fix this is to stop the KVM process and restart it. Then the VFs get re-attached properly.
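One way to spot an affected VF without reading kern.log is to count the /proc/interrupts entries that carry the VF's "kvm:<address>" name. The following is only a minimal sketch of that check, not anything KVM or libvirt ships: the function names count_kvm_irq_lines() and count_kvm_irqs() are made up for illustration, and the 1 MiB read buffer is an assumption about how large /proc/interrupts gets on the host.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of /proc/interrupts-style text that mention "kvm:<bdf>".
 * Hypothetical helper; returns -1 if the address string is too long. */
int count_kvm_irq_lines(const char *text, const char *bdf)
{
    char needle[64];
    int count = 0;

    assert(text != NULL && bdf != NULL);

    /* Build the name seen in the listing, e.g. "kvm:0000:07:12.3". */
    if (snprintf(needle, sizeof(needle), "kvm:%s", bdf) >= (int)sizeof(needle))
        return -1;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        const char *hit = strstr(text, needle);

        /* Count the line only if the match starts inside it. */
        if (hit && (size_t)(hit - text) < len)
            count++;

        text += len;
        if (*text == '\n')
            text++;
    }
    return count;
}

/* Read /proc/interrupts (Linux only) and count the entries for one VF.
 * Returns -1 if the file cannot be opened. */
int count_kvm_irqs(const char *bdf)
{
    static char buf[1 << 20];   /* assumed large enough; longer input is truncated */
    FILE *f = fopen("/proc/interrupts", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return count_kvm_irq_lines(buf, bdf);
}
```

In the logs above a healthy VF gets three IRQs, so count_kvm_irqs("0000:04:10.6") returning 0 while the guest is running would flag the failing case.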

Can anybody advise why this could be happening, or how we can debug this issue further?

The ixgbe version is: 3.7.17-NAPI

The ixgbevf version is: 1.0.19-k0

Thanks!

Patrick_K_Intel1
Employee

Hi Alex.

One of the interesting challenges in doing a feature such as SR-IOV in open source is that all the pieces must work together. The distros pick and choose components, and sometimes not all the correct pieces make it into a distro release.

I would suggest you try the 3.8.21 SourceForge release and see how that works for you.

Please let us know how it goes.

- Patrick

ALyak
Beginner

Thanks, Patrick.

I already looked at the later drivers and saw that 3.7.21 fixes an "SR-IOV critical bug". Can you please give us more detail about what this bug is?

According to the code, it looks like 3.7.21 mostly adds some kind of "VLAN Pool Filter" functionality, while the latest 3.8.21 seems to have major code changes.

We will try & let you know.

Thanks!

Alex.

Patrick_K_Intel1
Employee

I look forward to hearing your results.

As to what was in that release - those details were left out on purpose. Seems it was some kind of security update, and to expose the details could lead to a security problem.

ALyak
Beginner

Hello Patrick,

I now have a scenario that always reproduces the issue. It happens when I spawn 8 VMs, each with 4 VFs attached. At least one VF ends up without an interrupt and, as a result, is non-functional.

I tested the scenario with the following driver versions:

3.2.9-k2 (packaged together with ubuntu-natty) - issue does not happen

3.7.17 - issue happens

3.7.21 - issue happens

3.8.21 - issue happens

When compiling the driver, I used only the CFLAGS_EXTRA="-DIXGBE_NO_LRO" option (to make the driver compatible with bridging/routing, as the README advises). Should I use any additional flags?

Thanks,

Alex.

Patrick_K_Intel1
Employee

Thanks for the detailed information. Can you provide a bit more?

Details about the server configuration, Model, memory, CPU etc.

We will try to reproduce the issue and investigate. I will provide an update when I have more information.

- Patrick

ALyak
Beginner

Thanks, Patrick.

Here is some info about the system. Please let me know whether you need anything else to debug this (like enabling some debug prints, etc.).

Below are dmidecode details:

root@ubuntu-sata-51:/# dmidecode -t processor
# dmidecode 2.9
SMBIOS 2.6 present.

Handle 0x0400, DMI type 4, 40 bytes
Processor Information
    Socket Designation: CPU1
    Type: Central Processor
    Family: Xeon
    Manufacturer: Intel
    ID: C2 06 02 00 FF FB EB BF
    Signature: Type 0, Family 6, Model 44, Stepping 2
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        DS (Debug store)
        ACPI (ACPI supported)
        MMX (MMX technology supported)
        FXSR (Fast floating-point save and restore)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        SS (Self-snoop)
        HTT (Hyper-threading technology)
        TM (Thermal monitor supported)
        PBE (Pending break enabled)
    Version: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
    Voltage: 1.2 V
    External Clock: 5860 MHz
    Max Speed: 3600 MHz
    Current Speed: 2400 MHz
    Status: Populated, Enabled
    Upgrade:
    L1 Cache Handle: 0x0700
    L2 Cache Handle: 0x0701
    L3 Cache Handle: 0x0702
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 6
    Core Enabled: 6
    Thread Count: 12
    Characteristics:
        64-bit capable

Handle 0x0401, DMI type 4, 40 bytes
Processor Information
    Socket Designation: CPU2
    Type: Central Processor
    Family: Xeon
    Manufacturer: Intel
    ID: C2 06 02 00 FF FB EB BF
    Signature: Type 0, Family 6, Model 44, Stepping 2
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        DS (Debug store)
        ACPI (ACPI supported)
        MMX (MMX technology supported)
        FXSR (Fast floating-point save and restore)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        SS (Self-snoop)
        HTT (Hyper-threading technology)
        TM (Thermal monitor supported)
        PBE (Pending break enabled)
    Version: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
    Voltage: 1.2 V
    External Clock: 5860 MHz
    Max Speed: 3600 MHz
    Current Speed: 2400 MHz
    Status: Populated, Idle
    Upgrade:
    L1 Cache Handle: 0x0703
    L2 Cache Handle: 0x0704
    L3 Cache Handle: 0x0705
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 6
    Core Enabled: 6
    Thread Count: 12
    Characteristics:
        64-bit capable

root@ubuntu-sata-51:/# dmidecode -t system
# dmidecode 2.9
SMBIOS 2.6 p...

Patrick_K_Intel1
Employee

We tried to reproduce this on a Dell R710 and could not make it fail.

How many Intel 82599 devices do you have in the system? If you have a few, all of them will get VFs created on them, in which case you could potentially run out of available interrupts.

ALyak
Beginner

We have one dual-port 82599 device. So in total we have 2 PFs, each spawning 22 VFs, for 44 VFs altogether. Four of the VFs are reserved for the physical machine itself. The rest are available for VMs. In my test I create 8 (sometimes 9) VMs, each one receiving 4 VFs.

Patrick, it doesn't seem that I run out of interrupts. This also happened when very few VMs were running on the node. With more VMs it is just easier to repro. Also, as I mentioned, for the "bad" VF I usually see a single print saying that it receives an IRQ, but later it ends up without the IRQ.

Thanks for your assistance, Patrick. If you are willing to help us debug this further, please let me know what additional info is needed.

Alex.

Patrick_K_Intel1
Employee

Well, it was just a shot in the dark :-)

Do you have this issue with a different OS (say Red Hat)? You might also, as a test, download the latest kernel and give it a try.

You may also try getting the latest BIOS on your system. As you are aware, SR-IOV requires many ingredients to function properly: BIOS, platform HW, OS and NIC.

We know that the NIC and OS work for us on a different server with the same configuration, so I'd suggest good old-fashioned debugging by process of elimination.

Hopefully somebody out there has experienced something like this as well and will post here.

If you do find something, please report back. Likewise if I stumble across anything I will post it here.

ALyak
Beginner

Hi Patrick,

We don't use any other OS. As you probably understand, we don't have the bandwidth to test all the different combinations of kernel, BIOS and drivers. For now we have downgraded to version 3.2.9 of the driver, and the issue does not reproduce.

Thanks for your efforts to repro the issue.

Alex.

Patrick_K_Intel1
Employee

That is good data. I'll pass that along, thanx!

ALyak
Beginner

Hello Patrick,

We have moved to Ubuntu Precise (3.2.0-29-generic #46), and we are seeing this issue again with PF drivers 3.7.21, 3.8.21, 3.9.15, 3.9.17 and 3.11.33. (Versions 3.10.17 and 3.10.16 have not been tested yet.) However, this time I debugged deeper, and the problem might not be related to the PF drivers. It seems to happen in the path where KVM asks to allocate IRQs to VFs. Specifically, what I see is that the pci_enable_msix() kernel function fails. (It is called from KVM's assigned_device_enable_host_msix() function.) Once it failed with EINVAL, and another time with ENOMEM. Can you perhaps check with your devs what might be the cause and how to debug this further?

Thanks,

Alex.

Patrick_K_Intel1
Employee

Hi Alex,

My guru says that there are some recent changes to the kernel and that some of the distros may not have picked up all of them. I suggest you try applying this:

http://us.generation-nt.com/patch-kvm-fix-device-assignment-threaded-irq-handler-help-208140031.html

ALyak
Beginner

Hi Patrick,

We have debugged the issue deeper, and it looks like different errors we are seeing all stem from the fact that the VF PCI device sometimes (not always) does not report the capability PCI_CAP_ID_MSIX.

We see that the Linux kernel code calls pci_find_capability(dev, PCI_CAP_ID_MSIX), which in turn calls __pci_find_next_cap_ttl(). This function reads the PCI configuration space of the VF PCI device and looks for the required ID. Sometimes it does not find PCI_CAP_ID_MSIX, and then it returns 0. This is the root cause of the errors we are seeing: the different code paths all call pci_find_capability(dev, PCI_CAP_ID_MSIX).

We tried sleeping for 100 ms after such a failure and then checking for the PCI_CAP_ID_MSIX capability again, and it was reported properly. So it looks like a transient HW issue of not reporting the capability.

Can you please check with your hw/sw engineers what could be the cause of HW not reporting this capability?

FYI, the detailed analysis is here: http://www.spinics.net/lists/linux-pci/msg19014.html

Thanks!

The code of __pci_find_next_cap_ttl():

static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
                                   u8 pos, int cap, int *ttl)
{
        u8 id;

        while ((*ttl)--) {
                pci_bus_read_config_byte(bus, devfn, pos, &pos);
                if (pos < 0x40)
                        break;
                pos &= ~3;
                pci_bus_read_config_byte(bus, devfn, pos + PCI_CAP_LIST_ID,
                                         &id);
                if (id == 0xff)
                        break;
                if (id == cap)
                        return pos;
                pos += PCI_CAP_LIST_NEXT;
        }
        return 0;
}
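
For anyone who wants to watch the same walk from user space, here is a rough sketch that mirrors the loop above over a 256-byte snapshot of the VF's configuration space, for example one read from /sys/bus/pci/devices/<address>/config (reading past the first 64 bytes normally requires root). It is an approximation of the kernel logic, not the kernel code itself: the names find_cap_in_snapshot() and load_config_snapshot() and the CFG_* macros are invented for this example, and the register offsets and the limit of 48 hops are taken from the standard PCI header layout and the kernel's PCI_FIND_CAP_TTL as far as I know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_STATUS          0x06  /* status register, low byte */
#define CFG_STATUS_CAP_LIST 0x10  /* "capability list present" bit */
#define CFG_CAP_PTR         0x34  /* pointer to the first capability */
#define CAP_ID_MSIX         0x11  /* same value as PCI_CAP_ID_MSIX */

/* Walk the capability list in a 256-byte config-space snapshot.
 * Returns the offset of capability `cap`, or 0 if it is not found. */
int find_cap_in_snapshot(const uint8_t cfg[256], uint8_t cap)
{
    int ttl = 48;   /* bound on list hops, guards against loops */
    uint8_t pos;

    assert(cfg != NULL);
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;   /* device advertises no capability list at all */

    pos = cfg[CFG_CAP_PTR];
    while (ttl--) {
        uint8_t id;

        if (pos < 0x40)
            break;                  /* pointers below 0x40 end the list */
        pos &= 0xFC;                /* capabilities are dword aligned */
        id = cfg[pos];              /* byte 0: capability ID */
        if (id == 0xff)
            break;                  /* reads of all ones: no device response */
        if (id == cap)
            return pos;
        pos = cfg[pos + 1];         /* byte 1: next-capability pointer */
    }
    return 0;
}

/* Load the first 256 bytes of config space from a sysfs "config" file.
 * Returns 0 on success, -1 on open failure or a short read. */
int load_config_snapshot(const char *path, uint8_t cfg[256])
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f)
        return -1;
    n = fread(cfg, 1, 256, f);
    fclose(f);
    return n == 256 ? 0 : -1;
}
```

Calling find_cap_in_snapshot(cfg, CAP_ID_MSIX) on snapshots taken in a loop while the VMs start might show whether the list is ever truncated when observed from user space too.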

ALyak
Beginner

Hi Patrick,

Some additional reproductions showed two cases of failure to find the MSI-X capability:

Case 1: HW reports PCI_CAP_ID_SLOTID (0x04) and PCI_CAP_ID_EXP (0x10), but not PCI_CAP_ID_MSIX (0x11).

Case 2: HW reports no capabilities at all.

In both cases, a retry after a short delay reports the MSI-X capability correctly.
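
The retry experiment can be sketched roughly as below. This is only an illustration of the diagnostic, not a proposed fix and not the code we actually ran: probe_with_retry(), cap_probe_fn, and the flaky_vf stand-in (which simulates a device that hides the capability for its first few reads) are all hypothetical names, and 0x70 is an arbitrary example offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A probe returns the capability offset, or 0 when the capability is absent. */
typedef int (*cap_probe_fn)(void *ctx);

/* Sleep for `ms` milliseconds using POSIX nanosleep(). */
static void sleep_ms(unsigned ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Call `probe` up to `max_tries` times, sleeping `delay_ms` between attempts.
 * Returns the first non-zero offset, or 0 if every attempt failed.
 * If `tries_used` is not NULL it receives the number of attempts made. */
int probe_with_retry(cap_probe_fn probe, void *ctx, int max_tries,
                     unsigned delay_ms, int *tries_used)
{
    int pos = 0;
    int tries = 0;

    assert(probe != NULL && max_tries > 0);
    while (tries < max_tries) {
        pos = probe(ctx);
        tries++;
        if (pos != 0)
            break;                  /* capability found */
        if (tries < max_tries)
            sleep_ms(delay_ms);     /* give the device time before retrying */
    }
    if (tries_used)
        *tries_used = tries;
    return pos;
}

/* Hypothetical stand-in for a VF that fails its first few probes. */
struct flaky_vf {
    int failures_left;
};

int flaky_vf_probe(void *ctx)
{
    struct flaky_vf *vf = ctx;

    if (vf->failures_left > 0) {
        vf->failures_left--;
        return 0;                   /* capability not reported this time */
    }
    return 0x70;                    /* arbitrary example offset */
}
```

Recording tries_used for every VF would show how often the first read fails and how long the capability stays hidden.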

Thanks,

Alex.

Patrick_K_Intel1
Employee

I've sent your info to our experts and they are investigating. Will let you know what we find.

- Patrick

Patrick_K_Intel1
Employee

ALyak
Beginner

Hi Patrick,

No, I haven't tried to repro with this patch yet. The patch passes a callback function instead of NULL to the request_threaded_irq() call. However, our problem happens even before we call request_threaded_irq(). We hit the issue when calling pci_enable_msix(), which does not find the required MSI-X capability on the VF PCI device. In addition, this patch is against kernel v3.5, while we are running kernel 3.2.

I will retest with this patch applied, though.

Thanks,

Alex.
