Intel Data Center GPU Flex 140 SR-IOV Virtualization IOMMU Group Problem

z0Kng

Hello,
I am setting up a VM that uses SR-IOV GPU virtualization. My setup is:

OS: Ubuntu 22.04.3 LTS
CPU: AMD EPYC 7742
Server: Gigabyte G242-Z11-00
GPU: Intel Data Center GPU Flex 140

I followed these instructions: https://github.com/intel/media-delivery/blob/master/doc/virtualization.rst#id8
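For reference, the generic Linux kernel interface for creating SR-IOV virtual functions is a pair of sysfs files in the physical function's PCI device directory: sriov_totalvfs (the maximum number of VFs) and sriov_numvfs (the number currently enabled). The sketch below is a minimal, hypothetical Python helper (enable_vfs is not part of any library). It assumes root privileges, a physical function at 0000:07:00.0 bound to i915, and that any additional steps the linked guide requires have already been done. It is not a substitute for those instructions.

```python
import os


def enable_vfs(pf_bdf: str, num_vfs: int, sysfs_root: str = "/sys") -> int:
    """Hypothetical helper: request num_vfs VFs on a PF via the generic sysfs SR-IOV files."""
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pf_bdf)

    # sriov_totalvfs reports how many VFs the device can expose at most.
    with open(os.path.join(dev_dir, "sriov_totalvfs")) as f:
        total = int(f.read().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{pf_bdf} supports at most {total} VFs, requested {num_vfs}")

    numvfs_path = os.path.join(dev_dir, "sriov_numvfs")
    # The kernel refuses to change a non-zero VF count directly, so reset to 0 first.
    with open(numvfs_path, "w") as f:
        f.write("0")
    if num_vfs:
        with open(numvfs_path, "w") as f:
            f.write(str(num_vfs))
    return num_vfs
```

For example, enable_vfs("0000:07:00.0", 1) requests a single VF. In this setup the VF showed up as 07:00.1.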

I successfully created a virtual function (VF) at 07:00.1, as you can see here:

$ lspci -nnk | grep -A 3 -i 56c1
07:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
Subsystem: Intel Corporation Device [8086:4905]
Kernel driver in use: i915
Kernel modules: intel_vsec, i915
07:00.1 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
Subsystem: Intel Corporation Device [8086:4905]
Kernel driver in use: vfio-pci
Kernel modules: intel_vsec, i915
--
0a:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
Subsystem: Intel Corporation Device [8086:4905]
Kernel driver in use: i915
Kernel modules: intel_vsec, i915
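The "Kernel driver in use" line from lspci corresponds to the driver symlink in the device's sysfs directory. A VF additionally has a physfn symlink that points back to its physical function. The following minimal sketch reads both. pci_device_info is a hypothetical helper, and the sketch assumes PCI domain 0000 because lspci omits the domain in the output above.

```python
import os


def pci_device_info(bdf: str, sysfs_root: str = "/sys") -> dict:
    """Hypothetical helper: return the bound driver and, for a VF, its parent PF (None if absent)."""
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)

    def link_target_name(name: str):
        # "driver" and "physfn" are symlinks; the last path component of the
        # target is the driver name or the PF's PCI address, respectively.
        path = os.path.join(dev_dir, name)
        if not os.path.islink(path):
            return None
        return os.path.basename(os.path.realpath(path))

    return {"driver": link_target_name("driver"), "physfn": link_target_name("physfn")}
```

Given the output above, pci_device_info("0000:07:00.1") would be expected to report vfio-pci as the driver and 0000:07:00.0 as the physical function.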

However, as soon as I pass the virtual function to a QEMU VM, as described in the instructions, using

-device vfio-pci,host=07:00.1

I get the following error message:

vfio 0000:07:00.1: group 0 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
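VFIO only accepts an IOMMU group when no member is held by an ordinary host driver. The sketch below approximates that rule: a member does not block the group if it is unbound, bound to vfio-pci or pci-stub, or is a PCI-to-PCI bridge. This is a simplification, since the kernel's exact rule differs between versions, and blocking_devices is a hypothetical helper that assumes PCI domain 0000. Applied to the group below, it would presumably flag 07:00.0 and 0a:00.0, which are both bound to i915. Because 07:00.0 is the physical function that provides the VF, it cannot simply be rebound to vfio-pci. The group itself therefore has to be split.

```python
import os

# Simplified approximation of VFIO's viability rule (varies by kernel version):
# a member is acceptable if unbound, bound to one of these drivers, or a
# PCI-to-PCI bridge (class code 0x0604).
ALLOWED_DRIVERS = {"vfio-pci", "pci-stub"}
PCI_TO_PCI_BRIDGE_CLASS = 0x0604


def blocking_devices(bdf: str, sysfs_root: str = "/sys") -> list:
    """Hypothetical helper: list (address, driver) pairs that make bdf's IOMMU group non-viable."""
    group_dir = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu_group", "devices")
    blockers = []
    for member in sorted(os.listdir(group_dir)):
        member_dir = os.path.join(group_dir, member)
        driver_link = os.path.join(member_dir, "driver")
        if not os.path.islink(driver_link):
            continue  # unbound devices do not block the group
        driver = os.path.basename(os.path.realpath(driver_link))
        # The sysfs "class" file holds a 24-bit value such as 0x060400
        # (base class, subclass, programming interface).
        with open(os.path.join(member_dir, "class")) as f:
            pci_class = int(f.read().strip(), 16)
        if driver in ALLOWED_DRIVERS or (pci_class >> 8) == PCI_TO_PCI_BRIDGE_CLASS:
            continue
        blockers.append((member, driver))
    return blockers
```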

The IOMMU group of the GPU is as follows:

IOMMU Group 0:
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
01:00.0 PCI bridge [0604]: Broadcom / LSI PEX88048 50 lane, 50 port, PCI Express Gen 4.0 ExpressFabric Platform [1000:c010] (rev b0)
02:00.0 PCI bridge [0604]: Broadcom / LSI PEX88048 50 lane, 50 port, PCI Express Gen 4.0 ExpressFabric Platform [1000:c010] (rev b0)
03:00.0 PCI bridge [0604]: Broadcom / LSI PEX88048 50 lane, 50 port, PCI Express Gen 4.0 ExpressFabric Platform [1000:c010] (rev b0)
04:08.0 PCI bridge [0604]: Broadcom / LSI PEX88048 50 lane, 50 port, PCI Express Gen 4.0 ExpressFabric Platform [1000:c010] (rev b0)
04:18.0 PCI bridge [0604]: Broadcom / LSI PEX88048 50 lane, 50 port, PCI Express Gen 4.0 ExpressFabric Platform [1000:c010] (rev b0)
05:00.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa1] (rev 01)
06:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
07:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
07:00.1 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
08:00.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa1] (rev 01)
09:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
0a:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
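A listing like the one above can be produced by walking /sys/kernel/iommu_groups. Each numbered directory there has a devices subdirectory with one entry per PCI address. The following minimal sketch uses a hypothetical helper, iommu_groups. It returns bare addresses, whereas the listing above also includes each device's lspci description.

```python
import os


def iommu_groups(sysfs_root: str = "/sys") -> dict:
    """Hypothetical helper: map IOMMU group number -> sorted list of PCI addresses."""
    groups_dir = os.path.join(sysfs_root, "kernel", "iommu_groups")
    result = {}
    for group in os.listdir(groups_dir):
        members = os.listdir(os.path.join(groups_dir, group, "devices"))
        result[int(group)] = sorted(members)
    # Order groups numerically (0, 3, 12, ...) rather than as strings.
    return dict(sorted(result.items()))
```

To print something similar to the listing above, loop over iommu_groups().items() and print each group number followed by its addresses.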

I have already tried moving the card to a different PCIe slot, but all of these devices remain together in one group. Should these devices, including the physical GPU and the PCI bridges, each be in a separate IOMMU group? And do the Broadcom / LSI PEX88048 bridges belong to the GPU?

I would be very grateful if you could help me to solve this problem.


Best regards

Michael

z0Kng

I found the solution: in the BIOS, "PCIe ARI Support" and "ACS Enable" had to be enabled. Now all the devices are in separate IOMMU groups, and I can pass a virtual GPU instance to the VM.
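After changing these BIOS settings, you can confirm the new grouping by resolving each device's iommu_group symlink to its group number and listing the other members. ACS (Access Control Services) is the PCIe feature that lets the kernel place devices behind a switch into separate groups. ARI (Alternative Routing-ID Interpretation) allows a device to use more than eight function numbers, which SR-IOV VFs may need. The following minimal sketch uses hypothetical helpers (group_of, is_isolated) and assumes PCI domain 0000.

```python
import os


def group_of(bdf: str, sysfs_root: str = "/sys") -> tuple:
    """Hypothetical helper: return (group number, sorted member addresses) for a PCI device."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu_group")
    # iommu_group is a symlink to /sys/kernel/iommu_groups/<N>.
    group = int(os.path.basename(os.path.realpath(link)))
    members = sorted(os.listdir(os.path.join(link, "devices")))
    return group, members


def is_isolated(bdf: str, sysfs_root: str = "/sys") -> bool:
    """True if bdf is the only device in its IOMMU group."""
    _, members = group_of(bdf, sysfs_root)
    return members == [bdf]
```

With ACS enabled, is_isolated("0000:07:00.1") should return True once the VF has a group of its own.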
