Graphics
Intel® graphics drivers and software, compatibility, troubleshooting, performance, and optimization

DG1 (Xe and Xe MAX)'s PCIe extended capabilities don't comply with their hardware specification

Kybd
Beginner
5,354 Views

Recently I got my hands on a DG1 Xe (from ASUSTeK) and a DG1 Xe MAX (from GUNNIR) dGPU. According to this post ( https://community.intel.com/t5/Graphics/DG1-QuickSync-and-SR-IOV/td-p/1353656 ), the DG1 should support SR-IOV. However, lspci -v did not show the corresponding capability. Furthermore, when I ran lspci -xxxx on them, I found that their PCIe extended configuration space data does not match their hardware specification (Programmer's Reference Manual) from https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/hardware-specs.html .

 

For example, the first four bytes read from address 100h are 18 00 01 00. Read as a little-endian 32-bit value (0x00010018), that is 000000000000_0001_0000000000011000b:

Kybd_0-1681045797612.png

 

According to the PRM, the dword at 100h should be the ARI Extended Capability Header, with the upper 12 bits (Next Capability Offset) pointing to 0x200 and the lower 16 bits holding the Capability ID (0000000000001110b, i.e. 0x000E). Neither field matches the data read from the card.

Kybd_1-1681046009730.png
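
The field split used above can be checked with a few lines of Python. This is a minimal sketch, assuming the standard PCI Express Extended Capability Header layout (Capability ID in bits 15:0, Capability Version in bits 19:16, Next Capability Offset in bits 31:20). The function name and the small ID table are illustrative only; the ID values are the ones defined in the PCI Express specification.

```python
import struct

# A few extended capability IDs from the PCI Express specification.
EXT_CAP_NAMES = {
    0x000E: "ARI",
    0x0010: "SR-IOV",
    0x0018: "LTR (Latency Tolerance Reporting)",
}

def decode_ext_cap_header(raw: bytes) -> dict:
    """Split the first 4 bytes of an extended capability into its fields."""
    (dword,) = struct.unpack("<I", raw[:4])  # config space is little-endian
    return {
        "cap_id": dword & 0xFFFF,              # bits 15:0
        "version": (dword >> 16) & 0xF,        # bits 19:16
        "next_offset": (dword >> 20) & 0xFFF,  # bits 31:20
    }

# Bytes read from offset 100h on the DG1 cards, as quoted in this post.
header = decode_ext_cap_header(bytes.fromhex("18000100"))
print(header, EXT_CAP_NAMES.get(header["cap_id"], "unknown"))
```

Under that layout the bytes decode to Capability ID 0x0018, version 1, and a Next Capability Offset of 0, so the list would end at its first entry, whereas the PRM describes ID 0x000E (ARI) with a next pointer of 0x200.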

 

I tried this both under Linux (kernel 6.1.15) using lspci (3.7.0) and under Windows 10 (21H2) using RWEverything (1.7), and they gave the same result, so I believe this is not an OS or driver issue. (By the way, the i915 driver in the Linux kernel does not seem to support SR-IOV for DG1.) I also repeated this procedure on a laptop with an i5-11320H. The data from its iGPU matches the PRM perfectly, so I believe the lspci and RWEverything tools are reliable.
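
For a third cross-check that does not depend on a tool's own decoding, the extended capability list can be walked directly from a raw 4096-byte config space dump. The sketch below is hypothetical helper code, not part of any tool named in this post. It assumes the standard header layout (ID in bits 15:0, version in bits 19:16, next offset in bits 31:20) and the SR-IOV extended capability ID 0x0010 from the PCI Express specification.

```python
SRIOV_CAP_ID = 0x0010  # SR-IOV extended capability ID (PCI Express spec)

def walk_ext_caps(config: bytes) -> list:
    """Return (offset, cap_id, version) for each extended capability."""
    caps, seen = [], set()
    offset = 0x100  # extended capabilities start after the first 256 bytes
    while 0x100 <= offset <= len(config) - 4 and offset not in seen:
        seen.add(offset)  # guard against malformed, looping lists
        dword = int.from_bytes(config[offset:offset + 4], "little")
        if dword in (0x00000000, 0xFFFFFFFF):  # no capability / failed read
            break
        caps.append((offset, dword & 0xFFFF, (dword >> 16) & 0xF))
        offset = (dword >> 20) & 0xFFC  # next pointer; 0 ends the list
    return caps

def has_sriov(config: bytes) -> bool:
    """True if any extended capability in the dump has the SR-IOV ID."""
    return any(cap_id == SRIOV_CAP_ID for _, cap_id, _ in walk_ext_caps(config))

# Synthetic dump carrying only the dword reported at 100h in this post.
dump = bytearray(4096)
dump[0x100:0x104] = bytes.fromhex("18000100")
print(walk_ext_caps(bytes(dump)), has_sriov(bytes(dump)))
```

On Linux, a real dump can be read as root from /sys/bus/pci/devices/<BDF>/config, where <BDF> is a placeholder for the device address shown by lspci.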

 

What might be the cause of this problem? Do these dGPUs have some special fuse or mask that prevents the system from reading their PCIe extended capabilities? How can I use SR-IOV on these GPUs?

 

Any information or help will be much appreciated.

 

Hardware configurations of my setup:

CPU: Xeon W-2155

MB: Lenovo P520c workstation MB (C422 chipset, Above 4G Decoding and SR-IOV enabled, CSM disabled in BIOS)

RAM: 64G DDR4 2400 ECC RDIMM

Disk: Kioxia 1TB NVMe SSD

 

The attachments are the PCIe config space data read from the DG1 Xe and the DG1 Xe MAX.

0 Kudos
1 Solution
Hugo_Intel
Employee
4,820 Views

Hello Kybd


I appreciate your patience. Our team has confirmed that these GPUs do not support SR-IOV. The passage on page 11 of the document I shared in my previous posts is meant to point out the lack of this feature:


The MMIO address range from 0x178000 thru 0x178FFF is reserved for communication between a VMM and the GPU Driver executing on a Virtual Machine.

HW does not actually implement anything within this range. Instead, in a SW Virtualized environment, if a VM driver issues a read to this MMIO address range, the VMM will trap that access, and provide whatever data it wishes to pass to the VM driver. In a non-SW-Virtualizated environment (including an SR-IOV Virtualized environment), reads will return zeros, like any other unimplemented MMIO address. Writes to this range are always ignored.

It is important that no "real" HW MMIO register be defined within this range, as it would be inaccessible in a SW-virtualized environment.
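
The read and write behaviour described in this passage can be modelled in a few lines. This is only a toy illustration under stated assumptions: the class, its methods, and the `vmm_data` mapping are hypothetical names, and nothing here is an Intel driver or VMM API.

```python
VMM_RANGE = range(0x178000, 0x179000)  # 0x178000 thru 0x178FFF

class ToyGpuMmio:
    """Toy model of the reserved VMM communication range (hypothetical)."""

    def __init__(self, sw_virtualized: bool, vmm_data: dict = None):
        self.sw_virtualized = sw_virtualized
        # Data a VMM chooses to pass to the VM driver (assumed: address -> value).
        self.vmm_data = vmm_data or {}

    def read(self, addr: int) -> int:
        if addr not in VMM_RANGE:
            raise ValueError("address is outside the modelled range")
        if self.sw_virtualized:
            # The VMM traps the access and supplies its own data.
            return self.vmm_data.get(addr, 0)
        # Not SW-virtualized (including SR-IOV): reads return zeros.
        return 0

    def write(self, addr: int, value: int) -> None:
        # Writes to this range are always ignored.
        return None

guest = ToyGpuMmio(sw_virtualized=True, vmm_data={0x178000: 0x1234})
native = ToyGpuMmio(sw_virtualized=False)
print(hex(guest.read(0x178000)), hex(native.read(0x178000)))
```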


We apologize for any confusion this might have caused.


Best Regards,


Hugo O.

Intel Customer Support Technician.



0 Kudos
13 Replies
Hugo_Intel
Employee
5,294 Views

Hello Kybd


Thank you for posting on the Intel Communities. I am sorry you are experiencing issues when trying to use SR-IOV on your system with Intel® Iris® Xe MAX Graphics.


In order to further look into this issue, please share with us the following information:



Best Regards,


Hugo O.

Intel Customer Support Technician.


0 Kudos
Kybd
Beginner
5,248 Views

Hello Hugo,

 

Thank you for your reply! The SSU log and System Report are in the attachments. I just found that there are so many Intel parts in my PC: CPU, GPU, NIC, SSD, etc.  😄

 

The Xe and Xe MAX seem to conflict with each other under Windows. Once Windows installs the driver for the Xe MAX, the Xe is 'kicked' and gets error code 31 in Device Manager. I have to remove the Xe MAX card and reinstall the driver. Maybe I'll write a separate post for that and focus on the SR-IOV problem for now.

0 Kudos
Hugo_Intel
Employee
5,260 Views

Hello Kybd

  

I hope you are doing fine. 

  

Were you able to check the previous post? 

Let us know if you still need assistance. 

  

Best regards,  


Hugo O.  

Intel Customer Support Technician. 


0 Kudos
Hugo_Intel
Employee
5,179 Views

Hello Kybd


Thank you for sharing the log files. I will check this information with our team and get back to you as soon as I have more information.


Best Regards,


Hugo O.

Intel Customer Support Technician.


0 Kudos
Hugo_Intel
Employee
5,052 Views

Hello Kybd

 

Thank you for the information. I have checked your inquiry with our team, and the DG1 GPUs do not support SR-IOV. The attached document contains more information on page 11.

 

Best Regards,

 

Hugo O.

Intel Customer Support Technician.

 

0 Kudos
Kybd
Beginner
5,034 Views

Hello Hugo,

 

Thank you for the help.

 

Page 11 says:



HW does not actually implement anything within this range. Instead, in a SW Virtualized environment, if a VM driver issues a read to this MMIO address range, the VMM will trap that access, and provide whatever data it wishes to pass to the VM driver. In a non-SW-Virtualizated environment (including an SR-IOV Virtualized environment), reads will return zeros, like any other unimplemented MMIO address. Writes to this range are always ignored.


That's confusing. To my understanding, this actually implies that the DG1 does support SR-IOV; otherwise there would be no need to consider a 'non-SW-Virtualizated environment (including an SR-IOV Virtualized environment)' scenario. Furthermore, there are SR-IOV related registers listed in the DG1's Programmer's Reference Manual (in the attachment):

Kybd_0-1682417671246.png

 

Back to the original post: since the data read from the DG1 does not match the PRM, statements from the PRM seem questionable to me for now.

 

0 Kudos
Hugo_Intel
Employee
4,991 Views

Hello Kybd


Allow me to check the document you provided with our team. I will get back to you once I have more information.


Best Regards,


Hugo O.

Intel Customer Support Technician.


0 Kudos
slide
Novice
4,962 Views

Hi, sorry for the slight diversion, but I am curious how you got VA-API to work on DG1. I followed the instructions on dgpu-docs.intel.com , yet I couldn't get the hardware codec working under Linux for the life of me. I would be happy if you could give me any hint.

0 Kudos
Kybd
Beginner
3,929 Views

Hi slide,

 

Hopefully it's not too late to reply, but I finally got some time to look into the codec of DG1. Good news: QSV codecs partly work with ffmpeg; bad news: VA-API seems broken.

 

It took me a while to figure out exactly what needs tweaking to make the codecs work, since Intel's dGPU doc just stuffs everything into a single apt-get command. Some of the packages are available in the Linux distro's repo, others are in Intel's repo, and they don't work out of the box. To make things worse, the i915 driver in newer mainline kernels might differ a little from Intel's own backports/DKMS version.

 

Long story short, here's my environment:

 

Kernel: 6.1.15

Distro: Debian 11 (PVE 7.4 to be exact)

Boot option: i915.enable_guc=7 i915.force_probe=4908 (might not need enable_guc)

libva2: 2.15.0 (installed from apt; I chose the wrong repo at first so it's not the latest)

ffmpeg: 4.3.6 (installed from apt)

 

Also I had to manually build and install these packages:

 

intel-gmmlib: 22.3.9 

intel-media-driver: 22.6.0 , patched with PR #1500  (according to oneVPL-intel-gpu/issues/243 ), and with ENABLE_PRODUCTION_KMD=ON option set

intel-media-sdk: 22.6.0 , though I'm not 100% sure if it's strictly necessary

 

I managed to decode a 4K 60FPS clip (main, yuv420p) with h264_qsv (the HW decoder) and encode it with hevc_qsv in constant quality mode:

 

ffmpeg \
    -hwaccel qsv -init_hw_device qsv=hw -filter_hw_device hw \
    -c:v h264_qsv -i "input.mp4" \
    -vf hwupload=extra_hw_frames=64,format=qsv -global_quality 28 -look_ahead 1 -c:a copy \
    -c:v hevc_qsv "output.mp4"

 

However, h264_qsv failed to decode another 4K 120FPS clip (also yuv420p). SW decode plus HW encode works (-hwaccel qsv on its own doesn't use the HW decoder):

 

ffmpeg -hwaccel qsv \
    -i "input.mp4" -global_quality 28 -look_ahead 1 -c:a copy \
    -c:v hevc_qsv "output.mp4"
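
The two invocations above differ only in whether the QSV decoder is selected explicitly. A small sketch of a wrapper that builds either argument list follows; the function name `build_qsv_cmd` and its parameters are hypothetical, and the ffmpeg options are copied from the commands in this post rather than verified against other ffmpeg versions.

```python
import shlex

def build_qsv_cmd(src: str, dst: str, hw_decode: bool) -> list:
    """Build the ffmpeg argument list for HEVC QSV encoding."""
    cmd = ["ffmpeg", "-hwaccel", "qsv"]
    if hw_decode:
        # First command in this post: explicit h264_qsv decoder plus hwupload filter.
        cmd += ["-init_hw_device", "qsv=hw", "-filter_hw_device", "hw",
                "-c:v", "h264_qsv", "-i", src,
                "-vf", "hwupload=extra_hw_frames=64,format=qsv"]
    else:
        # Second command in this post: software decode, hardware encode.
        cmd += ["-i", src]
    cmd += ["-global_quality", "28", "-look_ahead", "1",
            "-c:a", "copy", "-c:v", "hevc_qsv", dst]
    return cmd

# Print the software-decode variant as a shell-quoted command line.
print(shlex.join(build_qsv_cmd("input.mp4", "output.mp4", hw_decode=False)))
```

Passing the list to subprocess.run(cmd, check=True) would let a script try the hardware-decode variant first and fall back to software decode if ffmpeg exits with an error.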

 

The HEVC HW decoder might not work according to media-driver/issues/1415 . They tested far more combinations of SW stack versions and options than I did. I didn't try the HEVC decoder.

 

Neither -hwaccel vaapi nor hevc_vaapi/h264_vaapi worked. I tried multiple combinations of options from the ffmpeg wiki/VAAPI , but always got output with massive pure-green blocks. It could just be that I didn't set the pixel format correctly.

 

Hope this helps.

 

Kybd

 

0 Kudos
slide
Novice
3,879 Views
0 Kudos
Kybd
Beginner
4,944 Views

Hi slide, sorry, but I haven't actually played with the hardware codecs yet, so I can't provide any information on that for now.

0 Kudos
Hugo_Intel
Employee
4,744 Views

Hello Kybd


Since there are no further questions regarding this issue, we will proceed to close this thread. Feel free to open a new topic if you need further assistance.


Best Regards,


Hugo O.

Intel Customer Support Technician.


0 Kudos