GPU hangs when decoding 2 HEVC UHD streams 444 10 bits (Y410 pixel format).

njean
Beginner

On Linux, when we decode two HEVC UHD 444 10-bit streams on an i5 Alder Lake (also reproduced on an i7 Tiger Lake), the GPU hangs.

 

We can reproduce the bug with the sample_decode sample application, using the following command:
sample_decode h265 -i ./test-444-10.h265 -hw -vaapi -o /dev/null & sample_decode h265 -i ./test-444-10.h265 -hw -vaapi -o /dev/null
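
For anyone who wants to script the reproduction, here is a minimal Python sketch of the same idea: start two sample_decode processes at the same time and wait for both. The helper names `build_decode_command` and `run_concurrent_decodes` are hypothetical; the sample_decode arguments are copied from the command above, and the sketch assumes sample_decode is on the PATH.

```python
import subprocess


def build_decode_command(input_path, output_path="/dev/null"):
    """Return the argv list for one sample_decode run (arguments taken from the post)."""
    return ["sample_decode", "h265", "-i", input_path,
            "-hw", "-vaapi", "-o", output_path]


def run_concurrent_decodes(input_path, count=2):
    """Start `count` decodes of the same file at once and return their exit codes."""
    # Popen returns immediately, so all decodes run in parallel on the GPU.
    procs = [subprocess.Popen(build_decode_command(input_path))
             for _ in range(count)]
    # Wait for every process; a hung decode will block here until it is reset.
    return [proc.wait() for proc in procs]
```

Calling `run_concurrent_decodes("./test-444-10.h265")` should behave like the shell command above, where `&` puts the first decode in the background while the second runs in the foreground.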

 

The bug is a GPU hang in the Intel Gfx stack.

i915 kernel messages:

kernel: [26204.741232] i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out
kernel: [26204.741258] i915 0000:00:02.0: [drm] sample_decode[36374] context reset due to GPU hang
kernel: [26204.775905] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:10:28fffffd, in sample_decode [36374]
kernel: [26213.573237] i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out
kernel: [26213.573256] i915 0000:00:02.0: [drm] sample_decode[36373] context reset due to GPU hang
kernel: [26213.613847] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:10:28fffffd, in sample_decode [36373]
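
To count hangs across repeated test runs, the "GPU HANG" lines can be pulled out of the kernel log with a small parser. This is only a sketch: the regular expression is modeled on the lines quoted above, other kernel versions may format these messages differently, and `find_gpu_hangs` is a hypothetical helper name.

```python
import re

# Pattern modeled on the "GPU HANG" lines quoted in this post (an assumption,
# not an official i915 log format specification).
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+i915\s+(?P<device>\S+):\s+\[drm\]\s+"
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(log_text):
    """Return one dict per 'GPU HANG' line found in the given kernel log text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append({
                "time": float(match.group("time")),
                "device": match.group("device"),
                "ecode": match.group("ecode"),
                "process": match.group("process"),
                "pid": int(match.group("pid")),
            })
    return hangs
```

Feeding it the output of `dmesg` (or a saved log file) gives one entry per hang, which makes it easy to compare runs with and without a workaround.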

 


Version: OneVPL GPU Runtime 2022Q3 release - 22.5.4. (Released on Oct 14):

oneVPL GPU Runtime: https://github.com/oneapi-src/oneVPL-intel-gpu/releases/tag/intel-onevpl-22.5.4
oneVPL Dispatcher and Samples: https://github.com/oneapi-src/oneVPL/releases/tag/v2022.2.2
Driver: https://github.com/intel/media-driver/releases/tag/intel-media-22.5.4
Gmmlib: https://github.com/intel/gmmlib/releases/tag/intel-gmmlib-22.2.0
libva: https://github.com/intel/libva/releases/tag/2.16.0
libva-utils: https://github.com/intel/libva-utils/releases/tag/2.16.0

 

Ubuntu 22.04 LTS
Kernel:
- I5 alder lake : 5.18.0-051800rc1-generic
Type: N/A Mobo: Intel model: PELM12HBI516 v: M47315-301 serial: BTHB2120092B UEFI: Intel v: HBADL357.0038.2022.0310.0956 date: 03/10/2022
CPU: 10-core (2-mt/8-st) 12th Gen Intel Core i5-1235U (-MST AMCP-)
speed/min/max: 614/400/4400:3300 MHz Kernel: 5.18.0-051800-generic x86_64 Up: 6d 20h 41m
Mem: 748.1/15577.8 MiB (4.8%) Storage: 238.47 GiB (12.0% used) Procs: 233 Shell: Bash

- I7 tiger lake : 5.15.0-46 generic
Type: Desktop Mobo: ASRock model: NUC-TGL serial: M8P-DC000400058
    UEFI: American Megatrends LLC. v: P1.10 date: 12/24/2020
CPU: quad core 11th Gen Intel Core i7-1165G7 (-MT MCP-) speed/min/max: 775/400/2701 MHz
Kernel: 5.15.0-46-generic x86_64 Up: 18h 34m Mem: 2971.0/15332.4 MiB (19.4%)
Storage: 232.89 GiB (22.4% used) Procs: 210 Shell: Bash

 

 

 

RemyaP_Intel
Employee

Hi,


Thank you for posting in Intel Communities.


Thanks for sharing the details with us. We are trying to reproduce the issue. Could you please also share the input file used?


Regards,

Remya Premdas


njean
Beginner

Hi,

 

Here is the input file used.

 

Thanks,

 

Nicolas

RemyaP_Intel
Employee

Hi,


Thanks for sharing the input file. We are trying to reproduce your issue. We'll get back soon with an update.


Regards,

Remya Premdas


RemyaP_Intel
Employee

Hi,


We are still working on your issue. Sorry for the delay.


Regards,

Remya Premdas


RemyaP_Intel
Employee

Hi,


We tried the same sample_decode command with the input file in the oneVPL repo at /examples/content/cars_320x240.h265. It ran without any hang or errors.


We are checking the same with the input file you shared. Meanwhile, could you please try running with the cars_320x240.h265 input file on your machine and see whether there is any GPU hang or error?


Regards,

Remya Premdas



njean
Beginner

Hello Remya,

 

/examples/content/cars_320x240.h265 runs without a problem because it is a 444 8-bit stream.

The bug occurs with 444 10-bit streams.

 

Thanks,

 

Nicolas

RemyaP_Intel
Employee

Hi,


Sorry for the delay. Our team is working on this issue internally and will get back to you soon with an update.


Regards,

Remya Premdas


njean
Beginner

Hello Remya, do you have any update on this issue? Have you been able to reproduce it on your side?

RemyaP_Intel
Employee

Hi,


Sorry for the delay. After analysis, our development team has confirmed that the issue is in the driver and not in VPL. They are working on a fix, but we currently do not have an ETA.


Regards,

Remya Premdas


njean
Beginner

Hello Remya,

 

We went back to the issue and retested with the latest oneVPL release (v2023.1.0):

oneVPL GPU Runtime: https://github.com/oneapi-src/oneVPL-intel-gpu/releases/tag/intel-onevpl-22.6.5

oneVPL Dispatcher and Samples: https://github.com/oneapi-src/oneVPL/releases/tag/v2023.1.0

Driver: https://vpg-src1:8443/projects/MVX3/repos/intel-media-driver/browse?at=refs%2Fheads%2Fintel-media-Matrox-22.6.6

Gmmlib: https://github.com/intel/gmmlib/releases/tag/intel-gmmlib-22.3.3

libva: https://github.com/intel/libva/releases/tag/2.17.0

libva-utils: https://github.com/intel/libva-utils/releases/tag/2.17.1

We also used the latest kernel available (6.2.0-060200-generic_6.2.0).

The GPU hang still happens.

We also reproduced the problem with 444 8-bit streams (AYUV).

We have also observed that the issue is much easier to reproduce with streams that contain B frames. Without B frames it is harder to reproduce, but we do observe errors in the i915 driver.

RemyaP_Intel
Employee

Hi,


Thanks for sharing your observations. We will pass them on to our team. As mentioned earlier, we currently do not have an ETA for this fix. We are following up on the issue internally and will let you know if there are any updates.


Regards,

Remya Premdas


RemyaP_Intel
Employee

Hi,


Thanks for your patience. This is to inform you that we found the hang does not occur on Ubuntu 20.04, and that we are working on fixing the hang that occurs with this example on Ubuntu 22.04. We will let you know when there is any update.


Regards,

Remya Premdas


njean
Beginner

Hello Remya,

 

I'm a bit surprised by your observation about the Ubuntu version. On our side, I'm pretty sure that the first time we had the problem we were using Ubuntu 20.04. It looks more related to the Intel i915 driver than to the Ubuntu version.

 

Recently, we retried the tests and reproduced it with the following:

Processor: RL: i5-1335U
The error in the Linux dmesg output still points to a GPU hang in the i915 driver of the Intel Gfx stack:
i915 kernel messages:
kernel: [26204.741232] i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out
kernel: [26204.741258] i915 0000:00:02.0: [drm] sample_decode[36374] context reset due to GPU hang
kernel: [26204.775905] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:10:28fffffd, in sample_decode [36374]
kernel: [26213.573237] i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out
kernel: [26213.573256] i915 0000:00:02.0: [drm] sample_decode[36373] context reset due to GPU hang
kernel: [26213.613847] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:10:28fffffd, in sample_decode [36373]

RemyaP_Intel
Employee

Hi,

 

Apologies for the delay in getting back.

 

We initially reported that Ubuntu 20.04 was unaffected and that we were working on the hang seen on 22.04. However, with all the variables at play, we have not been able to determine why we cannot reproduce the hang on 20.04.

 

We do know that there is a root cause in our hardware implementation on TGL and ADL-S. We discovered the issue and made a change in the hardware functionality to avoid the hang. However, TGL and ADL-S do not support this new functionality. (See below for how to execute the 2-stream command on TGL and ADL-S.)

The new functionality is supported on ADL-P. 

 

So, in summary:

  • You should be able to run your command on ADL-P without a hang. (Please let us know if you do get a hang in the 2-stream case. As stated, there are many variables at play in and between the OS and the platform.)
  • You can run your command on TGL and ADL-S with modifications:
    • The hang is often triggered when scalability and MMC are enabled. To decode 2 streams simultaneously, first disable scalability and MMC. (Later platforms will support the new functionality that was implemented to avoid this hang, so there is no need to disable these features on them.)

 

Regards,

Remya Premdas

 

njean
Beginner

Thanks Remya for your answer.

 

It is really great that you have a workaround for us. We would like to try it on our side, but we need more details about the scalability and MMC features you mention.

 

Would you have more details on those features and how we should disable them?

 

Regards,

 

Nicolas

RemyaP_Intel
Employee

Hi,

 

Please follow the below steps to disable scalability and MMC:

 

1. Go to /etc/

2. Modify igfx_user_feature.txt and igfx_user_feature_next.txt as follows.

 

igfx_user_feature.txt:

Add the values below under [KEY]:

 

[KEY]
  0x00000001
  UFKEY_INTERNAL\LibVa
  .......
    [VALUE]
      Enable VP MMC
      4
      0
    [VALUE]
      Enable Codec MMC
      4
      0
    [VALUE]
      Enable Vebox Decompress
      4
      0
    [VALUE]
      Enable Media RenderEngine MMC
      4
      0
    [VALUE]
      Enable HCP Scalability Decode
      4
      0

 

igfx_user_feature_next.txt:

Add the lines below under [config]:

Enable HCP Scalability Decode=0
Enable VP MMC=0
Enable Codec MMC=0
Enable Media RenderEngine MMC=0
Enable HCP Scalability Decode=0

 

3. To double-check that the keys were set successfully:

 

Check the [report] section in igfx_user_feature.txt. The key below should be 0:

Decode MMC In Use=0x0
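
The check in step 3 can also be automated. Below is a minimal Python sketch that parses `name=value` lines under a given section and reports which of the keys listed for igfx_user_feature_next.txt are missing or not set to 0. It assumes the INI-like `[section]` / `name=value` layout shown in this post, which is not documented elsewhere in the thread; `parse_section` and `missing_or_enabled` are hypothetical helper names.

```python
# Key names copied from the igfx_user_feature_next.txt listing in this post.
REQUIRED_ZERO_KEYS = (
    "Enable HCP Scalability Decode",
    "Enable VP MMC",
    "Enable Codec MMC",
    "Enable Media RenderEngine MMC",
)


def parse_section(text, section):
    """Collect name=value pairs under `[section]` (assumed INI-like layout)."""
    values = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
        elif current == section.lower() and "=" in line:
            name, _, value = line.partition("=")
            values[name.strip()] = value.strip()
    return values


def is_zero(value):
    """True for '0', '0x0' and similar; False for anything else or unparsable."""
    try:
        return int(value, 0) == 0
    except ValueError:
        return False


def missing_or_enabled(text):
    """Return the required keys that are absent or not set to 0 under [config]."""
    config = parse_section(text, "config")
    return [key for key in REQUIRED_ZERO_KEYS
            if key not in config or not is_zero(config[key])]
```

Passing the contents of /etc/igfx_user_feature_next.txt to `missing_or_enabled` should return an empty list once the workaround is in place. The same `parse_section` helper can read a `[report]` section, for example to look up `Decode MMC In Use`, if the report uses the same `name=value` form.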

 

Let us know if you face any issues.

 

--

Thanks,

Remya Premdas

 

njean
Beginner

Hello Remya,

We have tried your modification to the configuration files and we are no longer able to reproduce the hang. This is really, really, really great!

 

I don't know what those features are or what the impact of disabling them is. Do you have some details or a link that explains how they work?

 

Thank you very much!

RemyaP_Intel
Employee

Hi ,


Glad to know that your issue was resolved. Unfortunately, we do not have documentation on the scalability and MMC features.


If this workaround has helped you, kindly accept it as a solution. This would help others with similar issues.


Also, let me know if I can go ahead and close this thread.


--

Regards,

Remya Premdas


RemyaP_Intel
Employee

Hi,


As confirmed, we are closing this thread. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Remya Premdas

