Media (Intel® oneAPI Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools from Intel. This includes Intel® oneAPI Video Processing Library and Intel® Media SDK.

High CPU Usage with Hardware Decoder

BobbyS
Novice
574 Views

Hello,

I am working on upgrading our application from the Intel Media SDK to the newer oneAPI libraries.  The app consumes incoming H.264 streams and displays them with DirectX on Windows platforms.  We use hardware acceleration for the decoding.

We have the new oneAPI version rendering now with similar performance; however, the CPU usage is significantly higher with the oneAPI libraries.  With the Intel Media SDK, the CPU usage for our application is around 1 - 2%; with the oneAPI libraries, it varies from 24% to 30%.  GPU usage appears to be similar with either library.


Both the calls to MFXVideoDECODE_DecodeFrameAsync and MFXVideoVPP_RunFrameVPPAsync appear to be increasing CPU usage significantly.

The code is rather complex to post in its entirety.

The API library is 1.34, and the DLL being loaded for processing by the oneAPI library is C:\Windows\system32\libmfxhw64.dll

The IOPattern on the mfxVideoParam for decode is set to MFX_IOPATTERN_OUT_VIDEO_MEMORY

For the VPP mfxVideoParam the IOPattern is set to MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY

I have tried to verify all flags are set equivalently, etc.  Any suggestions on anything to check that may be causing the high CPU usage?

Thanks,
Bobby

7 Replies
RahulU_Intel
Moderator
533 Views

Hi,

 

Thanks for posting in Intel communities. We are checking on this from our side. We will get back to you.

 

Thanks and Regards

Rahul

 

 

Dmitry_E_Intel
Employee
523 Views

Hi @BobbyS ,

 

It's pretty weird. I assume you operate on a legacy platform like SKL or KBL, right?  I ask because you said libmfxhw64.dll is loaded. The VPL run-time library (libmfx-gen.dll) is supported starting with Tiger Lake. On legacy platforms, the VPL dispatcher loads the legacy Media SDK run-time library (libmfxhw64.dll). It's a kind of compatibility mode.

So it means that on the same system you have a Media SDK based app and a VPL based app. Both eventually load the same libmfxhw64.dll, but somehow you observe high CPU usage with the VPL based app.

Can you please try to reproduce the same issue using Media SDK sample_decode (https://github.com/Intel-Media-SDK/MediaSDK/tree/master/samples/sample_decode) and VPL sample_decode (https://github.com/oneapi-src/oneVPL/tree/master/tools/legacy/sample_decode)?  Please also share info about the HW platform and driver version.

 

Regards,

Dmitry

BobbyS
Novice
518 Views

Dmitry,

I am running on an Intel i7 10510U CPU.

The two links you posted are dead.

I have looked over a lot of the samples, however I have not seen any that are actually doing hardware acceleration (they all specify software only).

Do you have a link to a hardware-based example using DirectX surfaces?

Thanks,

Bobby

Dmitry_E_Intel
Employee
514 Views

Hi Bobby,

 

Fixed links. These two samples above use D3D9/D3D11 surfaces. Please use them with "-d3d11 -hw" options.

For VPL sample_decode please also add "-api2x_dispatcher".

 

Regards,

Dmitry

BobbyS
Novice
453 Views

Dmitry,

 

Thank you for providing the very thorough example with DirectX rendering; it was very useful to analyze.  We eventually found an issue in our decoding loop: it was looping too fast while waiting for additional data to decode.  We added a 1 millisecond sleep when no data was available, and all of the extra CPU utilization went away.

 

Thanks,
Bobby
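
For readers hitting the same symptom, the fix described above can be sketched roughly as follows. This is only an illustration of the idea (back off briefly when there is nothing to decode instead of spinning), not the application's actual code. `PacketQueue`, `LoopStats`, and `runDecodeLoop` are hypothetical names, and the `decode` callback merely stands in for wherever the real decode call (for example MFXVideoDECODE_DecodeFrameAsync) would be made.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of compressed packets (not part of any Intel API).
class PacketQueue {
public:
    void push(std::vector<std::uint8_t> packet) {
        std::lock_guard<std::mutex> lock(mutex_);
        packets_.push_back(std::move(packet));
    }

    // Returns the oldest packet, or std::nullopt if nothing is queued.
    std::optional<std::vector<std::uint8_t>> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (packets_.empty()) return std::nullopt;
        auto packet = std::move(packets_.front());
        packets_.pop_front();
        return packet;
    }

private:
    std::mutex mutex_;
    std::deque<std::vector<std::uint8_t>> packets_;
};

struct LoopStats {
    std::size_t packetsDecoded = 0;  // iterations that found data
    std::size_t idlePolls = 0;       // iterations that found the queue empty
};

// Runs until keepRunning() returns false. When the queue is empty, the loop
// sleeps for idleSleep instead of immediately polling again; with a zero
// idleSleep it busy-waits, which is the behavior that burns CPU.
template <typename KeepRunning, typename DecodeFn>
LoopStats runDecodeLoop(PacketQueue& queue, KeepRunning keepRunning,
                        DecodeFn decode, std::chrono::milliseconds idleSleep) {
    assert(idleSleep.count() >= 0);
    LoopStats stats;
    while (keepRunning()) {
        if (auto packet = queue.tryPop()) {
            decode(*packet);  // assumption: the real decoder call goes here
            ++stats.packetsDecoded;
        } else {
            ++stats.idlePolls;
            if (idleSleep.count() > 0) {
                std::this_thread::sleep_for(idleSleep);
            }
        }
    }
    return stats;
}
```

Calling `runDecodeLoop(queue, keepRunning, decode, std::chrono::milliseconds(1))` limits idle polling to roughly once per millisecond at most, since the OS may sleep longer than requested. Blocking on a condition variable that the producer signals would avoid polling altogether, at the cost of a little more synchronization code.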

Dmitry_E_Intel
Employee
444 Views

Great! Thanks for letting us know. Just wondering, why did the issue appear during the VPL transition? Was it some VPL-specific behavior of our run-time library, or just a programming mistake made when you changed the application code?

 

Regards,

Dmitry

RahulU_Intel
Moderator
294 Views

Hi,

 

Glad to know that your issue is resolved. This thread will no longer be monitored. If you have any other queries, you can post a new question in the Intel community.

 

Thanks and Regards

Rahul

 
