Media (Intel® Video Processing Library, Intel Media SDK)

High CPU Usage with Hardware Decoder

BobbyS
Novice
2,192 Views

Hello,

I am working on upgrading our application from the Intel Media SDK to the newer oneAPI libraries. The app consumes incoming H.264 streams and displays them with DirectX on Windows platforms, using hardware acceleration for the decoding.

The new oneAPI version now renders with similar performance, but its CPU usage is significantly higher. With the Intel Media SDK, CPU usage for our application is around 1-2%; with the oneAPI libraries it varies from 24% to 30%. GPU usage appears to be similar with either library.


Both the calls to MFXVideoDECODE_DecodeFrameAsync and MFXVideoVPP_RunFrameVPPAsync appear to be increasing CPU usage significantly.

The code is rather complex to post in its entirety.

The API library is 1.34, and the DLL that the oneAPI library loads for processing is C:\Windows\system32\libmfxhw64.dll.

For decode, the IOPattern on the mfxVideoParam is set to MFX_IOPATTERN_OUT_VIDEO_MEMORY.

For VPP, the IOPattern on the mfxVideoParam is set to MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY.

I have tried to verify that all flags are set equivalently. Is there anything else I should check that could be causing the high CPU usage?

Thanks,
Bobby

7 Replies
RahulU_Intel
Moderator
2,151 Views

Hi,

Thanks for posting in the Intel communities. We are checking on this from our side and will get back to you.

Thanks and regards,
Rahul

Dmitry_E_Intel
Employee
2,141 Views

Hi @BobbyS,

That is odd. Since you say libmfxhw64.dll is loaded, I assume you are on a legacy platform such as SKL or KBL, right? The VPL runtime library (libmfx-gen.dll) is supported only on Tiger Lake and later. On legacy platforms, the VPL dispatcher loads the legacy Media SDK runtime library (libmfxhw64.dll) as a kind of compatibility mode.

So on the same system, your Media SDK based app and your VPL based app both end up loading the same libmfxhw64.dll, yet somehow only the VPL based app shows high CPU usage.

Can you please try to reproduce the issue using the Media SDK sample_decode (https://github.com/Intel-Media-SDK/MediaSDK/tree/master/samples/sample_decode) and the VPL sample_decode (https://github.com/oneapi-src/oneVPL/tree/master/tools/legacy/sample_decode)? Please also share your hardware platform and driver version.

Regards,
Dmitry

BobbyS
Novice
2,136 Views

Dmitry,

I am running on an Intel Core i7-10510U CPU.

The two links you posted are dead.

I have looked over a lot of the samples, but I have not seen any that actually use hardware acceleration (they all specify software only).

Do you have a link to a hardware-based example using DirectX surfaces?

Thanks,

Bobby

Dmitry_E_Intel
Employee
2,132 Views

Hi Bobby,

I fixed the links. Both samples above use D3D9/D3D11 surfaces. Please run them with the "-d3d11 -hw" options.

For the VPL sample_decode, please also add "-api2x_dispatcher".

Regards,
Dmitry

BobbyS
Novice
2,071 Views

Dmitry,

Thank you for providing the very thorough example with DirectX rendering; it was very useful to analyze. We eventually found a bug in our decoding loop: it was spinning too fast while waiting for additional data to decode. We added a 1 millisecond sleep when no data was available, and all of the extra CPU utilization went away.
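
For anyone who hits the same symptom, here is a minimal sketch of that kind of fix. It is not the actual application code: `PacketQueue`, `LoopStats`, `run_decode_loop`, and the `decode` callback are hypothetical names made up for illustration, and the callback only marks the spot where a real loop would hand the data to the decoder (for example via MFXVideoDECODE_DecodeFrameAsync). The point is simply that the loop sleeps for 1 ms when no input is available instead of polling again immediately.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the application's queue of incoming H.264 data.
class PacketQueue {
public:
    void push(std::vector<std::uint8_t> packet) {
        std::lock_guard<std::mutex> lock(mutex_);
        packets_.push_back(std::move(packet));
    }

    // Returns the next packet, or std::nullopt when nothing is available yet.
    std::optional<std::vector<std::uint8_t>> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (packets_.empty()) return std::nullopt;
        std::vector<std::uint8_t> packet = std::move(packets_.front());
        packets_.pop_front();
        return packet;
    }

private:
    std::mutex mutex_;
    std::deque<std::vector<std::uint8_t>> packets_;
};

struct LoopStats {
    std::size_t decoded = 0;      // packets handed to the decode callback
    std::size_t idle_sleeps = 0;  // times the loop backed off with no data
};

// Runs until `stop` is set and the queue has been drained.
// `decode` is a placeholder for the real decoder call.
template <typename DecodeFn>
LoopStats run_decode_loop(PacketQueue& queue, const std::atomic<bool>& stop,
                          DecodeFn decode) {
    LoopStats stats;
    for (;;) {
        // Read the flag before polling so a packet pushed just before
        // `stop` was set is still picked up by the try_pop() below.
        const bool finished = stop.load();
        if (auto packet = queue.try_pop()) {
            decode(*packet);
            ++stats.decoded;
            continue;
        }
        if (finished) break;
        // No data available: give up the CPU instead of spinning.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++stats.idle_sleeps;
    }
    return stats;
}
```

Without the `sleep_for` call, the loop would retry `try_pop()` as fast as one core allows, which is the kind of busy-wait that shows up as a constant double-digit CPU load. A condition variable that wakes the loop when data arrives would avoid polling altogether; the 1 ms sleep is shown because it is the fix described here. On Windows the actual sleep may last longer than 1 ms depending on the system timer resolution.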

 

Thanks,
Bobby

Dmitry_E_Intel
Employee
2,062 Views

Great! Thanks for letting us know. Just wondering, why did the issue appear during the VPL transition? Was it some VPL-specific behavior in our runtime library, or just a programming mistake introduced when you changed the application code?

Regards,
Dmitry

RahulU_Intel
Moderator
1,912 Views

Hi,

Glad to know that your issue is resolved. This thread will no longer be monitored. If you have any other questions, please post a new question in the Intel community.

Thanks and regards,
Rahul