Hello,
I am working on upgrading our application from the Intel Media SDK to the newer oneAPI libraries. The app consumes incoming H.264 streams and displays them with DirectX on Windows platforms. We use hardware acceleration for the decoding.
We now have the oneAPI version rendering with similar performance, but its CPU usage is significantly higher. With the Intel Media SDK the CPU usage for our application is around 1 - 2%; with the oneAPI libraries it varies from 24% - 30%. GPU usage appears to be similar with either library.
Both the calls to MFXVideoDECODE_DecodeFrameAsync and MFXVideoVPP_RunFrameVPPAsync appear to be increasing CPU usage significantly.
The code is rather complex to post in its entirety.
The API library is 1.34, and the DLL being loaded for processing by the oneAPI library is C:\Windows\system32\libmfxhw64.dll
The IOPattern on the mfxVideoParam for decode is set to MFX_IOPATTERN_OUT_VIDEO_MEMORY
For the VPP mfxVideoParam the IOPattern is set to MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY
I have tried to verify all flags are set equivalently, etc. Any suggestions on anything to check that may be causing the high CPU usage?
Thanks,
Bobby
Hi,
Thanks for posting in Intel communities. We are checking on this from our side. We will get back to you.
Thanks and Regards
Rahul
Hi @BobbyS ,
That is pretty odd. I assume you are on a legacy platform such as SKL or KBL, since you said libmfxhw64.dll is loaded. The VPL runtime library (libmfx-gen.dll) is supported from Tiger Lake onward. On legacy platforms the VPL dispatcher loads the legacy Media SDK runtime library (libmfxhw64.dll) instead, as a kind of compatibility mode.
So on the same system, your Media SDK based app and your VPL based app both end up loading the same libmfxhw64.dll, yet somehow you observe high CPU usage only with the VPL based app.
Can you please try to reproduce the issue using the Media SDK sample_decode (https://github.com/Intel-Media-SDK/MediaSDK/tree/master/samples/sample_decode) and the VPL sample_decode (https://github.com/oneapi-src/oneVPL/tree/master/tools/legacy/sample_decode)? Please also share your hardware platform and driver version.
Regards,
Dmitry
Dmitry,
I am running on an Intel i7 10510U CPU.
The two links you posted are dead.
I have looked over a lot of the samples, but I have not seen any that actually use hardware acceleration (they all specify software only).
Do you have a link to a hardware-based example that uses DirectX surfaces?
Thanks,
Bobby
Hi Bobby,
Fixed links. These two samples above use D3D9/D3D11 surfaces. Please use them with "-d3d11 -hw" options.
For VPL sample_decode please also add "-api2x_dispatcher".
Regards,
Dmitry
Dmitry,
Thank you for providing the very thorough example with DirectX rendering; it was very useful to analyze. We eventually found an issue in our decoding loop: it was spinning too fast while waiting for additional data to decode. We added a 1 millisecond sleep when no data was available, and all of the extra CPU utilization went away.
Thanks,
Bobby
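For readers who hit the same symptom, the fix described above can be sketched roughly as follows. This is not the code from the application in this thread, which was never posted. `PacketQueue`, `LoopStats`, `runDecodeLoop`, and the `decodeOne` callback are all hypothetical names used for illustration; in a real application `decodeOne` would be where the calls to MFXVideoDECODE_DecodeFrameAsync and MFXVideoVPP_RunFrameVPPAsync live.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of compressed packets (not part of oneVPL).
class PacketQueue {
public:
    void push(std::vector<std::uint8_t> packet) {
        assert(!packet.empty());  // assumption: empty packets are never queued
        std::lock_guard<std::mutex> lock(mutex_);
        packets_.push_back(std::move(packet));
    }

    // Returns the next packet, or std::nullopt if nothing is queued.
    std::optional<std::vector<std::uint8_t>> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (packets_.empty()) {
            return std::nullopt;
        }
        auto packet = std::move(packets_.front());
        packets_.pop_front();
        return packet;
    }

private:
    std::mutex mutex_;
    std::deque<std::vector<std::uint8_t>> packets_;
};

struct LoopStats {
    std::size_t decoded = 0;     // packets handed to decodeOne
    std::size_t idleSleeps = 0;  // times the loop slept because no data was queued
};

// Polls the queue until `stop` is set. Without the sleep, an empty queue makes
// this loop spin and occupy a CPU core while doing no useful work.
template <typename DecodeFn>
LoopStats runDecodeLoop(PacketQueue& queue, const std::atomic<bool>& stop,
                        DecodeFn decodeOne) {
    LoopStats stats;
    while (!stop.load()) {
        auto packet = queue.tryPop();
        if (!packet) {
            // No input yet: yield the CPU for 1 ms instead of busy-waiting.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++stats.idleSleeps;
            continue;
        }
        decodeOne(*packet);  // placeholder for the real decode/VPP calls
        ++stats.decoded;
    }
    return stats;
}
```

A 1 ms sleep is the simplest change and matches what resolved this thread. Blocking on a condition variable that the producer signals is another common option that avoids polling altogether, at the cost of a little more synchronization code.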
Great! Thanks for letting us know. Just wondering, why did the issue appear during the VPL transition? Was it some VPL-specific behavior in our runtime library, or just a programming mistake introduced when you changed the application code?
Regards,
Dmitry
Hi,
Glad to know that your issue is resolved. This thread will no longer be monitored. If you have any other queries, you can post a new question in the Intel community.
Thanks and Regards
Rahul
