constantin_g_
Beginner
39 Views

Bad performance with Intel Media SDK sample_decode

Hello, I'm interested in low-latency decoding, so I downloaded the samples for Visual Studio to test Intel hardware decoding acceleration. As long as I don't specify any output, sample_decode runs fine at ~15 ms decoding latency (with -low_latency, -async 1 and -calc_latency on the command line, for a 720p raw H.264 stream). But when I either render the stream on screen or specify a D3D or D3D11 surface as output, performance drops to at most 50 fps with high lag, at ~14% CPU usage. Is the example already optimized for rendering output frames, or does it perhaps use system memory? What's the best way to get the decoded frames on screen with low lag? Thanks!
2 Replies
Jeffrey_M_Intel1
Employee

The sample_decode example has a decode-only mode which can help estimate the performance of decode alone.  However, as you've noticed, render and decode to file are there to show that decode works but are not optimized.

You can see a faster decode->render example with ocl_media_sdk_interop (in the samples package) or in the multi-media samples https://software.intel.com/sites/landingpage/mmsf/documentation/windows_main.html

A simpler place to start for low-latency decode is the tutorials (simple_6_transcode_opaque_lowlatency). However, in the past rendering has been considered out of scope for Media SDK. After decode, the surface can be rendered using the API it was allocated with (DirectX 9/11 on Windows, VAAPI on Linux).

As far as I know there is no simple fast/low latency decode-render example at the Media SDK level.  However, there may be other options.  Let me see what I can find for you. 

constantin_g_
Beginner

Hello, thank you for your fast response. Unfortunately the mediasdk_tutorials don't compile out of the box in Visual Studio 2015 because of C++ compiler updates, so it took me some time to find the cause of the latency increase inside the sample code. With one of the options mentioned before (-d3d, -r), the number of created surfaces is 6 (even though async=1), and the more frames the decoder buffers, the more lag it creates. Does D3D or D3D11 need that many surfaces by default, or can I restrict the number somehow? On Android I experienced very good performance with OpenGL, MediaCodec and GL_OES_EGL_image_external. Is there any similar mechanism for sharing texture data between the Intel decoder and OpenGL?
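
To put a rough number on the buffering concern, here is a minimal back-of-the-envelope sketch. It assumes a simple worst-case model in which every frame held in the pipeline adds one frame interval of delay before display. The helper names frame_interval_ms and queued_delay_ms are hypothetical and not part of Media SDK, and the model is not a measurement of what sample_decode actually does with its 6 surfaces.

```cpp
#include <cassert>

// Hypothetical helper (not a Media SDK call): time between two
// consecutive frames, in milliseconds, at a given frame rate.
double frame_interval_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Assumed worst-case model: each frame held in the pipeline before
// display adds one full frame interval of delay.
double queued_delay_ms(int frames_held, double fps) {
    assert(frames_held >= 0);
    return frames_held * frame_interval_ms(fps);
}
```

Under this model, queued_delay_ms(6, 50.0) gives 120 ms, while a single held frame at the same rate gives 20 ms. Whether all 6 surfaces are really occupied at once is exactly the open question, so this is an upper bound for the assumed model rather than an observed figure.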