Synchronous decode with a single frame allocation

Victor_D_ · ‎09-08-2013

I'm integrating Intel Media SDK decoding into an app that currently decodes synchronously: it passes in a compressed frame
and expects the uncompressed frame to be written to a provided output buffer.

I have this working using system memory, but can't seem to get it going with graphics memory.
An easy way to see the issue is to make the following trivial modification to the latest Intel Media SDK samples.

In simple_decode sample, after the line that has this statement

sts = mfxDEC.QueryIOSurf(&mfxVideoParams, &Request);

add these two statements

Request.NumFrameMin = 1;
Request.NumFrameSuggested = 1;

The program still works and is now completely synchronous. It also has pretty good performance.

However, when I make the same modification to simple_decode_d3d, the sample no longer works and returns an error.

Synchronous behavior is desirable in this case because the current app works synchronously. It could migrate to an asynchronous method,
but that would take additional time and effort.
Is it possible to decode H.264 synchronously when using graphics memory? The hardware does work synchronously when using system memory.

Thank you for your help.
-Victor

Victor_D_ · ‎09-08-2013

Also, if I set

mfxVideoParams.AsyncDepth = 1;

in simple_decode.cpp, then after

sts = mfxDEC.QueryIOSurf(&mfxVideoParams, &Request);

Request.NumFrameSuggested is 1

However, if I do that in simple_decode_d3d

Request.NumFrameSuggested is 7

This is a bit puzzling as it seems like the h/w can do synchronous operation in system memory but not in graphics memory.

-Victor

Petter_L_Intel · ‎09-09-2013

Hi Victor,

Regarding surface allocation: you must allocate the number of surfaces indicated by QueryIOSurf. If not, the Media SDK component will not be able to execute the workload. So I would not recommend allocating fewer surfaces than indicated by QueryIOSurf.

The different behavior you see between the system memory and D3D memory use cases is due to the internal behavior of the SDK. In the system memory case, the frame located on the surface is copied to one of the internally allocated D3D surfaces before the HW can decode the frame. The default decoder operation (even for AsyncDepth=1) requires several D3D surfaces.

In the D3D case you are allocating the D3D surfaces yourself, so you see a larger number of surfaces than when using system memory.
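
To make the surface-count point concrete, here is a minimal sketch of sizing the pool from the QueryIOSurf result and picking a surface the decoder is not holding. It is an illustration under stated assumptions, not SDK code: AllocRequest and Surface are hypothetical stand-ins for mfxFrameAllocRequest and mfxFrameSurface1 so the snippet compiles without the SDK headers, and AllocatePool and FindFreeSurface are made-up helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the Media SDK types, so this sketch builds
// without the SDK headers. Real code would use mfxFrameAllocRequest and
// mfxFrameSurface1 (whose lock counter lives in Data.Locked).
struct AllocRequest {
    unsigned short NumFrameMin;
    unsigned short NumFrameSuggested;
};

struct Surface {
    unsigned short Locked = 0;  // nonzero while the decoder still uses the surface
};

// Size the pool from the value QueryIOSurf reported instead of
// overriding it with a smaller hard-coded number.
std::vector<Surface> AllocatePool(const AllocRequest& request) {
    assert(request.NumFrameSuggested >= request.NumFrameMin);
    return std::vector<Surface>(request.NumFrameSuggested);
}

// Return the index of the first surface the decoder is not holding,
// or -1 if every surface in the pool is still in use.
int FindFreeSurface(const std::vector<Surface>& pool) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].Locked == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a pool of one surface, FindFreeSurface returns -1 as soon as the decoder keeps that surface locked, which matches the failure seen after forcing NumFrameSuggested to 1 in the D3D sample. With the suggested count, a free surface remains available for the next frame.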

Please also check out the Media SDK tutorial samples, which include two decode samples comparing system memory vs. D3D decode usage. http://software.intel.com/en-us/articles/intel-media-sdk-tutorial

Regards,
Petter

Victor_D_ · ‎09-09-2013

Hi Petter,

Thank you for your detailed response. I really appreciate the Media SDK team choosing ultimate performance in the D3D case, because that is what matters most. It is nice that the system memory configuration gives the user a choice of either synchronous or asynchronous operation, as that allows for the most compatibility, trading peak performance for flexibility.

Thanks,

-Victor