question on simple_3_encode_vmem_async

MyMother · ‎12-06-2015

hi Intel-giant,

OS: Ubuntu 12.04

mediasdk-tutorials-0.0.3

MediaServerStudioEssentials2015R6

Platform: i5-4570S

I have a few questions after reading simple_encode_vmem_async.cpp.

Q1. Is this parameter (AsyncDepth) platform-dependent, for example on whether the CPU has 2 or 4 cores? Or is 4 meant to be a good value on all platforms? And how did you verify that it results in good performance?

// - AsyncDepth represents the number of tasks that can be submitted, before synchronizing is required
// - The choice of AsyncDepth = 4 is quite arbitrary but has proven to result in good performance
mfxEncParams.AsyncDepth = 4;

Q2. In the following code, why is the suggested number of frames changed?

// Query number of required surfaces for encoder
mfxFrameAllocRequest EncRequest;
...
sts = mfxENC.QueryIOSurf(&mfxEncParams, &EncRequest);
...

EncRequest.NumFrameSuggested = EncRequest.NumFrameSuggested + mfxEncParams.AsyncDepth; // ==> why is AsyncDepth added here?

Thanks in advance... QQa

Bjoern_B_Intel · ‎12-07-2015

Hi,

Among the Media SDK samples, sample_multi_transcode can be used for a single transcoding pipeline or for multiple transcoding pipelines. Each Media SDK transcoding pipeline contains at least one decoding session and one encoding session. More complicated pipelines can also contain several video pre- and post-processing (VPP) sessions between decoding and encoding. Media SDK exposes a few parameters that control pipeline performance, such as AsyncDepth and the join parameters, and these can be used to optimize the pipelines.

Asynchronous operation allows Media SDK to submit multiple tasks without synchronizing after each one, so more tasks can be processed in parallel before the application has to sync to free resources. AsyncDepth is the parameter that controls how many tasks can be outstanding before a sync is required. With a larger AsyncDepth, Media SDK allocates more surfaces to support asynchronous processing of multiple tasks. A larger AsyncDepth allows more tasks to be processed in parallel, though it can also reduce the resources that are available for further processing.
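The submit-then-sync pattern described above can be sketched as a small simulation. This is not Media SDK code: `simulateAsyncSubmit` and `AsyncRunStats` are hypothetical names, and a "task" is just a frame index. It only shows how a pool of AsyncDepth slots fills up and when a sync becomes unavoidable.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Outcome of one simulated run; all names here are hypothetical.
struct AsyncRunStats {
    std::size_t maxInFlight = 0;  // peak number of unsynchronized tasks
    std::size_t forcedSyncs = 0;  // syncs needed to free a slot mid-stream
    std::size_t drainSyncs = 0;   // syncs issued after the last submit
};

// Simulates submitting `frames` tasks with a pool of `asyncDepth` slots.
// No real encoding happens; a task is represented by its frame index.
inline AsyncRunStats simulateAsyncSubmit(std::size_t frames, std::size_t asyncDepth) {
    assert(asyncDepth > 0);
    AsyncRunStats stats;
    std::deque<std::size_t> inFlight;  // oldest task sits at the front

    for (std::size_t frame = 0; frame < frames; ++frame) {
        if (inFlight.size() == asyncDepth) {
            // Pool is full: wait for the oldest task before submitting more.
            inFlight.pop_front();
            ++stats.forcedSyncs;
        }
        inFlight.push_back(frame);  // stands in for an asynchronous submit
        if (inFlight.size() > stats.maxInFlight) {
            stats.maxInFlight = inFlight.size();
        }
    }
    while (!inFlight.empty()) {  // drain whatever is still outstanding
        inFlight.pop_front();
        ++stats.drainSyncs;
    }
    return stats;
}
```

With 10 frames and a depth of 4, the first 4 submits go through without waiting, each of the next 6 forces one sync, and 4 tasks remain to be drained at the end.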

A1: The best AsyncDepth value depends on the platform and the workload, but a value of four has turned out to be sufficient for good resource utilization (useful values typically range from 2 to 5). You may want to test in your own environment to fine-tune this number.

A2: You need to add AsyncDepth to the suggested number of frames so that enough surfaces, and the hardware resources behind them, are allocated for all the tasks running in the pipeline. This lets the encoder run at its best performance.
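As a rough illustration of that arithmetic, here is a minimal sketch. `AllocRequestSketch` is a hypothetical stand-in for the two frame-count fields of mfxFrameAllocRequest, and `surfacesToAllocate` is a made-up helper; it simply mirrors the tutorial line quoted in the question and is not taken from the SDK itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the frame-count fields of mfxFrameAllocRequest.
struct AllocRequestSketch {
    std::uint16_t NumFrameMin;
    std::uint16_t NumFrameSuggested;
};

// Pool size = the encoder's own suggestion plus one surface per task
// that may be in flight before a sync (that is, AsyncDepth).
inline std::uint16_t surfacesToAllocate(const AllocRequestSketch& request,
                                        std::uint16_t asyncDepth) {
    const unsigned total =
        static_cast<unsigned>(request.NumFrameSuggested) + asyncDepth;
    assert(total <= 0xFFFFu);  // result must still fit in 16 bits
    return static_cast<std::uint16_t>(total);
}
```

For example, if the query suggested 3 frames and AsyncDepth is 4, the pool would hold 7 surfaces.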

Best,

Bjoern Bruecher

MyMother · ‎12-07-2015

hi Bjoern,

thanks for your reply. I still have questions.

A2: You need to add AsyncDepth here to the number of suggested frames to allocate enough surfaces running in the pipeline and the amount of hardware resources needed.

Q1. How do I know when I have too few surfaces?

Q2. How do I know that I have enough surfaces without allocating too many and wasting them?

Bjoern_B_Intel · ‎12-08-2015

Hi,

A1: Every additional level of depth requires one additional surface. If you follow the tutorial sample simple_3_encode_vmem_async, or any other async sample, you will not run short of surfaces. Use the algorithms from the tutorial here.
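One way an application could notice that it has run out of surfaces is to scan its pool for one the encoder is no longer holding. The sketch below assumes a lock counter per surface, modelled on the Data.Locked field of mfxFrameSurface1; `SurfaceSketch`, `findFreeSurface` and `kNoFreeSurface` are hypothetical names, not SDK or tutorial identifiers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a frame surface: `locked` is nonzero
// while the encoder is still using the surface.
struct SurfaceSketch {
    int locked = 0;
};

constexpr int kNoFreeSurface = -1;

// Returns the index of the first unlocked surface, or kNoFreeSurface
// if every surface in the pool is currently in use.
inline int findFreeSurface(const std::vector<SurfaceSketch>& pool) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].locked == 0) {
            return static_cast<int>(i);
        }
    }
    return kNoFreeSurface;
}
```

If a search like this comes back empty while new input is waiting to be submitted, the pool is too small for the chosen AsyncDepth.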

A2: The async feature is for performance optimization. The driver handles the hardware resource allocation for you. At some point (AsyncDepth > 5) you will not see any additional performance gain. That plateau indicates that AsyncDepth is just right. If you go beyond it, performance may even get worse.
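To turn that into a procedure, you could encode the same clip at several AsyncDepth values, record the throughput of each run yourself, and pick the smallest depth that is close to the best. The helper below is a hypothetical sketch of that selection step only; `DepthSample`, `smallestDepthNearBest` and the `tolerance` parameter are assumptions, and it expects your own measurements as input.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One measurement: (AsyncDepth tried, frames per second you measured).
using DepthSample = std::pair<int, double>;

// Returns the smallest depth whose throughput is within `tolerance`
// (a fraction, e.g. 0.02 for 2%) of the best throughput measured.
inline int smallestDepthNearBest(const std::vector<DepthSample>& samples,
                                 double tolerance) {
    assert(!samples.empty());
    assert(tolerance >= 0.0 && tolerance < 1.0);

    double best = samples.front().second;
    for (const auto& s : samples) {
        if (s.second > best) best = s.second;
    }

    const double threshold = best * (1.0 - tolerance);
    bool found = false;
    int chosen = 0;
    for (const auto& s : samples) {
        if (s.second >= threshold && (!found || s.first < chosen)) {
            chosen = s.first;
            found = true;
        }
    }
    return chosen;  // always set: the best sample itself meets the threshold
}
```

Choosing the smallest depth near the plateau, rather than the absolute best, keeps the surface pool from growing for a gain that is within measurement noise.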

Best Regards,

Bjoern Bruecher