Diederick_H_
Beginner

Correct way of managing calls to MFXVideoENCODE_EncodeFrameAsync()

Hi

When I call MFXVideoENCODE_EncodeFrameAsync(), what is the recommended way of managing the sync point and bitstream variables that are passed into this function? I'm asking because I've seen different ways people implement this. One approach is to reuse a single bitstream and a single sync point for every call to MFXVideoENCODE_EncodeFrameAsync(). The other is to manage a pool of bitstreams and sync points.

I've heard that the second solution is the preferred way, though it seems like a design flaw to create multiple bitstream objects when one bitstream should be enough to handle the output of the encoder.

Can anyone share some advice?

Thanks

4 Replies
Alexey_F_Intel
Employee

Hi Diederick, if you want to achieve the highest encoding speed and fully utilize the hardware, you should use the asynchronous approach. For this you need multiple sync points and bitstream buffers. When you use a single bitstream buffer, you will wait for a while on the sync point after each EncodeFrameAsync() call. The Media SDK software layer (the EncodeFrameAsync() function) submits the encoding task to the hardware layer and returns immediately; only completion of the sync point guarantees that the encoding task is finished.

With asynchronous pipelining you can submit multiple tasks without waiting for their results, collecting sync points as you go. After waiting for completion of the most recently collected sync point, you can safely finish your encoding. You can actually discard the intermediate sync points and keep only the latest one, but you need a unique bitstream buffer for each in-flight task.

Regards, Alexey

AaronL
Beginner

It really depends on the requirements of your application.  If you are dealing with live video, for example, there is little point in having multiple sync points and tasks, because encoding and the call to SyncOperation() will complete well before the next video frame arrives (unless you are dealing with a very fast frame rate).  If you are dealing with video that has already been generated, then you may see slightly improved performance from having multiple sync points, but at the cost of greater complexity in the code, and that complexity may not be worth it.

I think the Media SDK tutorials, as opposed to the samples, are a very good resource for determining how to do this.  The tutorials can be found at http://software.intel.com/sites/default/files/mediasdk-tutorials-0.0.3.zip (oddly, only an older 0.0.2 version of the tutorials is currently available at https://software.intel.com/en-us/articles/media-sdk-tutorials-for-client-and-server ).  I found the simple_2_decode tutorial (which can easily be adapted to encoding) to be the most useful for my purposes.  It uses system memory, and while all the Intel-published documentation recommends using video memory for hardware encoding with the Media SDK, in practice, if you don't use any aspects of VPP (and possibly even if you do), I've found it is generally faster to use system memory instead of video memory, not to mention that it results in simpler, easier-to-understand code.

If you want to use an asynchronous pipeline, then the simple_3_encode_vmem_async tutorial is a good starting point.  This tutorial uses video memory and multiple sync points.  I personally think the term "asynchronous" is a bit misleading for this sort of pipeline, since there is nothing really asynchronous about the operation: everything is done on one thread (at least from the perspective of the person using the Media SDK; plenty happens behind the scenes on Media SDK-created threads).  The only aspect that can sort of be called "asynchronous" is that you don't call SyncOperation() until absolutely necessary, but nothing prevents you from calling it earlier.  So it's really more of a delayed-processing model from my perspective, although it may result in improved performance, since in theory it keeps the Media SDK pipeline saturated with data.

Diederick_H_
Beginner

Hi Alexey and AaronL,

Thanks for your great answers! I can confirm that using system memory is indeed very fast. With a hardware encoder, I couldn't get the encoder working with either of the following setups:

  1. One sync point, multiple bitstreams
  2. One sync point, one bitstream

The only way I could get the encoder working was with multiple sync points and multiple bitstreams. In both setups, 1 and 2, I always received MFX_ERR_ABORTED from SyncOperation(), for every sync point.

Thanks, 
D

AaronL
Beginner

Without seeing your code, it's hard to say for certain, but it sounds to me like you are probably doing something wrong.  There might be helpful information at https://software.intel.com/en-us/forums/intel-media-sdk/topic/475618 , but if you are really reusing the same sync point (as in the same mfxSyncPoint value), that won't work.  I wasn't suggesting that you could use a single mfxSyncPoint handle repeatedly.  Each call to EncodeFrameAsync() needs to look like the following:

mfxSyncPoint syncp = 0;
mfxStatus status = pEncode->EncodeFrameAsync(0, pFrameSurface, &mfxBS, &syncp);
//
// handle different error codes here
//

// Ignore warnings if output is available.
if ((status > MFX_ERR_NONE) && syncp)
    status = MFX_ERR_NONE;
if (status == MFX_ERR_NONE)
{
    status = videoSession.SyncOperation(syncp, 60000);
    if (status == MFX_ERR_NONE)
    {
        //
        // do something with encoded frame here
        //
    }
}

Depending on your usage model, it may make sense to use a different bitstream (system memory allocation) in mfxBS, reuse the same memory repeatedly, or use a combination of the two techniques (that is, have a set of bitstream memory allocations that you can make use of and that are reused).
