Media (Intel® Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK

Significantly reduced performance when using 2 threads for two streams instead of 1 thread for 1 stream

sam_p_
Beginner

Processor Type: 4790S
Driver Version: What driver?
Operating System: Ubuntu 12.04
Media SDK System Analyzer: What is this?
Quick Reproducer Code: 
Concise Description of the Issue: I get less than 45 fps per stream when running 2 streams, but over 120 fps on a single stream.
Priority: Medium
Input File:
Tracer log(if required):

When I do the following:

MFXCloneSession(session, &session2);
// Largely duplicated setup of session2
MFXJoinSession(session, session2);

and run both sessions, I get at most 40 frames per second, but if I don't run "session2" I get 190 fps. Obviously 40*2 is 80, not 190. What could cause this non-linear performance decrease?

sam_p_
Beginner

Asked another way: what initialization do I have to do to a session I plan on joining? What does "After joining, the two sessions share thread and resource scheduling for asynchronous operations." do for me?

Nina_K_Intel
Employee

Hi Sam,

Sorry for the late reply. When sessions are joined, they share a common task scheduler and thread pool. As a consequence, frame surfaces can be passed between components residing in different sessions without explicit synchronization (i.e. without SyncOperation, which waits for a task to complete), so you can build an asynchronous pipeline of functions residing in different sessions (asynchronous execution is better for performance). This is especially useful when you want to build complex pipelines such as 1 decode -> 2 vpp + encode.
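
To illustrate why an asynchronous pipeline helps, here is a minimal toy model in C++. It is not Media SDK code: the function names and the stage times are hypothetical, and it only assumes a two-stage pipeline (decode, then encode) with fixed per-frame costs.

```cpp
#include <algorithm>
#include <cassert>

// Toy two-stage model (NOT Media SDK code); stage times are hypothetical inputs.

// Frames per second when the caller waits for each stage to complete before
// submitting the next one, so the stage times add up for every frame.
double fps_with_sync_per_stage(double decode_ms, double encode_ms) {
    assert(decode_ms > 0.0 && encode_ms > 0.0);
    return 1000.0 / (decode_ms + encode_ms);
}

// Frames per second when the two stages overlap: in steady state the slower
// stage alone sets the pace.
double fps_pipelined(double decode_ms, double encode_ms) {
    assert(decode_ms > 0.0 && encode_ms > 0.0);
    return 1000.0 / std::max(decode_ms, encode_ms);
}
```

With a hypothetical 4 ms decode and 6 ms encode, the synchronized version tops out at 100 fps, while the overlapped version reaches roughly 167 fps.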

Additionally, having one thread pool slightly reduces memory usage. With SW processing, joined sessions also avoid thread oversubscription and improve performance. With HW-accelerated processing, however, joining sessions does not have any performance impact.
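
A rough way to picture the oversubscription point for SW processing is the sketch below. It is a toy count, not Media SDK code. The helper names are hypothetical, and it assumes that every independent session owns a worker pool of the same size while joined sessions share one pool.

```cpp
#include <cassert>

// Toy worker-thread count for software processing (NOT Media SDK code).
// Assumption: each independent session owns `threads_per_pool` workers,
// while joined sessions share a single pool of that size.
unsigned worker_threads(unsigned sessions, unsigned threads_per_pool, bool joined) {
    assert(sessions > 0 && threads_per_pool > 0);
    return joined ? threads_per_pool : sessions * threads_per_pool;
}

// Ratio of worker threads to logical cores; a value above 1.0 means the
// machine is oversubscribed.
double oversubscription(unsigned workers, unsigned logical_cores) {
    assert(logical_cores > 0);
    return static_cast<double>(workers) / logical_cores;
}
```

Assuming 8 logical cores and 8 workers per pool, two separate sessions would give 16 workers (a ratio of 2.0), while two joined sessions stay at 8 (a ratio of 1.0).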

So, looking at your result, I agree that it is quite unexpected. Could you please give us more details about what these sessions are doing: which input/output codecs and resolutions, and what the topology is. Is it a simple (one in -> one out transcoding) x 2, or something different? If these sessions use the same frame surface pool, are you allocating enough surfaces to let both sessions operate without waiting for a surface to free up?
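
The surface pool question can be sketched with a small toy model in C++. This is not Media SDK code: `SurfacePool` and `stalled_requests` are hypothetical names, and the model simply assumes that each session tries to keep a fixed number of surfaces in flight from one shared pool. In a real application the pool size would normally come from the surface counts the QueryIOSurf calls suggest for each component, plus the async depth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a shared frame surface pool (NOT Media SDK code).
// A surface is "locked" while some pipeline still has it in flight.
class SurfacePool {
public:
    explicit SurfacePool(std::size_t size) : locked_(size, false) {
        assert(size > 0);
    }

    // Locks the first free surface and returns its index,
    // or returns -1 if every surface is currently busy.
    int acquire() {
        for (std::size_t i = 0; i < locked_.size(); ++i) {
            if (!locked_[i]) {
                locked_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Marks a previously acquired surface as free again.
    void release(int index) {
        locked_.at(static_cast<std::size_t>(index)) = false;
    }

private:
    std::vector<bool> locked_;
};

// Counts how many surface requests fail when `sessions` pipelines each try
// to hold `in_flight` surfaces at once from a single pool of `pool_size`.
int stalled_requests(std::size_t pool_size, int sessions, int in_flight) {
    SurfacePool pool(pool_size);
    int stalls = 0;
    for (int s = 0; s < sessions; ++s) {
        for (int f = 0; f < in_flight; ++f) {
            if (pool.acquire() < 0) {
                ++stalls;
            }
        }
    }
    return stalls;
}
```

In this model, a pool of 6 surfaces is enough for one session holding 4 surfaces, but a second session of the same kind hits 2 failed requests and would have to wait. Growing the pool to 8 removes the stalls.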

Probably the best place to start is to run the sample_multi_transcode app from our samples package and see if it gives the same strange results. It can run one or several transcoding sessions, and it has an option to join the sessions.
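
As a rough sketch of such a test, a parameter file for two joined H.264 transcoding sessions might look like the following. The file names are placeholders, and the option names are given as recalled from the sample's readme, so please check them against the help output of the sample in your package version.

```
-i::h264 input1.h264 -o::h264 output1.h264 -hw -join
-i::h264 input2.h264 -o::h264 output2.h264 -hw -join
```

Assuming the file is saved as two_streams.par, it would be run as `sample_multi_transcode -par two_streams.par`. Removing `-join` from both lines, or keeping only one line, gives the unjoined and single-stream cases to compare against.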

Regards,

Nina
