Media (Intel® Video Processing Library, Intel Media SDK)

Encode and SyncOperation sequence question

richard_s_2
Beginner

I am trying to increase my encode performance and want to understand the optimal sequence of Encode() and SyncOperation() calls. The SDK manual notes:

For performance considerations, the application must submit multiple operations and delay synchronization as much as possible, which gives the SDK flexibility to organize internal pipelining. For example, the operation sequence ENCODE(f1) → ENCODE(f2) → SYNC(f1) → SYNC(f2) is recommended, compared with ENCODE(f1) → SYNC(f1) → ENCODE(f2) → SYNC(f2).
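The ordering the manual describes can be sketched as a small model that keeps a bounded number of operations in flight and only waits on the oldest one when the pool is full. This is an illustrative sketch, not Media SDK code: `pipelined_order`, its parameters, and the string labels are hypothetical, and the real calls (Encode and SyncOperation) are represented only by the trace entries.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model, not the Media SDK API: returns the order in which one
// stream would issue ENCODE and SYNC calls when it keeps at most `depth`
// operations in flight.
std::vector<std::string> pipelined_order(int num_frames, std::size_t depth) {
    assert(depth > 0);
    std::vector<std::string> trace;
    std::deque<int> in_flight;  // frames submitted but not yet synchronized

    for (int f = 1; f <= num_frames; ++f) {
        // Pool is full: synchronize the oldest operation to free a slot.
        if (in_flight.size() == depth) {
            trace.push_back("SYNC(f" + std::to_string(in_flight.front()) + ")");
            in_flight.pop_front();
        }
        // Submit the next frame without waiting for it to finish.
        trace.push_back("ENCODE(f" + std::to_string(f) + ")");
        in_flight.push_back(f);
    }

    // End of stream: drain whatever is still in flight, oldest first.
    while (!in_flight.empty()) {
        trace.push_back("SYNC(f" + std::to_string(in_flight.front()) + ")");
        in_flight.pop_front();
    }
    return trace;
}
```

With two frames, a depth of 2 reproduces the recommended sequence and a depth of 1 reproduces the serialized one, so in this model the depth is the knob that controls how far synchronization is delayed.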

Suppose I have multiple streams A and B to encode simultaneously and I submit them as follows:

  • Encode(A1), Encode(B1), Encode(A2), Encode(B2)

My possible SyncOperation sequences are as follows:

    #1.  Sync(A1), Sync(A2), Sync(B1), Sync(B2)
    #2.  Sync(A1), Sync(B1), Sync(A2), Sync(B2)

Will the performance be the same, or is #1 or #2 better? Does it matter whether the sessions are joined (they are not currently joined via JoinSession)?

4 Replies
Petter_L_Intel
Employee

Hi Richard,

For optimal performance you must design your pipelines so that several codec operations are "in-flight" at the same time. As you state, this can be done by invoking several Encode calls before calling Sync.

Such task-oriented usage is illustrated in both the Media SDK "sample_encode" sample and the Media SDK tutorial sample "simple_3_encode_d3d_async".

Regarding the handling of several concurrent streams: to limit implementation complexity, our recommendation is to host each stream pipeline in a separate thread. An example of this usage can be found in the Media SDK tutorial sample "simple_6_transcode_opaque - async - vppresize – multi".
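A minimal sketch of the one-thread-per-stream layout follows, assuming each stream owns its pipeline and shares no state with the others. `run_stream` and `encode_streams` are hypothetical names, and the trace strings stand in for the real Encode and SyncOperation calls; this is not taken from the tutorial sample.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one stream's pipeline. Real code would own its own
// session and encoder here. It writes only to its own trace vector, so no
// locking is needed between streams.
void run_stream(const std::string& name, int num_frames,
                std::vector<std::string>& trace) {
    // Submit every frame first, then synchronize, as in the manual's example.
    for (int f = 1; f <= num_frames; ++f)
        trace.push_back("ENCODE(" + name + std::to_string(f) + ")");
    for (int f = 1; f <= num_frames; ++f)
        trace.push_back("SYNC(" + name + std::to_string(f) + ")");
}

// Hosts each stream in its own thread and returns one trace per stream.
std::vector<std::vector<std::string>> encode_streams(
        const std::vector<std::string>& names, int num_frames) {
    // Sized up front so the references handed to the threads stay valid.
    std::vector<std::vector<std::string>> traces(names.size());
    std::vector<std::thread> workers;
    workers.reserve(names.size());

    for (std::size_t i = 0; i < names.size(); ++i)
        workers.emplace_back(run_stream, std::cref(names[i]), num_frames,
                             std::ref(traces[i]));
    for (auto& w : workers)
        w.join();
    return traces;
}
```

Because each thread drives only its own stream, the order of Sync calls across streams never has to be decided by the application; it falls out of how the threads are scheduled.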

The primary use of "JoinSession" is with the SW codec, to avoid CPU thread oversubscription.

Regards,
Petter 

richard_s_2
Beginner

Thanks for the response.  Suppose I have the option of putting 2 operations in flight from either a single stream (2 operations from 1 stream), or from 2 streams (1 operation from each of 2 streams). Will there be a difference in performance?

A related question: is there any performance benefit to batching Sync calls across concurrent streams?

Petter_L_Intel
Employee

Hi Richard,

If I understand your description correctly, there should be no performance difference between the two modes you describe.

Regarding the question about "batching" Sync calls when processing many concurrent streams: as the number of concurrent streams increases, batching Sync calls becomes less important for achieving high performance and utilizing the GPU optimally. The individual pipelines will use different parts of the GPU at different times, and the greater load also keeps the processor consistently in turbo mode.

Regards,
Petter 

richard_s_2
Beginner

Thanks, the clarification helps!
