Media (Intel® Video Processing Library, Intel Media SDK)
Access community support for transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK
Announcements
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

Best approach to use the Media Server Studio SDK in my application?

Robby_S
New Contributor I
403 Views

Hi, I have an application whose performance I am trying to optimize by using the Media Server Studio SDK. I have some big-picture questions about how to approach this project.

First, a short explanation of my software: it is a video surveillance program that monitors multiple channels. Each channel runs in its own thread and takes an H.264 bitstream as input, so an H.264 decoder is needed for each channel.

The software then performs certain surveillance operations on the decoded raw video frames. If some activity is detected, the raw frames containing that activity are encoded and saved to a file, with a separate file generated for each detected activity. Thus each channel may launch the H.264 encoder, possibly multiple times, but never more than once at a time.

For each channel I have large buffers in front of both the decoder and the encoder, so as to minimize the number of dropped frames. We are talking about many channels here, up to 50. Currently my software uses pthreads for multithreading and the x264 library for the H.264 video codec. For now, let's ignore the audio part.

I spent some time studying the documentation and the sample code provided with the Media Server Studio SDK. It appears to be quite complicated. Besides the video codec APIs, the Media Server Studio SDK also provides other components such as pipelines, surfaces, MSDK threads, and bitstream file I/O handlers. All of these components must fit into a complicated framework to do decoding, encoding, or transcoding, as shown in the sample code.

Granted, not all of these components are part of the Media Server Studio SDK itself; some are provided by the sample code. But I can't easily tell which are essential and which are nice-to-have add-ons.

So here are my questions:

As described above, my software already has its own framework to handle multi-threading, buffers, file I/O, and so on. Should I adopt the framework shown in one of the Media Server Studio SDK samples, such as sample_multi_transcode, and fit my software into that new framework? Or should I cherry-pick various components from the Media Server Studio SDK and fit them into my software's current framework?

Although the first approach should yield better performance, either approach seems to require significant development effort. The challenge for the second approach is to decide which MSDK components are essential, and which are not.

Are there any other approaches that require less development effort?

Suggestions and opinions are welcome and greatly appreciated.

Thanks,

Robby

0 Kudos
3 Replies
Bjoern_B_Intel
Employee
404 Views

Hi Robby,

I think your main goal here is to take advantage of the media hardware acceleration provided by Intel processors. Those processors combine a CPU (multiple cores), a GPU (many EUs), and fixed-function media logic in silicon, which gives you a performance advantage compared to a pure software approach.

As your software is doing multiple transcodes, you need to have multiple Media SDK sessions running in parallel. Your current multithreaded implementation would support this model nicely, which means that if you cherry-pick the components needed for your transcode pipeline (decode, video processing, encode), it will work as expected. Given that you are doing 50 transcodes, though, you might not be pleased with the performance scaling. Multisession performance gains are not linear: individual session throughput will naturally be lower than single-session throughput, since the sessions share the processor resources (VDBOX, VEBOX, EUs).
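
To make the cherry-picking concrete, here is a rough sketch (simplified, not taken from the SDK samples) of what one per-channel session with a hardware H.264 decoder looks like in the Media SDK C API. Surface allocation, error recovery, and the encode side are omitted; on Linux you would also pass a VADisplay to the session via MFXVideoCORE_SetHandle before initializing the decoder.

#include "mfxvideo.h"
#include <cstring>

// One Media SDK session per channel/thread; the channel's mfxBitstream is
// assumed to already hold enough data to parse the H.264 headers.
mfxStatus InitChannelDecoder(mfxSession* session, mfxBitstream* bs,
                             mfxVideoParam* decParams)
{
    mfxVersion ver = {{0, 1}};                 // request API 1.0 or later
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, session);
    if (sts != MFX_ERR_NONE) return sts;

    std::memset(decParams, 0, sizeof(*decParams));
    decParams->mfx.CodecId = MFX_CODEC_AVC;                   // H.264
    decParams->IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;

    // Parse SPS/PPS from the channel's bitstream to fill in the frame info.
    sts = MFXVideoDECODE_DecodeHeader(*session, bs, decParams);
    if (sts != MFX_ERR_NONE) return sts;

    return MFXVideoDECODE_Init(*session, decParams);
}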

A more advanced approach, presented in the sample_multi_transcode sample, is to handle multiple simultaneous sessions/pipelines. It is designed to share system resources accurately in order to boost the performance of heavy workloads such as transcoding multiple inputs at the same time, and it is what you might want to consider implementing. A session holds the execution context for a task and may contain only a single instance of DECODE, VPP, and ENCODE, so you need additional sessions for multiple simultaneous pipelines. To avoid duplicating the resources needed by each session, the sessions can be joined. This sharing also enables task coordination through the Media SDK's internal thread pool and scheduler, so the work gets done in the most efficient way.
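
As a simplified sketch (error paths and session teardown omitted), joining the per-channel sessions could look something like this: one parent session is created first, and every additional channel's session is joined to it so that all pipelines are coordinated by one scheduler.

#include "mfxvideo.h"
#include <vector>

// Create numChannels sessions and join them so they share the Media SDK
// scheduler and internal thread pool instead of competing independently.
mfxStatus CreateJoinedSessions(int numChannels,
                               std::vector<mfxSession>& sessions)
{
    mfxVersion ver = {{0, 1}};
    sessions.resize(numChannels);

    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, &sessions[0]);
    if (sts != MFX_ERR_NONE) return sts;

    for (int i = 1; i < numChannels; ++i) {
        sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, &sessions[i]);
        if (sts != MFX_ERR_NONE) return sts;

        // Join the child session to the parent; joined sessions share
        // resources and task scheduling.
        sts = MFXJoinSession(sessions[0], sessions[i]);
        if (sts != MFX_ERR_NONE) return sts;
    }
    return MFX_ERR_NONE;
}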

To summarize, and as you already identified: for the best performance you also want to share the resources, which is probably worth the extra development effort. Selecting the components (e.g. VPP filters) should not require a big development effort, though, so I suggest starting with that as a first step within your current framework so you can see incremental progress.
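
For that first step, the per-frame calls would slot into each channel thread roughly as follows (a sketch only; GetFreeSurface, DetectActivity, and WriteToActivityFile stand in for your existing surface pool, analysis, and file I/O, and real code also has to handle MFX_ERR_MORE_DATA/MFX_ERR_MORE_SURFACE and drain the decoder and encoder at end of stream).

#include "mfxvideo.h"

// Hypothetical hooks into the existing application (not Media SDK calls):
mfxFrameSurface1* GetFreeSurface();               // your surface pool
bool DetectActivity(mfxFrameSurface1* frame);     // your surveillance analysis
void WriteToActivityFile(mfxBitstream* bs);       // your per-activity file I/O

void ProcessOneFrame(mfxSession session, mfxBitstream* channelBs,
                     mfxBitstream* encodedBs)
{
    mfxFrameSurface1* work  = GetFreeSurface();
    mfxFrameSurface1* out   = nullptr;
    mfxSyncPoint      syncp = nullptr;

    mfxStatus sts = MFXVideoDECODE_DecodeFrameAsync(session, channelBs,
                                                    work, &out, &syncp);
    if (sts != MFX_ERR_NONE || !syncp)
        return;                                   // needs more data/surfaces

    // Wait until the decoded frame is actually ready (60 s timeout,
    // as used in the SDK samples).
    MFXVideoCORE_SyncOperation(session, syncp, 60000);

    if (DetectActivity(out)) {
        mfxSyncPoint encSyncp = nullptr;
        MFXVideoENCODE_EncodeFrameAsync(session, nullptr, out,
                                        encodedBs, &encSyncp);
        if (encSyncp) {
            MFXVideoCORE_SyncOperation(session, encSyncp, 60000);
            WriteToActivityFile(encodedBs);
        }
    }
}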

Best Regards,

Bjoern

0 Kudos
Robby_S
New Contributor I
403 Views

Thanks Bjoern for the detailed response.

Starting with the current framework to see incremental progress is a good idea. I probably will try that first, starting with the decoder.

-Robby

0 Kudos
Bjoern_B_Intel
Employee
403 Views

Robby,

You are welcome. We can close this thread here, then.

Best,

Bjoern

0 Kudos
Reply