
Best approach to use the Media Server Studio SDK in my application (II)?

Robby_S
New Contributor I

Hi, I'd like to ask a follow-up to my earlier question, "Best approach to use the Media Server Studio SDK in my application?" This new question came up after I experimented with the Media Server Studio SDK a little, and after I learned about the VA API.

Per my discussion with Bjoern, in order to achieve optimal performance I would need to modify my application software in the way shown in one of the Media Server Studio SDK's sample codes. Doing so would allow hardware resources to be shared across multiple simultaneous sessions/pipelines.
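For reference, a minimal sketch of the session-joining mechanism those samples appear to use to share hardware across pipelines, assuming the standard mfxvideo.h dispatcher API (this is only an illustration, not the samples' actual code):

#include <mfxvideo.h>

// Join each per-channel session to a parent session so the Media SDK
// scheduler can share the HW device across all pipelines.
mfxStatus JoinChannelSessions(mfxSession parent, mfxSession* children, int numChannels)
{
    for (int i = 0; i < numChannels; ++i) {
        mfxStatus sts = MFXJoinSession(parent, children[i]);
        if (sts != MFX_ERR_NONE)
            return sts;   // the child keeps running as a stand-alone session on failure
    }
    return MFX_ERR_NONE;
}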

In order to see incremental progress, I could cherry-pick some components from the Media Server Studio SDK and fit them into the framework of my application software. I assumed this second approach would be easier to implement, at the cost of some performance.

I spent a few days experimenting with the 2nd approach. It's not as easy as I had expected. All the sample codes in the Media Server Studio SDK are written in a tightly wrapped way, with layered C++ class inheritance, so it's difficult to peel off just one component and plug it into my framework. For example, replacing my decoder with its MSS SDK counterpart is not a simple drop-in replacement, because the MSS SDK decoder only accepts an elementary bit-stream. If I wanted to use the splitter provided in an MSS SDK sample, I would also need to add a demultiplexer to unwrap the container around the H.264 stream, which in turn would require the sample's bit-stream handler. Things got complicated quickly, and pretty soon I found that I was trying to fit my own code into the MSS SDK framework!

Another unknown is how I could fit the MSS SDK sessions into my current multi-threading scheme. I don't understand the MSS SDK threading model well enough to be 100% confident, but replacing my current pthreads with the MSDK threads doesn't look straightforward.

So, I have a fundamental question: Is the MSS SDK designed in such a way that it practically requires the user to adopt a framework similar to the ones shown in the MSS SDK sample codes? If so, I think I can understand the logic behind it; I would just bite the bullet and go with the 1st approach.

I am a DSP guy. Before I used MSS SDK, I was expecting something like replacing slow_fft(input *, output *) with fast_fft(input *, output *). That has not been the case.
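To make this concrete: as far as I can tell from the mfxvideo.h dispatcher API, even a minimal MSDK decode setup looks roughly like the sketch below (my own reading, illustrative only), which is a far cry from a single-call fast_fft()-style swap.

#include <mfxvideo.h>

// Minimal H.264 decoder setup: the caller owns a session, must parse the
// stream header, and must allocate a surface pool before decoding starts.
mfxStatus SketchDecodeInit(mfxSession* session, mfxBitstream* bs, mfxVideoParam* par)
{
    mfxVersion ver = { {0, 1} };                       // API 1.0 or later
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, session);
    if (sts != MFX_ERR_NONE) return sts;

    // The decoder consumes an *elementary* H.264 stream in 'bs'; any
    // container must already have been demuxed by the application.
    par->mfx.CodecId = MFX_CODEC_AVC;
    par->IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    sts = MFXVideoDECODE_DecodeHeader(*session, bs, par);
    if (sts != MFX_ERR_NONE) return sts;

    // A pool of mfxFrameSurface1, sized via MFXVideoDECODE_QueryIOSurf(),
    // has to be allocated before MFXVideoDECODE_DecodeFrameAsync() is called.
    return MFXVideoDECODE_Init(*session, par);
}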

Which brings up my next question: Is the VA API more like what I was expecting, and therefore more appropriate for my 2nd approach? If, instead of fitting MSS SDK components into my software, I just made VA API calls from within my own framework, would that be a quicker way to see incremental progress, performance-wise?

Suggestions/opinions/insights are welcome and greatly appreciated.

Thank you,

Robby

4 Replies
Sravanthi_K_Intel (accepted solution)

Hi Robby,
According to your previous posts, your pipeline looks like this:
Camera (N channels) -> Decode (N) -> Video Processing -> Encode (M, if motion detected)
You have individual threads handling each channel, and today you use ffmpeg in your framework to do the media part. You are looking to see whether MSDK can improve the performance of your pipeline.

Some pointers:

- MSDK gets its performance benefit by using the HW logic for media processing and by keeping frames in video memory on the GPU. FFmpeg, on the other hand, is a CPU implementation and uses system memory for its operations. Any integration of ffmpeg with MSDK will therefore require explicit copying of data between system and video memory (see the memory-copy sketch after this list).

- As you noted, ffmpeg does the demuxing and muxing and handles the audio channel in addition to video, whereas MSDK operates on elementary video streams only. So replacing an ffmpeg command line (encode, for example) means breaking it into: ffmpeg-demux -> ffmpeg audio + MSDK video -> ffmpeg-mux (see the demux sketch after this list). Audio is typically an insignificant portion of the processing compared to video, so you will certainly see a benefit from doing this (although it will be partially offset by the system<->video memory copies you will have to do).

- If you replace the entire pipeline with MSDK (and use ffmpeg only for demux/mux/audio), take a look at our full_transcode sample: https://software.intel.com/sites/default/files/MSDK_Samples_Deprecated_Linux.tar.bz2. We consider this a "deprecated sample" in that we do not provide support for it today, but it works and can give you pointers on how to proceed with ffmpeg-MSDK integration.

- Regarding option (1), the sample I pointed to above will show you how to do it. Note: sample_multi_transcode is a great sample if you are doing multiple file-to-file transcodes in parallel in one application; it handles threading inherently, based on the MSDK model. But in your pipeline I do not see a stage where multiple input files are simultaneously ready to be transcoded or encoded. You will only encode when your algorithm detects some activity on a channel, and not all 50 channels need to fire off encoding at the same time either. So where in your pipeline would you have multiple input files ready to be decoded/encoded at the same time, in one application? Please correct me if I am misunderstanding.

- To your fundamental question: yes, the MSDK framework is essential to extract the available performance and density from the underlying hardware. The framework needs to be adhered to, or mapped onto, in order to expose the HW features.

- Let me know if any of the suggestions above are useful to you and help you make your application work. Otherwise, do respond on this thread again; I have another option that can help you (but it's in beta, and I would take that discussion offline once you get back to me).
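To illustrate the memory point above, here is a rough sketch of the explicit system<->video memory hand-off: packing an NV12 frame produced by ffmpeg (CPU, system memory) into an MSDK mfxFrameSurface1 before encoding. This is illustrative only, not code from our samples; it assumes both sides are NV12 and that the surface is already locked so that Data.Y/Data.UV are valid CPU pointers.

#include <cstring>
#include <mfxstructures.h>
extern "C" {
#include <libavutil/frame.h>
}

// Copy a decoded ffmpeg AVFrame (NV12) into a locked MSDK surface,
// honoring the (usually different) pitches on each side.
void CopyAVFrameToMfxSurface(const AVFrame* src, mfxFrameSurface1* dst)
{
    const int w = dst->Info.CropW;
    const int h = dst->Info.CropH;

    // Luma plane.
    for (int y = 0; y < h; ++y)
        std::memcpy(dst->Data.Y + y * dst->Data.Pitch,
                    src->data[0] + y * src->linesize[0], w);

    // Interleaved UV plane: half the height, full row width for NV12.
    for (int y = 0; y < h / 2; ++y)
        std::memcpy(dst->Data.UV + y * dst->Data.Pitch,
                    src->data[1] + y * src->linesize[1], w);
}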
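And a sketch of the demux split: using libavformat to separate the streams, feeding video packets to MSDK and everything else to the ffmpeg audio path. The mux step, the audio path itself, and the MP4-to-Annex-B bitstream filter that H.264 elementary streams usually need are left out of this illustration.

extern "C" {
#include <libavformat/avformat.h>
}

// Demux a container file and route packets: video -> MSDK, audio -> ffmpeg.
int DemuxLoop(const char* path)
{
    AVFormatContext* fmt = nullptr;
    if (avformat_open_input(&fmt, path, nullptr, nullptr) < 0) return -1;
    if (avformat_find_stream_info(fmt, nullptr) < 0) { avformat_close_input(&fmt); return -1; }

    const int videoIdx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);

    AVPacket pkt;
    while (av_read_frame(fmt, &pkt) >= 0) {
        if (pkt.stream_index == videoIdx) {
            // Append pkt.data / pkt.size to the mfxBitstream fed to the MSDK decoder.
        } else {
            // Hand non-video packets (audio) to the ffmpeg audio pipeline.
        }
        av_packet_unref(&pkt);
    }
    avformat_close_input(&fmt);
    return 0;
}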

Robby_S
New Contributor I

Hi Sravanthi,

Thanks for your detailed response, and thanks for confirming the answer to my fundamental question.

Your description of my application is correct. However, for prototyping/testing purposes, my per-channel pipeline looks more like this right now:

Bitstream file -> Decode -> Video Processing -> Activity detection -> (if detected) Encode -> Output bitstream file(s)

For each channel, K output bitstream files will be generated, where K is 0 if no activity is detected and K >= 1 if one or more activities are detected at different times. So the multiple input files are at the 1st stage.

Although it may be rare in real life, my software must account for the case where all 50 channels detect activity and fire off the encoder. In fact, I can choose the test files to guarantee that this will happen.

I'll try to map my application to the MSDK framework, and see how it goes.

Thanks again,

Robby

Robby_S
New Contributor I

Also, what about the VA API? If my software only needed to support 8 channels, would going with the 2nd approach, making VA API calls directly, give me enough performance with less development effort?

-Robby

Sravanthi_K_Intel

Hi Robby - Let us know how the integration works for you.

On your question about VAAPI, the short answer is no, the performance will not be the same. You would lose quite a bit in "translation", since you would be giving up the optimizations MSDK adds on top of VAAPI. This thread may also be of interest to you: https://software.intel.com/en-us/forums/intel-media-sdk/topic/559845
