Media (Intel® oneAPI Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools from Intel. This includes Intel® oneAPI Video Processing Library and Intel® Media SDK.

Best approach to use the Media Server Studio SDK in my application (3) ?

New Contributor I

Greetings again,

Working on the ideas from two previous discussions (here and here), my multi-channel video surveillance system now looks like this:

                stage 0           stage 1   stage 2                 stage 3               stage 4                 stage 5
Channel 0:      Bitstream file -> Decode -> Video Pre-processing -> Activity detection -> (if detected) Encode -> Output bitstream file(s)
Channel (N-1):  Bitstream file -> Decode -> Video Pre-processing -> Activity detection -> (if detected) Encode -> Output bitstream file(s)

where each channel is an MSDKThread.

The most important part of each channel is stage 3, the activity detection. Stage 2, the pre-processing, is compute-intensive, but its interfaces are simple: raw frame in, raw frame out. This stage prepares each frame for the detection.

From what I understand of Media SDK Developer's Guide (DG), stages 2 and 3 should be user-defined modules that follow the decoder module, so here are my questions:

1. Are user-defined modules the right way to go? If not, what do you suggest?
2. Which makes more sense: making stages 2 and 3 separate modules, or combining them in one module? The tricky part is that the detection algorithm likely requires the pixel data in system memory, so a data copy will be needed.
3. The DG talks about using mfxPlugin to create a user-defined module, and using the USER class to integrate it into the Media SDK. I am still not quite clear on exactly how to do both. Are there any examples demonstrating these steps?
4. Why can't I find mfxPlugin's definition in the Reference Manual?
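To make question 2 concrete, here is a minimal sketch of what a combined stage 2 + 3 module could look like once the luma plane has been copied to system memory. Everything in it is an assumption for illustration: the class name `CombinedStage`, the threshold parameter, and above all the detection rule (mean absolute difference against the previous frame), which is only a placeholder for the real pre-processing and activity detection.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical combined pre-processing + detection module (not a Media SDK type).
// It works on a tightly packed luma plane that the caller has already copied to
// system memory; a real copy from an mfxFrameSurface1 must honour Data.Pitch.
class CombinedStage {
public:
    CombinedStage(std::size_t pixels, double threshold)
        : prev_(pixels, 0), threshold_(threshold) {}

    // Returns true when the frame differs enough from the previous one.
    // Placeholder rule: mean absolute luma difference above the threshold.
    bool process(const std::uint8_t* luma) {
        unsigned long long sum = 0;
        for (std::size_t i = 0; i < prev_.size(); ++i) {
            sum += static_cast<unsigned long long>(
                std::abs(static_cast<int>(luma[i]) - static_cast<int>(prev_[i])));
            prev_[i] = luma[i];  // keep this frame as the next reference
        }
        const bool active = primed_ && !prev_.empty() &&
            (static_cast<double>(sum) / static_cast<double>(prev_.size()) > threshold_);
        primed_ = true;  // the very first frame has nothing to compare against
        return active;
    }

private:
    std::vector<std::uint8_t> prev_;
    double threshold_;
    bool primed_ = false;
};
```

With a single module like this, one copy to system memory serves both stages; splitting them would mean either sharing that buffer between modules or copying twice.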

As always, suggestions/opinions/insights are welcome and greatly appreciated.


New Contributor I

Update: Regarding question 3, I see there are a few examples in sample_plugins. I'll take those as a starting point, if user-defined modules and plugins are the way to go.


Hi Robby - Apologies for the delay. Plugins are the right way to go. But instead of plugins that operate in system memory, have you considered implementing your algorithms in OpenCL and packaging them as kernels? The biggest benefit of doing this is that OpenCL and MFX (encode/decode in hardware) share the same physical memory, which removes the need to copy any data. Basically, OpenCL and MSDK interoperate nicely. We have some examples (the OpenCL rotate plugin) in the sample_encode and sample_multi_transcode samples. In addition, I have attached a simple tutorial I created to show how to add OpenCL code to MSDK without a plugin (we certainly recommend the plugin model, but this example is simple enough to show how to use OpenCL in MSDK).

Hope this helps. 


Attached the tutorial example showing OCL and MSDK interop. Again, this does not use the plugin model (and we recommend the plugin model for the purpose of modularity).

New Contributor I

Hi Sravanthi, thanks for the response and the sample. We do indeed plan to port part (if not all) of the activity detection to OpenCL. However, that will happen after we have the C++ implementation working in the MSDK framework. One step at a time ... ;-)


Good, glad you are aware of the OpenCL path and are working on the plugin implementation - that's our recommendation too. Let us know if you need any other technical information.

New Contributor I

Hmmm... I looked at the Intel Media SDK 2014 Developer's Guide (DG) again, and now I am not so sure the approach I was thinking of would work.

If I take the pseudo-code example from section 4.12.2 of DG, Using a User-Defined Module in a Media Pipeline, and remove the stuff I don't need, it looks like this:

/* initialization */
MFXVideoDECODE_Init(session, decoding_configuration);

/* main loop */
do {
    /* load bitstream to bs_d */
    MFXVideoDECODE_DecodeFrameAsync(session, bs_d, surface_w, &surface_d, &sync_d);
    MFXVideoUSER_ProcessFrameAsync(session, &surface_d, 1, &surface_u, 1, &sync_u);
    MFXVideoENCODE_EncodeFrameAsync(session, NULL, surface_u, bs_e, &sync_e);
    MFXVideoCORE_SyncOperation(session, sync_e, INFINITE);
    /* write bs_e to file */
} while (!end_of_stream);

/* Close my user module */

There is one big problem, however: I don't want to re-encode every frame. I want to encode and record into a bit-stream file only after some activity is detected in the video. And if separate activities are detected, they must be saved into separate bit-stream files.

For example, during the main loop, someone enters the scene and is detected. That should trigger the encoder and save the encoder output to a file. Ideally the file name is stamped with the time when the activity is detected. Then that guy leaves the scene; the encoder should stop, and encoder output bit-stream file should be closed. Two hours later, someone else enters the scene and is detected. That should trigger the encoder again, and this time the encoder should output to another bit-stream file, with its file-name stamped with the new time.

It doesn't look like the pseudo code above would meet my requirements.
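As a small illustration of the time-stamped file names described above, here is a sketch of a helper that builds one name per detected event. The function name `segmentFileName`, the `chNN_YYYYMMDD_HHMMSS.h264` pattern, and the use of UTC are all assumptions made for the example, not anything prescribed by the Media SDK.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: builds a name such as "ch03_19700101_000000.h264"
// from a channel index and the time at which the activity was detected.
// The pattern and the ".h264" extension are illustrative assumptions.
inline std::string segmentFileName(int channel, std::time_t when) {
    std::tm utc{};
    // std::gmtime returns a pointer to shared static storage, so copy it at once.
    // With one thread per channel, prefer gmtime_r (POSIX) or gmtime_s (Windows).
    if (const std::tm* p = std::gmtime(&when)) {
        utc = *p;
    }
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", &utc);
    char name[64];
    std::snprintf(name, sizeof name, "ch%02d_%s.h264", channel, stamp);
    return name;
}
```

Calling `segmentFileName(channel, std::time(nullptr))` at the moment activity is first detected would give each event its own file name.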



Good question, Robby. If you look at the simple_transcode_opaque_async example, you will see how the decode-vpp-encode calls are chained using sync points. The code flow looks like this:


dsts = decode(&, &, syncD)
if (dsts == success and syncD)
    vsts = vpp(&, &, syncV)
if (vsts == success and syncV)
    ests = encode(&, &, syncE)

In your application, you want to replace the VPP stage with a call to your OpenCL/C++ function. In your implementation, make sure you handle the return status and surface sync appropriately. For example, vsts = -99 could mean that no motion was detected: unlock the surface, return the sync point, and do not enter the encode function. vsts = 0 could mean that motion was detected: unlock the surface, return the sync point, and start encoding. You can maintain another variable, or manage the sync/status variables accordingly, to trigger the "new encode" and "finish encode" signals. In general, managing the return status and sync points within your kernel can handle the use case you are trying to achieve.
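The status-driven gating described here can be sketched without any Media SDK calls. In the snippet below, `DetectStatus`, `Frame`, `detect`, and `runGatedLoop` are hypothetical stand-ins: the status values are not Media SDK codes, and the comments mark where the real decode, user-stage, and encode calls would go.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical status for the user-defined stage; NOT Media SDK status codes.
enum class DetectStatus { NoMotion, Motion };

// Stand-in for a decoded surface; a real pipeline would pass mfxFrameSurface1*.
struct Frame { bool hasActivity; };

// Placeholder detector; a real one would analyse the pixel data.
inline DetectStatus detect(const Frame& f) {
    return f.hasActivity ? DetectStatus::Motion : DetectStatus::NoMotion;
}

struct GateResult {
    std::size_t encodedFrames = 0;  // frames that reached the encoder
    std::size_t segments = 0;       // number of "new encode" signals raised
};

// Walks frames in decode order and lets only "Motion" frames through to encode.
inline GateResult runGatedLoop(const std::vector<Frame>& frames) {
    GateResult r;
    bool encoding = false;  // the extra state variable mentioned in the reply
    for (const Frame& f : frames) {
        const DetectStatus vsts = detect(f);  // replaces the VPP stage
        if (vsts == DetectStatus::Motion) {
            if (!encoding) {  // "new encode": open a fresh output here
                encoding = true;
                ++r.segments;
            }
            ++r.encodedFrames;  // the encode call would be issued here
        } else {
            encoding = false;  // "finish encode": drain encoder, close output
        }
    }
    return r;
}
```

Frames that return `NoMotion` never reach the encode branch, and each transition from no motion to motion starts a new segment.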

You can also look at the OpenCL tutorial I attached previously as a reference, in addition to the transcode tutorial.

Hope this helps. 


New Contributor I

Hi Sravanthi, thanks for the suggestions. 

Another concern I have is the file I/O part. Since we have to generate a separate output bit-stream for each new event (activity detected), we have to open/close files on the fly. Every time before triggering the encoder, a new CSmplBitstreamWriter has to be instantiated and initialized. When the event is gone, the file writer needs to be closed. At the very least, the file writer should be re-initialized if not re-instantiated.

Therefore here are more questions:

1. The operations described above seem do-able, but how do I put sync points around them?

2.  What's the impact on the performance? Let's say I have 30 channels running concurrently, and 3 of them detect something and need to open output files on the fly. Will that slow the whole system down?

[Updated to add one more question]

3. What are the other options that can let me open/close output bit-stream files on the fly?

Thanks again.
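One possible shape for opening and closing output files on the fly is a per-channel writer that is re-opened for each event rather than re-instantiated. The `SegmentWriter` class below is a hypothetical sketch built on `std::ofstream`; it is not CSmplBitstreamWriter and does not reproduce its interface.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not part of the Media SDK samples): one instance per
// channel, re-opened for every detected event instead of being re-created.
class SegmentWriter {
public:
    // Starts a new output file, closing the previous one if it is still open.
    bool begin(const std::string& path) {
        end();
        out_.open(path, std::ios::binary | std::ios::trunc);
        return out_.is_open();
    }

    // Appends one encoded chunk, e.g. DataLength bytes starting at
    // Data + DataOffset of an mfxBitstream. Fails if no file is open.
    bool write(const unsigned char* data, std::size_t size) {
        if (!out_.is_open()) {
            return false;
        }
        out_.write(reinterpret_cast<const char*>(data),
                   static_cast<std::streamsize>(size));
        return out_.good();
    }

    // Closes the current file; safe to call when nothing is open.
    void end() {
        if (out_.is_open()) {
            out_.close();
        }
        out_.clear();  // reset error flags so the object can be reused
    }

    bool isOpen() const { return out_.is_open(); }

private:
    std::ofstream out_;
};
```

As in the pseudo-code earlier in the thread, the write would happen only after MFXVideoCORE_SyncOperation has returned for the encode sync point, so the file operations sit outside the asynchronous part of the pipeline and need no sync points of their own.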


Hi Robby - Your questions are moving away from MSDK! If you look at my prototype above, there is a conditional statement before each call, and the two conditions checked today are the sync point and the status. To make your encoder fire when motion is detected, define your own status in your algorithm and handle it accordingly when it is returned. If you detect motion, return a status that reflects that, and the encoder will start encoding the frame. If you detect no motion, the status should ensure that the if() guarding the encode call is not satisfied. In addition, ensure your surface is no longer locked by the processing stage.

In general, if you look at the tutorial I sent you and the plugin sample, you should be able to put them together to make your application. Beyond this point, helping with your application will get difficult on the forum. 

Regarding the performance impact: apart from the time and resources spent on encodes that are triggered by motion detection, you should not see any "side-effect" performance degradation.