I am using the mfx decoder from Media SDK and want to transfer the decoded buffer to OpenCL. What is the best way to transfer this to OpenCL?
It seems it can be done in these ways:
1) decode the video into system memory and transfer to OpenCL as a buffer
2) decode the video into DirectX texture buffer and pass to OpenCL as texture image
3) decode the video into OpenGL texture buffer (is it possible?) and pass to OpenCL as texture image
The OpenCL code will do quite a lot of texture reads from other sources (not the decoder buffer), so it might be better to pass the decoded frame to OpenCL as a plain buffer instead of a texture.
From the Media SDK documentation it looks like (2), decoding to a DirectX texture, is better than (1), decoding to system memory. But I still think using a buffer instead of a texture will improve performance for the reason mentioned above.
There is no mention of outputting the decoded frame to an OpenGL texture. Is that supported? Or does one have to output to system memory and then upload to an OpenGL texture?
I noticed you have another thread, https://software.intel.com/en-us/forums/intel-media-sdk/topic/622296, where you are trying to use the MediaSDKInterop sample.
I think you can also try the MSDK sample_multi_transcode with the -opencl parameter, as in the following command line:
$ ./sample_multi_transcode_drm -hw -i::h264 test.h264 -o::h264 output.h264 -angle 180 -opencl
As Zach mentioned, one of the best places to find an implementation of Media SDK decode->OpenCL is in sample_multi_transcode. This is closest to option #2 in your list.
Option #1 would work and is the easiest to code. Option 2 is better from a Media SDK perspective since HW decode to system memory implies extra copies and synchronization. However, you're right that option #1 may have advantages for OpenCL.
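To make option #1 a bit more concrete, here is a minimal sketch of the host-side step, assuming the decoder output is 8-bit NV12 in system memory. The Nv12View struct and pack_nv12 function are hypothetical names for illustration (in Media SDK the pointers and pitch would come from the locked surface's mfxFrameData Y, UV and Pitch fields). It only removes the row padding so the frame can be uploaded as one plain OpenCL buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the relevant fields of a locked NV12 surface.
struct Nv12View {
    const std::uint8_t* y;   // start of the luma plane
    const std::uint8_t* uv;  // start of the interleaved chroma (U,V) plane
    std::size_t pitch;       // bytes per row in memory, may exceed width
};

// Copy a pitched NV12 frame into a tightly packed buffer of
// width * height * 3 / 2 bytes: all luma rows, then all UV rows.
std::vector<std::uint8_t> pack_nv12(const Nv12View& src,
                                    std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0 && src.pitch >= width);
    std::vector<std::uint8_t> out(width * height * 3 / 2);
    std::uint8_t* dst = out.data();
    // Luma: `height` rows of `width` bytes each.
    for (std::size_t row = 0; row < height; ++row, dst += width)
        std::memcpy(dst, src.y + row * src.pitch, width);
    // Chroma: height / 2 rows, each holding width / 2 (U,V) byte pairs.
    for (std::size_t row = 0; row < height / 2; ++row, dst += width)
        std::memcpy(dst, src.uv + row * src.pitch, width);
    return out;
}
```

The packed vector could then be handed to clEnqueueWriteBuffer, or to clCreateBuffer with CL_MEM_COPY_HOST_PTR. This extra copy is exactly the overhead that makes option #1 less attractive from the Media SDK side, so it is worth measuring.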
For option #2, you could also consider using DXVA hardware decode instead of Media SDK, which may simplify your implementation.
For the 3rd option, since Media SDK wraps DXVA decode there would be an additional DirectX->OpenGL step. So you would be adding complexity for what is likely to be no performance gain.
The best approach for your application may depend on several factors such as hardware, resolution, and the balance between CPU/GPU and decode/OpenCL work. So you may want to try both options 1 and 2.
BTW, the following article is a bit old and the sharing APIs have changed. However, it outlines why option #2 is best from a Media SDK perspective and explains a bit about the sharing mechanisms.
Thank you for your answer. That's very useful.
Is DXVA the same thing as a Media Foundation Transform (MFT)? I asked in another thread whether I can use MFT, but the answer was that HEVC 10-bit might not work with MFT. Is Media SDK actually a wrapper for DXVA?
Why is the sample_multi_transcode sample suitable for my purpose? I don't see it doing any rendering.
I tried to get it to do the decoding and the OpenCL rotation with this command line:
-hw -i::h265 input.265 -o::raw output.yuv -angle 180 -opencl
I got an "unsupported" error in mfx_vpp_plugin.cpp line 388, when calling m_pPlugin->SetAuxParams(m_pAuxParam, m_AuxParamSize).
Isn't MediaSDKInterOp a better sample for my purpose? I can't get it to compile, though, as it requires DirectX SDK 10. I downloaded it but could not install it on my Windows 10 machine. Microsoft recommends not using this SDK, but the app requires it to compile.
DXVA is not the same thing as a Media Foundation Transform (MFT). You can refer to "Figure 3 Intel® Media SDK architecture showing software and hardware pathways" on page 9 of https://software.intel.com/sites/default/files/managed/09/02/Intel_Media_Developers_Guide.pdf
The sample_multi_transcode sample doesn't support -o::raw, but it does contain some OpenCL code. If it works for you, I think you can use it as a reference.
MediaSDKInterOp is a good sample, but I think https://software.intel.com/sites/default/files/managed/f3/ec/mss_ocl_surface_sharing.zip is better because it is simpler.
Thanks for the mss_ocl_surface_sharing sample. It uses D3D9, though, and it does not seem to support the P010 format, which is the decoder output format for my input stream. I guess D3D11 will work. Do you have a sample that does the same thing with D3D11? Or is there a way to make this sample handle the P010 output format?
NV12 format is supported by OpenCL.
Regarding Sharing DX11 surfaces: https://software.intel.com/en-us/articles/sharing-surfaces-between-opencl-and-directx-11-on-intel-processor-graphics and pdf https://software.intel.com/sites/default/files/managed/d1/9a/Surface%20sharing%20between%20OpenCL_2_0%20and%20DX11%20rmi.pdf
10bit format is not supported by OpenCL.
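If the P010 frame is decoded to system memory instead (option 1 earlier in this thread), one possible workaround is to treat each plane as 16-bit words on the host and reduce it to 8 bits before handing it to OpenCL. The sketch below is only an illustration under a stated assumption: samples are MSB-aligned, i.e. the 10 significant bits sit in bits 15..6 of each word as in Microsoft's P010 definition (with Media SDK, check the Shift field of mfxFrameInfo to confirm this for your surfaces). The function name p010_plane_to_8bit is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduce one plane of MSB-aligned 16-bit samples to 8 bits by keeping the
// top byte of every word (truncation, no rounding or dithering).
// ASSUMPTION: significant bits are in the high bits of each 16-bit word.
std::vector<std::uint8_t> p010_plane_to_8bit(const std::uint16_t* src,
                                             std::size_t pitch_bytes,
                                             std::size_t samples_per_row,
                                             std::size_t rows) {
    assert(pitch_bytes % 2 == 0 && pitch_bytes >= samples_per_row * 2);
    const std::size_t pitch_words = pitch_bytes / sizeof(std::uint16_t);
    std::vector<std::uint8_t> out(samples_per_row * rows);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < samples_per_row; ++c) {
            const std::uint16_t sample = src[r * pitch_words + c];
            out[r * samples_per_row + c] =
                static_cast<std::uint8_t>(sample >> 8);
        }
    }
    return out;
}
```

For a width x height frame this would be called once for the Y plane (width samples per row, height rows) and once for the interleaved UV plane (width samples per row, height / 2 rows), which yields NV12-style 8-bit data. It discards the two least significant bits of every sample, so it only fits cases where 8-bit precision is acceptable.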