We encode a stream with your AVC encoder (Media SDK 2.0.10), using hardware acceleration. Until now, color space conversion from YV12 to NV12 was done by copying bytes from a source buffer into the encoder's input buffer. We have now switched to your VPP for the conversion, and performance has decreased considerably.
For example, with a 720x480 frame in YV12 format:
without VPP: 584 fps
with VPP: 346 fps
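To put the reported figures in perspective, it helps to turn throughput into a per-frame time budget. A minimal sketch in C (the helper names are illustrative, not part of Media SDK), assuming a 4:2:0 layout at 12 bits per pixel for both YV12 and NV12:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one 4:2:0 frame (YV12 or NV12): a full-size Y plane plus two
   chroma planes, each subsampled 2x horizontally and 2x vertically. */
static size_t frame_bytes_420(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * (width / 2) * (height / 2);
}

/* Average time budget per frame, in milliseconds, at a given throughput. */
static double ms_per_frame(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

For the numbers above, frame_bytes_420(720, 480) gives 518400 bytes, and ms_per_frame(584) and ms_per_frame(346) give roughly 1.71 ms and 2.89 ms, so the VPP path costs about 1.18 ms more per frame.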
It is expected that using VPP in your pipeline (even if it is HW accelerated) will reduce overall performance somewhat. However, the difference you report is quite large, which leads me to believe something else might be going on.
Are you using D3D or system memory surfaces? For the case when VPP was not used, was the NV12 surface read from file and managed in the same way as the YV12 surface in your code?
It would be interesting to know how much the memory copy impacts performance. Did you have a similar copy step when using an NV12 surface?
Also, when using VPP, make sure that Media SDK does not fall back to the SW implementation. This can happen with certain initialization parameter settings. You can check which implementation was selected by calling "MFXQueryIMPL" after the encoder and VPP components have been initialized.
Thanks for clarifying the way you handle the case when not using VPP.
The key difference between the two cases is that without VPP you perform the copy and the conversion in a single operation, while with VPP you copy first and then convert separately, which naturally costs more than a combined copy/convert.
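The combined copy/convert described above can be sketched as follows. This illustrates the technique under stated assumptions and is not the poster's actual routine: the source is assumed to be a tightly packed YV12 frame (Y, then V, then U), the destination an NV12 surface with a row pitch (on a Media SDK surface these would typically come from the frame data's Y, UV and Pitch fields), and both function names are hypothetical. The two-pass variant stages the whole frame in an intermediate buffer first, which is what a separate copy followed by a conversion amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pass: read the packed YV12 planes and write NV12 directly into the
   destination (e.g. an encoder input surface) using its row pitch. */
static void yv12_to_nv12_combined(const uint8_t *src, size_t w, size_t h,
                                  uint8_t *dst_y, uint8_t *dst_uv,
                                  size_t dst_pitch)
{
    assert(w % 2 == 0 && h % 2 == 0 && dst_pitch >= w);
    const size_t cw = w / 2, ch = h / 2;
    const uint8_t *src_y = src;
    const uint8_t *src_v = src + w * h;      /* YV12 stores V before U */
    const uint8_t *src_u = src_v + cw * ch;

    for (size_t row = 0; row < h; ++row)     /* luma: row-by-row copy */
        memcpy(dst_y + row * dst_pitch, src_y + row * w, w);

    for (size_t row = 0; row < ch; ++row) {  /* chroma: interleave as U,V */
        uint8_t *out = dst_uv + row * dst_pitch;
        for (size_t col = 0; col < cw; ++col) {
            out[2 * col]     = src_u[row * cw + col];
            out[2 * col + 1] = src_v[row * cw + col];
        }
    }
}

/* Two passes: copy the whole YV12 frame into a caller-provided staging
   buffer (w * h * 3 / 2 bytes, standing in for a YV12 surface), then
   convert from that buffer. Every byte is read and written once more. */
static void yv12_to_nv12_two_pass(const uint8_t *src, size_t w, size_t h,
                                  uint8_t *staging, uint8_t *dst_y,
                                  uint8_t *dst_uv, size_t dst_pitch)
{
    memcpy(staging, src, w * h * 3 / 2);
    yv12_to_nv12_combined(staging, w, h, dst_y, dst_uv, dst_pitch);
}
```

Both functions produce identical NV12 output; the two-pass version simply makes one extra full pass over the frame in memory.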
There is no real way around this unless you can read from the file straight into the surface buffer.
To assess the impact of the copy before VPP, try removing it and measuring the performance again (I know this will result in garbage frame data, but it will give you a sense of how much the copy costs).
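A complementary way to gauge the copy cost in isolation is to time it directly. A rough sketch in C, assuming a plain memcpy stands in for the copy before VPP; avg_copy_ms is a hypothetical helper, and clock() measures processor time, so the result is only indicative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Average processor time, in milliseconds, of copying one frame of `bytes`
   bytes from src to dst, measured over `iterations` copies. */
static double avg_copy_ms(uint8_t *dst, const uint8_t *src, size_t bytes,
                          int iterations)
{
    assert(bytes > 0 && iterations > 0);
    volatile uint8_t sink = 0;  /* discourages the optimizer from dropping the loop */
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i) {
        memcpy(dst, src, bytes);
        sink = dst[(size_t)i % bytes];  /* read back one byte per copy */
    }
    clock_t end = clock();
    (void)sink;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Calling it with two 518400-byte buffers (one 720x480 YV12 frame) and a few hundred iterations gives an average per-frame copy time that can be compared with the per-frame budget implied by the measured fps.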
With HW acceleration, I suggest you try using D3D surfaces instead of surfaces in system memory. This will eliminate some internal Media SDK copies from system memory to D3D surfaces. I do not know the broader context of your implementation, so I'm not sure this approach is feasible for you, but if possible I suggest exploring it.
Following your advice, I removed the copy before VPP and measured the performance again, but it made no difference: the problem with VPP remains. For the case without VPP, I tried doing the copy and the conversion separately, but that made no difference either. The impact of the copy is insignificant.
You suggested that we try D3D surfaces instead of system memory surfaces, but that approach will not work for us.
I did some experiments on my side, and in your case I think the best solution is to handle the color conversion yourself and not use VPP. The reason is that your method lets you handle copy and convert efficiently in one operation. With VPP, by comparison, you need to copy YV12 into an MSDK surface (plus some internal MSDK surface copies, more on that below) and transfer data to and from the HW. If you do not plan to use other VPP preprocessing operations such as scaling, your own color conversion routine may be the most efficient option.
I'd like to expand a bit on the implications of using D3D vs. system memory surfaces with Media SDK, specifically for encode.
Correct me if I'm wrong, but my assumption is that you are comparing the following two scenarios, both using HW encode and system memory surfaces:
1. Not using VPP but your own copy convert routine:
raw file -> raw data -> copy/convert from YV12 to NV12 -> MSDK:Copy surface from sysmem to D3D -> MSDK:HW encode -> MSDK:Copy BS from D3D to sysmem
2. Using VPP:
raw file -> raw data -> copy raw data to YV12 surface -> MSDK:Copy surface from sysmem to D3D -> MSDK:HW VPP -> MSDK:Copy surface from D3D to sysmem -> NV12 surface (sys mem) -> MSDK:Copy surface from sysmem to D3D -> MSDK:HW encode -> MSDK:Copy BS from D3D to sysmem
As you can see, because system memory is used with a HW target, the VPP case has several additional copies between system memory and D3D memory (required because the HW operates only on D3D surfaces). These additional copies are quite costly and hurt performance.
Compare this with the following scenario, which uses D3D surfaces with HW VPP:
raw file -> raw data -> copy raw data to YV12 surface -> MSDK:HW VPP -> NV12 surface (D3D) -> MSDK:HW encode -> MSDK:Copy BS from D3D to sysmem
(Note that this tries to mirror your setup; the "copy raw data to YV12 surface" step can be eliminated if you read from the file directly into the surface.)
Are you sure using D3D surfaces is not an option for you? It should not affect the way you feed data to or extract data from MSDK, and using D3D would also speed up your solution that uses your own copy/convert.
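The transfers in the three pipelines above can be tallied with a small model. This is a hypothetical sketch: it counts only the copies between system memory and D3D memory made inside the pipeline (the application's own "copy raw data to YV12 surface" step is left out of all three), and it assumes every surface transfer moves one full 4:2:0 frame, since YV12 and NV12 frames are the same size. Bitstream downloads are counted but not sized, because their size depends on the bitrate.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of sysmem <-> D3D transfer appearing in the pipelines. */
typedef enum {
    UPLOAD_SURFACE,   /* copy surface from sysmem to D3D */
    DOWNLOAD_SURFACE, /* copy surface from D3D to sysmem */
    DOWNLOAD_BS       /* copy bitstream from D3D to sysmem */
} transfer_t;

typedef struct {
    const transfer_t *steps;
    size_t count;
} pipeline_t;

/* Wraps a file-scope array of transfers as a pipeline_t. */
#define PIPELINE(arr) ((pipeline_t){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* 1. Own copy/convert, system memory surfaces */
static const transfer_t own_convert_sysmem[] = { UPLOAD_SURFACE, DOWNLOAD_BS };
/* 2. HW VPP + HW encode, system memory surfaces */
static const transfer_t vpp_sysmem[] = { UPLOAD_SURFACE, DOWNLOAD_SURFACE,
                                         UPLOAD_SURFACE, DOWNLOAD_BS };
/* 3. HW VPP + HW encode, D3D surfaces */
static const transfer_t vpp_d3d[] = { DOWNLOAD_BS };

/* Surface bytes crossing the sysmem/D3D boundary per frame. */
static size_t surface_bytes_moved(pipeline_t p, size_t frame_bytes)
{
    size_t total = 0;
    for (size_t i = 0; i < p.count; ++i)
        if (p.steps[i] != DOWNLOAD_BS)
            total += frame_bytes;
    return total;
}
```

With 518400-byte frames (720x480), the model gives 518400 bytes of surface traffic per frame for scenario 1, 1555200 bytes for scenario 2, and none for the D3D scenario.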