I'm reposting this; I hope it won't end up as a double post.
I have built a solution for a customer in which MediaSDK decodes multiple H.264 input streams (into Direct3D9 GPU memory), blits them onto a single RGB32 surface, uses VPP to convert that surface to NV12, and encodes the result to H.264, all in GPU memory.
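To put the colour-space conversion step in perspective, here is a rough sketch of the buffer sizes involved for one Full HD frame. It assumes packed 4-byte RGB32 pixels and standard NV12 layout (full-resolution Y plane plus an interleaved, 2x2-subsampled UV plane), and it ignores any pitch or alignment padding the driver may add to real surfaces. The helper names are hypothetical.

```python
def rgb32_bytes(width, height):
    # RGB32: 4 bytes per pixel (B, G, R plus one unused/alpha byte)
    return width * height * 4


def nv12_bytes(width, height):
    # NV12: 8-bit Y plane at full resolution, followed by an interleaved
    # UV plane at half resolution in both directions (2 bytes per UV pair)
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2
    return luma + chroma


w, h = 1920, 1080
print(rgb32_bytes(w, h))  # 8294400
print(nv12_bytes(w, h))   # 3110400
```

So each output frame, the VPP step reads about 8.3 MB of RGB32 data and writes about 3.1 MB of NV12 data, on top of the decode, blit and encode traffic.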
It works very well on Ivy Bridge (HD4000), producing 183 fps for a Full HD, 2 Mbps output video composited from four SD, 1 Mbps input videos.
However, on an i5 Haswell with HD4600, I was surprised to find that performance was only 139 fps, lower rather than higher than on the HD4000.
My investigation led me to remove the VPP colour-space conversion (effectively encoding empty frames while still doing all of the rest of the work). The Ivy Bridge HD4000 then did slightly better (183 -> 200 fps), while the Haswell HD4600 jumped to about twice its previous performance (139 -> 268 fps).
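Converting the throughput figures above into time per frame makes the difference easier to see. This is only a rough estimate: it assumes that frame time is simply 1000 / fps and that removing VPP saves exactly the VPP cost, which will not hold precisely if the driver overlaps pipeline stages. The function names are hypothetical.

```python
def frame_time_ms(fps):
    # Average wall-clock time per frame at a given throughput
    return 1000.0 / fps


def vpp_cost_ms(fps_with_vpp, fps_without_vpp):
    # Estimated per-frame cost of the VPP step: the difference in
    # frame time with and without the RGB32 -> NV12 conversion
    return frame_time_ms(fps_with_vpp) - frame_time_ms(fps_without_vpp)


ivb = vpp_cost_ms(183, 200)  # Ivy Bridge HD4000
hsw = vpp_cost_ms(139, 268)  # Haswell HD4600
print(f"Ivy Bridge HD4000: ~{ivb:.2f} ms per frame")  # ~0.46 ms
print(f"Haswell HD4600:    ~{hsw:.2f} ms per frame")  # ~3.46 ms
```

Under these assumptions, the conversion costs roughly 0.46 ms per frame on the HD4000 but roughly 3.46 ms per frame on the HD4600, about 7.5 times as much.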
Am I doing something wrong? Is there an issue in the driver with this VPP operation? Is there a workaround that would let me regain this performance?
Can you provide the specific CPU and platform configuration, and the exact graphics driver version being used? In general, 4th Generation Core processors are designed to save power, and there are certainly some use cases where performance may be slower (and others that are faster while using less power).
Also, the graphics driver implementation can have an effect on performance, and you may see some improvement with newer drivers.