The documentation states that video memory is faster than system memory for encoding, but it isn't, at least on Win10 + Kaby Lake + DX9. If the frame is in system memory in YV12 format, it's faster to copy-convert it to NV12 into a system memory surface and send that to the encoder than to copy-convert it to NV12 into a video memory surface.
An encoder doesn't actually re-read old source frames the way a decoder re-reads its output. If a frame references another, all comparisons are against the lossy coded version of the reference frame, not the original, so it makes sense not to sweat where the original frame is. What am I missing?
If my understanding is correct, the pipeline you described is:
In general, our doc says that NV12->Enc is faster in video memory than in system memory. If your statement also includes the YV12->NV12 conversion, then that is not something our doc claims, right?
I convert from YV12 to NV12 with ippiYCbCr420_8u_P3P2R before QuickSync gets involved, for both system memory and video memory. The point is that I can't avoid a copy even for system memory, so the number of copies is the same in both cases.
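
For readers unfamiliar with the two layouts, here is a minimal scalar sketch of what that copy-convert does. It is not the IPP implementation and will be far slower than ippiYCbCr420_8u_P3P2R. The function name yv12_to_nv12 and its parameters are made up for illustration, the source frame is assumed to be tightly packed (Y plane, then V, then U), and dst_pitch stands in for whatever row stride the destination surface reports. The point it illustrates is that the same loop runs whether dst points at a system memory surface or a locked video memory surface; only the destination differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * src: tightly packed YV12 frame (Y plane, then V plane, then U plane).
 * dst: NV12 frame (Y plane, then one plane of interleaved U,V pairs).
 * dst_pitch: destination row stride in bytes (assumed >= width). */
void yv12_to_nv12(const uint8_t *src, uint8_t *dst,
                  size_t width, size_t height, size_t dst_pitch)
{
    /* 4:2:0 subsampling needs even dimensions. */
    assert(width % 2 == 0 && height % 2 == 0 && dst_pitch >= width);

    const size_t cw = width / 2;   /* chroma plane width  */
    const size_t ch = height / 2;  /* chroma plane height */

    const uint8_t *src_y = src;
    const uint8_t *src_v = src_y + width * height; /* V comes first in YV12 */
    const uint8_t *src_u = src_v + cw * ch;

    uint8_t *dst_y  = dst;
    uint8_t *dst_uv = dst + dst_pitch * height;

    /* Luma is identical in both formats: copy row by row to honor the pitch. */
    for (size_t r = 0; r < height; ++r)
        memcpy(dst_y + r * dst_pitch, src_y + r * width, width);

    /* Chroma: interleave the two planar sources as U,V,U,V,... */
    for (size_t r = 0; r < ch; ++r) {
        uint8_t *row = dst_uv + r * dst_pitch;
        for (size_t c = 0; c < cw; ++c) {
            row[2 * c]     = src_u[r * cw + c];
            row[2 * c + 1] = src_v[r * cw + c];
        }
    }
}
```

Either way, every byte of the frame is read once and written once, which is why the copy count is the same for both destination types.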