"I belong to the FFDShow Tryout development team, and we are trying to reproduce your work.
For your information, we recently imported the MPC-HC DXVA implementation into our project.
The goal is to decode frames with DXVA 1 and 2, copy them back into system memory, process them there, and then write them back.
But we have a problem: we don't get the same speed as you (I have a Q9450 with a Radeon 5750 on PCI Express x16).
With memcpy or the SSE4.1-optimized copy method, copying one frame takes 80 ms.
Do you have any idea what is wrong?"
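An SSE4.1 copy of this kind is typically built on the MOVNTDQA streaming load (`_mm_stream_load_si128`), which is intended to read write-combining (USWC) memory, such as a mapped GPU surface, faster than ordinary loads. The following is a minimal C sketch, not the MPC-HC or FFDShow code. It assumes GCC or Clang on x86 and 16-byte-aligned buffers. The function name `copy_from_uswc` is hypothetical. For simplicity, it stores straight to the destination instead of first staging data through a small cache-resident buffer, as some implementations do. The streaming load only helps when the source really is mapped USWC; on ordinary cached memory it behaves like a normal load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Hypothetical helper: copy `size` bytes from a (presumably USWC-mapped)
   surface `src` into ordinary cached memory `dst`.
   Assumption: both pointers are 16-byte aligned.
   The target attribute lets GCC/Clang use SSE4.1 here without -msse4.1. */
__attribute__((target("sse4.1")))
void copy_from_uswc(void *dst, const void *src, size_t size)
{
    assert(((uintptr_t)src & 15) == 0 && ((uintptr_t)dst & 15) == 0);

    __m128i *s = (__m128i *)src;   /* the intrinsic takes a non-const pointer */
    __m128i *d = (__m128i *)dst;
    size_t blocks = size / 64;     /* 4 x 16-byte vectors per iteration */

    for (size_t i = 0; i < blocks; i++) {
        /* Streaming loads: on USWC memory these fetch a full line at a time */
        __m128i x0 = _mm_stream_load_si128(s + 0);
        __m128i x1 = _mm_stream_load_si128(s + 1);
        __m128i x2 = _mm_stream_load_si128(s + 2);
        __m128i x3 = _mm_stream_load_si128(s + 3);
        /* Ordinary aligned stores into cached destination memory */
        _mm_store_si128(d + 0, x0);
        _mm_store_si128(d + 1, x1);
        _mm_store_si128(d + 2, x2);
        _mm_store_si128(d + 3, x3);
        s += 4;
        d += 4;
    }

    /* Copy any tail shorter than 64 bytes with plain memcpy */
    size_t done = blocks * 64;
    memcpy((char *)dst + done, (const char *)src + done, size - done);
}
```

If a copy like this is still as slow as plain memcpy, that would suggest the bottleneck is the readback path itself rather than the choice of copy loop.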
"Either we are doing something wrong (though I am starting to doubt it), or the GPU=>CPU direction is slow by design.
I hope someone will be able to get in touch with the Intel engineer who wrote that article (though I suspect he only tested with low-resolution videos).
...Also note that we are talking about reading (GPU=>CPU); writing is very fast."