Intel® Integrated Performance Primitives

Severe performance problems copying to video memory with ippiCopy_8u_AC4R when using P4 or above features

I'm getting performance 10 or 15 times slower when I copy from system RAM to video RAM with ippiCopy after calling ippStaticInitCpu with a value greater than or equal to ippCpuP4. Initialising with values of ippCpuPIII or less works fine. There are severe performance penalties in Windows DirectShow when reading from video RAM, but I expected that copying to video RAM wouldn't be a problem.
I'm aware that this behaviour is likely to be highly system and BIOS dependent and that there are workarounds, such as doing my own manual dispatching for this operation to prevent usage of P4 or above features.
What worries me is what my general approach should be for avoiding problems like this on an arbitrary customer's system.
Is it generally safe to use IPP to output to video RAM at all if reading from video RAM is very expensive? Do some IPP functions that appear to only write to destination memory actually read from it as an optimization technique? Should I be using ippiCopyManaged for this operation to force a particular caching strategy for safety, even though it's not available in the particular mode I'm using?
My system config is:
x86 IPP 7.0 update 4 statically linked with dispatching in an x86 COM object
Visual Studio 2008 C++ x86
Dell Precision M5400
Windows 7 x64 Professional
Intel Core i7 Q820 1.73GHz
ippGetCpuType returns ippCpuSSE42
Intel 5 series/3400 chipset


If the video data will not be used again afterwards, it is better to copy without caching the destination image. You can use the ippiCopyManaged function to control this behavior.


Thanks for the suggestion. It looks like the problem is related to alpha channel processing. I should have said that I'm using ippiCopy_8u_AC4R. If I substitute ippiCopy_8u_C1R with a 4x wider ROI, then the performance is great. For my purposes these are equivalent, as the alpha channel is not used. The performance of ippiCopyManaged is pretty similar. Presumably 8u_AC4R preserves any alpha information in the destination image by reading the destination image before bitwise combining the source image channels.
Are there any other subtle cases to watch for where IPP functions read from the destination?
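
To make the difference between the two calls concrete, here is a minimal plain-C model of what each copy does to the destination. It is only a sketch: the names copy_ac4_model and copy_c1_model are made up for illustration, and it says nothing about how IPP implements these functions internally (whether the optimized AC4 path really reads the destination is still my assumption). It only shows that an AC4-style copy has to leave every fourth destination byte alone, while a C1-style copy over 4x the width is a pure overwrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an AC4-style copy: the three colour bytes of each
 * 4-byte pixel are copied, the destination alpha byte is left as it was.
 * A vectorised version of this has to either mask its stores or read the
 * destination and merge, which is what hurts on video memory. */
void copy_ac4_model(const uint8_t *src, ptrdiff_t src_step,
                    uint8_t *dst, ptrdiff_t dst_step,
                    int width_px, int height)
{
    assert(src != NULL && dst != NULL && width_px >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        const uint8_t *s = src + y * src_step;
        uint8_t *d = dst + y * dst_step;
        for (int x = 0; x < width_px; ++x) {
            d[4 * x + 0] = s[4 * x + 0];
            d[4 * x + 1] = s[4 * x + 1];
            d[4 * x + 2] = s[4 * x + 2];
            /* d[4 * x + 3] (alpha) is deliberately not written */
        }
    }
}

/* Hypothetical model of a C1-style copy: each row is just width_bytes
 * bytes, so the destination is only ever written, never inspected. */
void copy_c1_model(const uint8_t *src, ptrdiff_t src_step,
                   uint8_t *dst, ptrdiff_t dst_step,
                   int width_bytes, int height)
{
    assert(src != NULL && dst != NULL && width_bytes >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        memcpy(dst + y * dst_step, src + y * src_step, (size_t)width_bytes);
    }
}
```

With IPP itself, the substitution I made corresponds to calling ippiCopy_8u_C1R with the same pointers and steps but with roi.width multiplied by 4. Note that this overwrites the destination alpha bytes with the source ones, so it is only a valid replacement when the alpha channel is unused, as in my case.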