Severe performance problems copying to video memory with ippiCopy_8u_AC4R when using P4 or above features
I'm getting performance 10 or 15 times slower when I copy from system RAM to video RAM with ippiCopy after calling ippStaticInitCpu with a value greater than or equal to ippCpuP4. Initialising with values of ippCpuPIII or less works fine. There are severe performance penalties in Windows DirectShow when reading from video RAM, but I expected that copying to video RAM wouldn't be a problem.
I'm aware that this behaviour is likely to be highly system and BIOS dependent, and that there are workarounds such as doing my own manual dispatching for this operation to prevent usage of P4 or above features.
What worries me is what my general approach should be for avoiding problems like this on an arbitrary customer's system.
Is it generally safe to use IPP to output to video RAM at all if reading from video RAM is very expensive? Do some IPP functions that appear to only write to destination memory actually read from it as an optimization technique? Should I be using ippiCopyManaged for this operation to force a particular caching strategy for safety, even though it's not available in the particular mode I'm using?
My system config is:
x86 IPP 7.0 update 4 statically linked with dispatching in an x86 COM object
Thanks for the suggestion. It looks like the problem is related to alpha channel processing. I should have said that I'm using ippiCopy_8u_AC4R. If I substitute ippiCopy_8u_C1R with a 4x wider ROI then the performance is great. For my purposes these are equivalent, as the alpha channel is not used. The performance of ippiCopyManaged is pretty similar. Presumably 8u_AC4R is preserving any alpha information in the destination image by reading the destination image before bitwise combining it with the source image channels.
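To make that guess concrete, here is a minimal plain-C sketch of the two copy semantics. It is not IPP's actual implementation, which I haven't seen, and the function names copy_ac4_preserve_alpha and copy_as_c1 are made up for illustration. The first keeps the destination's alpha byte, so it has to read each destination pixel before writing it. The second treats each row as width*4 plain bytes and only ever writes to the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical AC4-style copy: colour bytes come from the source, the alpha
 * byte (byte 3 of each pixel) is kept from the destination. Keeping it means
 * reading the destination, which is the expensive part on video RAM. */
static void copy_ac4_preserve_alpha(const uint8_t *src, int src_step,
                                    uint8_t *dst, int dst_step,
                                    int width, int height)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        const uint8_t *s = src + (ptrdiff_t)y * src_step;
        uint8_t *d = dst + (ptrdiff_t)y * dst_step;
        for (int x = 0; x < width; ++x) {
            uint8_t px[4];
            px[0] = s[4 * x + 0];
            px[1] = s[4 * x + 1];
            px[2] = s[4 * x + 2];
            px[3] = d[4 * x + 3];        /* read from the destination */
            memcpy(d + 4 * x, px, 4);    /* write the merged pixel back */
        }
    }
}

/* Hypothetical C1-style copy over a 4x wider row: every byte, alpha
 * included, is overwritten, so the destination is never read. */
static void copy_as_c1(const uint8_t *src, int src_step,
                       uint8_t *dst, int dst_step,
                       int width, int height)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        memcpy(dst + (ptrdiff_t)y * dst_step,
               src + (ptrdiff_t)y * src_step,
               (size_t)width * 4);
    }
}
```

The second function corresponds to the substitution I described, calling ippiCopy_8u_C1R with the ROI width multiplied by 4 and the same step values. It is only equivalent when the destination alpha doesn't need to be preserved.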
Are there any other subtle cases to watch for where IPP functions read from the destination?