There would be a performance penalty if the normal copy also had to support overlapping regions. That is why separate functions for that case are implemented in the C runtime and in the IPP libraries.
For the 2D case you can just write a wrapper on top of the ippsMove function and potentially even thread it with OpenMP. That would be very similar to what it would look like if it were implemented in IPP.
Thanks for your feedback. Can I ask what kind of applications you develop with IPP? What are the most important features of IPP from your perspective, and what are the weak areas?
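A rough sketch of such a wrapper is below. To keep it self-contained it uses the standard memmove in place of ippsMove_8u, and the name move_2d_8u, its parameters, and the shared-stride assumption are illustrative only, not part of IPP. The point it shows is that the row order has to follow the copy direction, so that no source row is overwritten before it has been read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an IPP function): moves a width x height block
 * of bytes between two possibly overlapping regions of one image buffer.
 * Assumption: both regions use the same row stride, given in bytes. */
static void move_2d_8u(const uint8_t *src, uint8_t *dst,
                       size_t width, size_t height, size_t stride)
{
    assert(width <= stride);

    if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Destination starts earlier in memory: go top to bottom. */
        for (size_t y = 0; y < height; ++y)
            memmove(dst + y * stride, src + y * stride, width);
    } else {
        /* Destination starts later in memory: go bottom to top, so a
         * source row is always read before anything can overwrite it. */
        for (size_t y = height; y-- > 0; )
            memmove(dst + y * stride, src + y * stride, width);
    }
}
```

Overlap inside a single row is left to memmove (or ippsMove in the real wrapper). Threading the row loop with OpenMP should only be safe when no destination row overlaps a different source row, since the ordering guarantee above is lost once rows run concurrently.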
Regards,
Vladimir
Thanks Vladimir.
On my system memmove and memcpy seem to have the same speed. Actually you just iterate from the start or from the end, depending on whether the destination is before the source or vice versa. Really simple unless you have some really funky optimizations (which you might have, I have no idea).
I have made such a wrapper. Speed is satisfactory.
I am creating a VNC-like app with a very advanced codec. It is the high-level parts that are advanced btw, and I am using IPP for parts of the codec, mostly the JPEG codec and some copy functionality.
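Roughly what I mean, as a minimal sketch. This is only an illustration of the direction rule, not how any particular C runtime or IPP does it; move_bytes is a made-up name, and a real implementation would copy in wider units than single bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise move that tolerates overlapping regions.
 * Assumption: dst and src point into the same buffer, so comparing
 * their addresses is meaningful. */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is before source: copy forward from the start. */
        for (size_t i = 0; i < n; ++i)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination is after source: copy backward from the end. */
        for (size_t i = n; i-- > 0; )
            d[i] = s[i];
    }
    /* If dst == src there is nothing to do. */
    return dst;
}
```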
I find the speed most satisfactory for the most part. I miss the Java way of documentation, but that is more a C/C++ problem.
The only performance problem I have found is in the function that swaps the bytes in an int, essentially converting from little- to big-endian and back. My own function is much faster there (using the bswap instruction via inline asm).
You should have a look at quicklz.com. It is an _extremely_ fast compressor/decompressor. Much, much faster than anything you have, I think. One problem is that if I compile it with ICC 11.1 and enable -parallel, it hogs all 4+4 cores on my Mac Pro i7 and performs 50% worse... Strange.
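For reference, a portable sketch of the same operation without inline asm. This is not my actual function and not the IPP one; the names swap_bytes_32 and swap_bytes_32_buffer are made up for illustration. Many compilers are able to turn this shift-and-mask pattern into a single bswap, but that is an assumption worth checking in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of one 32-bit value
 * (little-endian <-> big-endian). */
static uint32_t swap_bytes_32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Hypothetical helper: swaps every element of a buffer in place. */
static void swap_bytes_32_buffer(uint32_t *data, size_t count)
{
    assert(count == 0 || data != NULL);

    for (size_t i = 0; i < count; ++i)
        data[i] = swap_bytes_32(data[i]);
}
```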
Keep up the good work!
Cheers,
Mikael