Maybe I'm just unable to find it (I have been browsing the manuals for an hour), but I can't seem to locate a function like ippiCopy_xx that works for overlapping ROIs. ippiCopy doesn't work if the memory areas overlap unless the delta x and delta y are both zero or negative.
It's quite common to want to move an image around within another one, so I assume such a function must exist. It's like memcpy vs. memmove. Can someone point me in the right direction?
Btw, am I supposed to post these things on the premier support site?
There would be a performance implication if the normal copy also had to support overlapping regions. That is why separate functions for that case are implemented in the C runtime and in the IPP libraries.
For the 2D case you can just write a wrapper on top of the ippsMove function and potentially even thread it with OpenMP. This will be very similar to what it would look like if implemented in IPP.
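A minimal sketch of such a 2D wrapper, under stated assumptions: the source and destination ROIs live in the same 8-bit image buffer and share one row stride. The name move_roi_8u and its parameters are hypothetical, not part of IPP. Standard memmove is used for each row so the example compiles without IPP; the per-row call is where ippsMove_8u could presumably be substituted. Rows are copied top-down when the destination lies before the source in memory, and bottom-up otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an IPP function.
 * Moves a widthBytes x height block between two possibly overlapping
 * regions of the SAME image buffer. Both regions are assumed to use the
 * same row stride `step` (in bytes), with step >= widthBytes. */
static void move_roi_8u(const unsigned char *src, unsigned char *dst,
                        size_t step, size_t widthBytes, size_t height)
{
    size_t row;

    if (widthBytes == 0 || height == 0 || src == dst) {
        return;
    }

    if (dst < src) {
        /* Destination is earlier in memory: copy rows top to bottom, so
         * every source row is read before a later write can reach it. */
        for (row = 0; row < height; ++row) {
            memmove(dst + row * step, src + row * step, widthBytes);
        }
    } else {
        /* Destination is later in memory: copy rows bottom to top. */
        for (row = height; row > 0; --row) {
            memmove(dst + (row - 1) * step, src + (row - 1) * step,
                    widthBytes);
        }
    }
}
```

The row loop is left serial on purpose: when the regions overlap, the row order is what makes the copy safe, so threading it with OpenMP would need extra care (for example, only parallelizing when the two regions do not overlap at all).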
Thanks for your feedback. Can I ask you what kind of applications you develop with IPP? What are the most important features of IPP from your perspective, and what are the weak areas?
On my system memmove and memcpy seem to have the same speed. Actually, you just iterate from the start or from the end depending on whether the dest is before the source or vice versa. Really simple unless you have some really funky optimizations (which you might have, I have no idea).
I have made such a wrapper. Speed is satisfactory.
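To illustrate the direction choice described above, here is a bare-bones byte-wise sketch. The name move_bytes is made up for this example, and real memmove implementations are typically far more optimized than this.

```c
#include <stddef.h>

/* Hypothetical illustration of an overlap-safe copy; not how any
 * particular C runtime actually implements memmove. */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i;

    if (d < s) {
        /* dest before source: walk forward from the start */
        for (i = 0; i < n; ++i) {
            d[i] = s[i];
        }
    } else if (d > s) {
        /* dest after source: walk backward from the end */
        for (i = n; i > 0; --i) {
            d[i - 1] = s[i - 1];
        }
    }
    return dst;
}
```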
I am creating a VNC-like app with a very advanced codec. It is the high-level parts that are advanced, btw, and I am using IPP for parts of the codec, mostly the JPEG codec and some copy functionality.
I find the speed most satisfactory for the most part. I miss the Java way of documentation, but that is more a C/C++ problem.
The only performance problem I have found is in the function that swaps the bytes in an int, essentially converting from little- to big-endian and back. My own function is much faster there (using bswap asm).
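For reference, a portable sketch of a 32-bit byte swap. This is not the asm version mentioned above, and swap_bytes_u32 is a made-up name. Optimizing compilers often recognize this shift-and-mask pattern and emit a single bswap instruction on x86, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value,
 * converting little-endian to big-endian or back. */
static uint32_t swap_bytes_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applying the function twice returns the original value, so the same routine serves for both directions of the conversion.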
You should have a look at quicklz.com. It is an _extremely_ fast compressor/decompressor. Much, much faster than anything you have, I think. One problem is that if I compile it with ICC 11.1 and enable -parallel, it sucks up all 4+4 cores on my Mac Pro i7 and performs 50% worse... Strange.