I'm running ippiCrossCorrNorm_16u32f and ippiCrossCorrNorm_32f on IPP 8.1.
It seems that timing is related only to src and template sizes.
I tried running with ippAlgDirect and with ippAlgFFT, but the timing is the same for both methods.
The timing is also the same for 16u input and float input.
Am I doing something wrong?
Only a few flavors of this function have a "direct" algorithm: the 8u, 8u32f and 32f flavors, and only for the "Valid" case. All the others have an FFT-based implementation only. For the 8u data type, for example, the "direct" method is limited by the template size and by the accumulator data type (for 8u the accumulator is 32s). So you can see a performance difference between the two methods only for the data types mentioned above, and only for a narrow range of template sizes. Internally the limit is defined as #define CROSSCORRVALID_MAX_DIRECT_TPL_SIZE 256 * 127.
Sorry, I forgot to mention the 16u and 32f performance: the internal implementation for both of these data types is based on a 2D 32f FFT, which is why the performance is the same.
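To make the rule in the reply concrete, here is a minimal C sketch of the selection logic as described. It does not call IPP and is not the library's actual code. The enum and function names are hypothetical, and only the value 256 * 127 comes from the reply. It also assumes that the limit is compared against the template's pixel count (width times height), which the reply does not state. The second function just checks that the worst-case sum of 8u products at that pixel count still fits in a 32s accumulator, which is consistent with the accumulator remark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; only the value 256 * 127 is taken from the reply.
   Parenthesized here so it is safe to use inside larger expressions. */
#define MAX_DIRECT_TPL_PIXELS (256 * 127)

/* Hypothetical enums for illustration; these are not IPP types. */
typedef enum { FLAVOR_8U, FLAVOR_8U32F, FLAVOR_32F, FLAVOR_16U32F } Flavor;
typedef enum { SHAPE_FULL, SHAPE_SAME, SHAPE_VALID } Shape;

/* Returns true if, per the reply, a "direct" path could exist at all.
   ASSUMPTION: the limit applies to the template pixel count. */
bool direct_may_apply(Flavor flavor, Shape shape, int tplWidth, int tplHeight)
{
    if (shape != SHAPE_VALID) {
        return false;               /* direct exists only for the "Valid" case */
    }
    if (flavor == FLAVOR_16U32F) {
        return false;               /* 16u32f is FFT-only per the reply */
    }
    return (int64_t)tplWidth * tplHeight <= MAX_DIRECT_TPL_PIXELS;
}

/* Worst-case sum of products for 8u data: every product is 255 * 255. */
int64_t worst_case_8u_sum(int64_t pixels)
{
    return (int64_t)255 * 255 * pixels;
}
```

Under these assumptions, direct_may_apply(FLAVOR_16U32F, ...) is always false, which matches the identical timings in the question: requesting ippAlgDirect for a flavor that has no direct path still ends up on the FFT path.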