What algorithm is used by the function ippiDFTFwd_RToPack_32f_C1R?
And what is its algorithmic complexity? Is it similar to the method used in FFTW?
I tried increasing the image size passed to ippiDFTFwd_RToPack_32f_C1R, and the execution time does not increase monotonically (of course, sizes of 2^n give the minimum execution time). Is there any theoretical analysis of the algorithmic complexity of ippiDFTFwd_RToPack_32f_C1R? Many thanks!
IPP DFT is based on decomposing the vector length / image size into prime factors. The prime factors that have special code branches are 2, 3, 5, 7, 11 and 13. For powers of 2 (FFT), the supported radices are 2, 4, 8 and 16, depending on the architecture: radix-8 is supported on Intel64, and radix-16 on Xeon Phi only. For lengths/sizes that can't be decomposed into these primes (or for the remainder parts), a convolution-based algorithm is used. In the general case you may take the algorithm complexity to be ~5*n*log2(n) arithmetic operations per call (1D case, n == vector length).
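To make the decomposition above concrete, here is a minimal Python sketch that splits a length into the special-cased primes and whatever remainder is left over, and evaluates the rough ~5*n*log2(n) estimate. It is only an illustration of the description in this post, not IPP's actual code: the names `SPECIAL_PRIMES`, `split_length` and `estimated_ops` are made up for the example, and the real library may choose radices and code paths differently.

```python
import math

# Primes that, per the answer above, have dedicated code branches.
SPECIAL_PRIMES = (2, 3, 5, 7, 11, 13)


def split_length(n):
    """Split n into powers of the special primes plus a leftover remainder.

    Returns (factors, remainder), where factors maps prime -> exponent.
    A remainder of 1 means n decomposes fully into the special primes;
    a remainder > 1 is the part that would need the convolution-based path.
    """
    if n < 1:
        raise ValueError("length must be a positive integer")
    factors = {}
    remainder = n
    for p in SPECIAL_PRIMES:
        while remainder % p == 0:
            factors[p] = factors.get(p, 0) + 1
            remainder //= p
    return factors, remainder


def estimated_ops(n):
    """Rough 1D operation count from the rule of thumb ~5*n*log2(n)."""
    return 5.0 * n * math.log2(n)


if __name__ == "__main__":
    for n in (1000, 1020, 1024):
        factors, remainder = split_length(n)
        print(n, factors, "remainder:", remainder, "ops:", round(estimated_ops(n)))
```

With this sketch, 1024 and 1000 decompose completely, while 1020 leaves a remainder of 17. Neighbouring sizes can therefore end up on different code paths, which is consistent with the execution time not growing monotonically with size even though the ~5*n*log2(n) estimate does.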