IPP FFT performance not improved with multiple threads

arkgr — Thu, 26 May 2011 15:33:22 GMT

I have a problem with the FFT function ippsFFTFwd_CToC_32fc (IPP version 7.0). The FFT length is 2^19. According to ThreadedFunctionsList.txt, "ippsFFTFwd_CToC_32fc" is threaded.

I run it on a 12-core machine (2 x 6-core L5640) through Parallel Studio and Visual Studio 2010, under 64-bit Windows Server 2008.

I see that only one core is working, even though I did everything that is written in the documentation.

By comparison, the direct FIR function is parallelized very well.

Can you help me with the FFT?


Chao_Y_Intel — Fri, 27 May 2011 08:00:47 GMT

Hello,

This looks like a problem we discussed in the forum before. Here are some comments from the function expert on the performance:

1) For fairly small FFT orders (below about 19, depending on the platform's cache size), the FFT function uses an internal memory buffer roughly equal to the vector length. For such orders there is no performance difference between the in-place and out-of-place cases: the FFT is calculated in the buffer and the result is then copied to the destination, so for in-cache cases it does not matter whether the copy goes to the src or the dst vector. For fairly large orders (above 19), the in-place version is faster, because internally the FFT uses a smaller buffer (less than the input vector length). I think that the HDD case should not be discussed here.

2) The FFT is threaded only for cases that fit into a shared L2 cache, only on Core 2 CPUs, and only on 2 threads. For small orders the OpenMP overhead is greater than the benefit; for large (out-of-cache) orders, memory effects play a negative role. So the customer's investigation is right: there is no threading for order 19 and above.

Thanks,
Chao
