Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

IPP FFT performance not improved with multiple threads

arkgr
Beginner

I have a problem with the FFT function ippsFFTFwd_CToC_32fc (IPP version 7.0). The FFT length is 2^19. According to ThreadedFunctionsList.txt, "ippsFFTFwd_CToC_32fc" is threaded.

I run it on a 12-core machine (L5640, 2x6) through Parallel Studio and Visual Studio 2010 under 64-bit Windows Server 2008.

But I see that only one core is working, even though I did everything that is written in the documentation.

For comparison, the direct FIR function parallelizes very well.

Can you help me with the FFT?

1 Reply
Chao_Y_Intel
Employee

Hello,

This looks like a problem we discussed in the forum before. Please find some comments on the performance from the function expert:

1) For rather small FFT orders (below ~19, depending on the platform's cache size) the FFT function uses a memory buffer roughly equal to the vector length. For such orders there is therefore no performance difference between the in-place and out-of-place cases: the FFT is calculated in the buffer and the result is then copied to the destination, so for in-cache cases it doesn't matter whether the copy goes to the src or the dst vector. For rather large orders (>19) the in-place version is faster, because internally the FFT uses a smaller buffer (less than the input vector length). I think that the HDD case should not be discussed here.

2) The FFT is threaded only for cases that fit into a shared L2 cache, only on Core2 CPUs, and only on 2 threads. For small orders the OMP overhead is greater than the benefit; for large orders (out-of-cache) memory effects play a negative role. So the customer's investigation is right: there is no threading for order 19 and above.
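
As a rough illustration of the cache-fit argument in the two points above, the following sketch estimates how many bytes an FFT of a given order touches and compares that with a cache size. It does not call IPP. The helper names (fft_vector_bytes, fits_in_cache), the idea of counting whole vectors, and whatever cache size is passed in are assumptions made for illustration; the real thresholds inside the library are not documented in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One complex single-precision sample is two 32-bit floats,
   the same layout as Ipp32fc: 8 bytes. */
#define BYTES_PER_SAMPLE 8u

/* Hypothetical helper: bytes in one vector of 2^order complex-float samples. */
static size_t fft_vector_bytes(unsigned order)
{
    assert(order < sizeof(size_t) * 8 - 3); /* keep the shift and multiply in range */
    return ((size_t)1 << order) * BYTES_PER_SAMPLE;
}

/* Hypothetical helper: returns 1 if n_vectors vectors of this order fit into
   cache_bytes, 0 otherwise. n_vectors models the working set, for example
   1 for the data alone, or 3 for source + destination + a work buffer of
   about the same length (the small-order case described in point 1).
   cache_bytes is supplied by the caller and is an assumption about the CPU. */
static int fits_in_cache(unsigned order, unsigned n_vectors, size_t cache_bytes)
{
    return fft_vector_bytes(order) * n_vectors <= cache_bytes;
}
```

For the order in the question, fft_vector_bytes(19) is 2^19 * 8 = 4194304 bytes (4 MiB) per vector, so an out-of-place transform with a work buffer of similar size touches about 12 MiB. With a hypothetical 6 MiB shared cache, fits_in_cache(19, 3, 6 * 1024 * 1024) returns 0, which is consistent with order 19 being treated as an out-of-cache case.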

Thanks,
Chao
