Greetings everybody. I'm experiencing performance problems with MKL's FFT, but only on OS X.
I have a C++ project in use on both Windows and OS X. Initially I built it against FFTW, but several months ago I switched the Windows build to MKL for the FFT calculations, still through the FFTW3 interface. On Windows with a Core 2 Duo processor, the performance difference between MKL and FFTW is within 10%, which is fine. I purchased a license for MKL on Windows and have been happy.
However, when the same code is built on OS X with the evaluation versions of ICC and MKL, FFTs using MKL are about 2.2 times slower than with FFTW (again on an Intel processor, this time an i7). For example, execution time is 6 seconds when linked to FFTW and 14 seconds when linked to MKL, with no other changes.
I've tried various combinations of static linking, dynamic linking, enabling threading in MKL, setting MKL to sequential mode, and so on. (The project uses pthreads for its threading, if that matters.) The only thing that seems to make a difference is the compiler. If I use LLVM instead of ICC to build, overall performance is about 20% worse, but the difference between MKL and FFTW remains.
Anybody have any ideas why MKL's FFT would be so much slower on OS X?
Sure. The CPU is an i7-2677M at 1.8 GHz, the version of OS X is 10.7.2, the version of MKL is 10.3.11, and a representative FFT problem is: single-precision, complex, rank-2, in-place, 3000x2000 elements.
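For anyone who wants to reproduce this outside my project, a minimal timing sketch for that representative case might look like the code below. It is not the project code. The helper names (best_of, slowdown, bench_fft_2d), the BENCH_WITH_FFTW3 macro, the FFTW_ESTIMATE flag and the repetition count are all assumptions made for illustration; only the fftwf_* calls are the standard FFTW3 single-precision interface. The idea is to build the same file twice, once linked to FFTW and once linked to MKL's FFTW3 wrappers, and compare planning time and execution time separately.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <limits>

#ifdef BENCH_WITH_FFTW3   // hypothetical macro: define it when an FFTW3-compatible library is linked
#include <fftw3.h>
#endif

// Run f() `reps` times and return the fastest wall-clock time in seconds.
template <typename F>
double best_of(int reps, F&& f) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Ratio of two timings, e.g. slowdown(6.0, 14.0) for the 6 s and 14 s runs.
inline double slowdown(double baseline_s, double other_s) {
    assert(baseline_s > 0.0);
    return other_s / baseline_s;
}

#ifdef BENCH_WITH_FFTW3
// Single-precision, complex, rank-2, in-place transform (3000x2000 by default).
// Planning and execution are timed separately so a slow planner is not
// mistaken for a slow transform.
inline void bench_fft_2d(int n0 = 3000, int n1 = 2000, int reps = 10) {
    const std::size_t count = static_cast<std::size_t>(n0) * n1;
    fftwf_complex* buf =
        static_cast<fftwf_complex*>(fftwf_malloc(sizeof(fftwf_complex) * count));
    if (!buf) return;

    // Zero input: FFT run time should not depend on the data values, and
    // zeros cannot overflow across repeated unnormalized in-place runs.
    for (std::size_t i = 0; i < count; ++i) {
        buf[i][0] = 0.0f;
        buf[i][1] = 0.0f;
    }

    fftwf_plan plan = nullptr;
    const double plan_s = best_of(1, [&] {
        plan = fftwf_plan_dft_2d(n0, n1, buf, buf, FFTW_FORWARD, FFTW_ESTIMATE);
    });

    if (plan) {
        const double exec_s = best_of(reps, [&] { fftwf_execute(plan); });
        std::printf("plan: %.4f s, best execute of %d: %.4f s\n", plan_s, reps, exec_s);
        fftwf_destroy_plan(plan);
    }
    fftwf_free(buf);
}
#endif
```

To use it, call bench_fft_2d() from main, compile with -DBENCH_WITH_FFTW3, and link the single-precision FFTW library for one build and MKL for the other. If the two builds differ mainly in execute time on this single call, the problem is likely in the transform itself; if they are close, the gap in the full project may come from how plans are created or reused there.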