I rebuilt FFTW (3.3.3) with the native C++ compiler supplied with MSDEV 2010.
Then I did the same with Intel Compiler (Intel Parallel Studio XE 2013).
I ran 20 iterations of a 1D FFT on a complex float vector (1024 elements).
The performance in both cases was the same.
Does that make sense? I expected that compiling with the Intel compiler would produce much faster code.
In both cases I used the release version of the FFTW dll.
Rather than building FFT-specific optimizations into the compiler, Intel provides FFT routines in its comprehensive MKL library. You might build with /Qvec-report to see whether the important loops are vectorized and, if they are not, add the restrict qualifier and pragmas.
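As a rough illustration of the restrict suggestion, here is a minimal sketch. It is not taken from FFTW; the function name scale_complex and the interleaved re/im layout are assumptions made up for the example. It uses the C99 spelling `restrict`; Microsoft's compiler spells the keyword `__restrict`, and the Intel compiler may need C99 mode or its restrict option enabled, so check your compiler's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: multiply n interleaved complex floats (re, im, re, im, ...)
 * by the complex constant (wr + i*wi). The restrict qualifiers promise the
 * compiler that 'out' and 'in' never overlap, so it does not have to assume
 * that a store to out[] may change a later load from in[]. */
void scale_complex(float *restrict out, const float *restrict in,
                   float wr, float wi, size_t n)
{
    size_t i;

    /* Debug-only check for the most obvious violation of the promise above. */
    assert((const float *)out != in);

    for (i = 0; i < n; ++i) {
        float re = in[2 * i];
        float im = in[2 * i + 1];
        out[2 * i]     = re * wr - im * wi;  /* real part */
        out[2 * i + 1] = re * wi + im * wr;  /* imaginary part */
    }
}
```

If the vectorization report says a loop like this was not vectorized because of an assumed dependence, the missing no-overlap guarantee is a likely cause. The promise is only safe when callers really do pass non-overlapping buffers.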
It's important to verify that the source code complies with the standard's aliasing rules, and then set /Qansi-alias. The Microsoft compiler takes a middle position: it tries to protect against incorrect aliasing but still performs some optimization (presumably with run-time checks).
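To make "standard-compliant with respect to aliasing" concrete, here is a small sketch of the kind of code to look for. The helper name float_bits is hypothetical and not from FFTW, and the example assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

/* Non-compliant pattern (kept in a comment on purpose): it reads a float
 * object through an lvalue of an unrelated type, which the aliasing rules
 * do not allow, so a compiler that relies on those rules may reorder or
 * drop such accesses.
 *
 *     uint32_t bits = *(const uint32_t *)&f;
 */

/* Compliant alternative: copy the object representation with memcpy.
 * Compilers typically turn this into a single register move. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* assumes sizeof(float) == sizeof(uint32_t) */
    return u;
}
```

Code that contains only the second form stays correct when the compiler is told it may rely on the aliasing rules. Code that contains casts like the commented-out one should be fixed first.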