I have been tracking down the cause of a major slowdown related to FFT and filtering using the MKL libraries.
It happens while filtering a large 2D data set (6427x6427) using floats (not doubles).
What I am seeing:
The forward FFT of the data takes about 1.3 sec.
I multiply the output coefficients of the FFT by a weighting function.
The inverse FFT of the data takes about 6 sec (FFT only, not the time to apply the weights).
Taking 6 seconds on the inverse FFT makes no sense, so during testing I tried the following (these are just the results I think are relevant, not all the testing I did):
After many experiments I get.
Data calls used: DftiComputeForward and DftiComputeBackward
Computer: Win7 64bit
Intel Xeon 5160 3GHz (2 processors, 4 CPUs)
Yes, I do the multiplication in a separate pass, not included in the timing, because I am implementing a type of band-pass filter, so the weight is not a single value.
The version of MKL is 10.3 patch 4. I am in the process of updating to the latest 10.3.
Some additional information since I made the post.
Almost all the values are 0, ranging to -1. I have done more testing and it's not the multiplication itself; it's related to the values being used. When I use the same code but (at the last second) reset all weights to 1, I don't get the slowdown.
I'm a little confused why having all values close to zero would cause an issue. If I wanted to filter out all but a few frequencies, I would set almost all values to zero, and I would hit the same issue. I could understand a slight slowdown (i.e. 1.3 sec to 1.5 sec) but not such a large one.
Thanks very much,
Floating point operations with very small (denormalized) numbers are expensive.
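One way to check whether this applies to your data is to count the denormal values in the weighted coefficients just before the inverse transform. The following is only a minimal sketch: `count_denormals` is a hypothetical helper name, and it assumes the coefficients are stored as a plain array of floats (for interleaved complex data, pass twice the number of complex values as `n`).

```c
#include <assert.h>
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>  /* size_t */

/* Count how many entries of a float buffer are denormal (subnormal).
 * Hypothetical helper, not part of MKL. */
static size_t count_denormals(const float *data, size_t n)
{
    size_t count = 0;
    size_t i;

    assert(data != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        /* FP_SUBNORMAL: nonzero, but smaller in magnitude than FLT_MIN */
        if (fpclassify(data[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

If the count is a large fraction of the buffer after the weights are applied, and close to zero when all weights are 1, that would be consistent with denormals being the cause.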
You could easily see if this is the issue by setting the flush-denormals bit before DftiComputeBackward:
The macro is defined in
Alternatively, if you use the Intel compiler, flush-to-zero can be set by adding the -Qftz flag when compiling the main() function.
Take a look at the state of FP operations (see the _control87 CRT function or the SSE macro _MM_GET_EXCEPTION_STATE).
There is a possibility that some FP exceptions, like Inexact Value (see _EM_INEXACT), are causing the performance problem.
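As a minimal sketch of the first suggestion, assuming the macro meant is the SSE `_MM_SET_FLUSH_ZERO_MODE` from `<xmmintrin.h>` (an assumption, since the reply does not name it here), the flush-to-zero bit can be set and later restored like this. The helper names are hypothetical, and the code is x86 only.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE */

/* Turn on flush-to-zero in MXCSR for the calling thread.
 * Returns the previous mode so the caller can restore it. */
static unsigned int enable_flush_to_zero(void)
{
    unsigned int previous = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return previous;
}

/* Put back a mode previously returned by enable_flush_to_zero(). */
static void restore_flush_mode(unsigned int mode)
{
    assert(mode == _MM_FLUSH_ZERO_ON || mode == _MM_FLUSH_ZERO_OFF);
    _MM_SET_FLUSH_ZERO_MODE(mode);
}

/* Multiply through volatiles so the product is computed at run time
 * under the current MXCSR settings, not folded by the compiler. */
static float multiply_at_runtime(float x, float w)
{
    volatile float a = x;
    volatile float b = w;
    volatile float r = a * b;
    return r;
}
```

In the filtering code this would mean calling `enable_flush_to_zero()` before the weights are applied and before DftiComputeBackward, then `restore_flush_mode()` afterwards. Note that MXCSR is a per-thread register, so whether the setting reaches any worker threads MKL uses is something to verify by timing rather than assume. Flush-to-zero only affects results that would be denormal; denormal inputs that already exist in the buffer are governed by the separate denormals-are-zero bit.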
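A rough sketch of inspecting that state with the SSE macros from `<xmmintrin.h>` follows. The helper names are hypothetical, the code is x86 only, and it reads only the SSE control register (MXCSR), not the x87 control word that _control87 also covers.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _MM_GET_EXCEPTION_STATE, _MM_GET_EXCEPTION_MASK */

/* Clear the sticky SSE exception flags for the calling thread. */
static void clear_sse_exception_flags(void)
{
    _MM_SET_EXCEPTION_STATE(0);
}

/* Return the sticky flags raised since the last clear,
 * e.g. _MM_EXCEPT_INEXACT, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_DENORM. */
static unsigned int read_sse_exception_flags(void)
{
    return _MM_GET_EXCEPTION_STATE();
}

/* Nonzero if every SSE exception is masked, i.e. none of them traps. */
static int sse_exceptions_all_masked(void)
{
    return (_MM_GET_EXCEPTION_MASK() & _MM_MASK_MASK) == _MM_MASK_MASK;
}

/* Divide through volatiles so the operation runs at run time. */
static float divide_at_runtime(float x, float y)
{
    volatile float a = x;
    volatile float b = y;
    volatile float r;

    assert(y != 0.0f);
    r = a / b;
    return r;
}
```

Clearing the flags before the inverse FFT and reading them afterwards shows which conditions occurred on the calling thread. If `sse_exceptions_all_masked()` returns zero at that point, some exception is unmasked and would trap, which is worth ruling out.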