Major slow down for inverse 2D FFT

James_K_2 · ‎03-13-2012

I have been tracking down the cause of a major slowdown related to FFT and filtering using the MKL libraries.

It occurs while filtering a large 2D data set (6427x6427) of floats (not doubles).

What I am seeing:

Forward FFT of the data takes about 1.3 sec.

Multiply the output coefficients of the FFT by a weighting function.

Inverse FFT of the data takes about 6 sec (FFT only, not the time to apply the weights).

Taking 6 seconds for the inverse FFT makes no sense, so I ran a number of tests. Below are only the results I think are relevant, not everything I tried.

Not applying the weighting to the FFT output coefficients:
1. Forward FFT of the data takes about 1.3 sec.
2. Inverse FFT of the data takes about 1.3 sec. This is what I would expect.

Going through the motions of applying the weights, just not applying them (i.e. everything except the final multiply):
1. Forward FFT of the data takes about 1.3 sec.
2. Inverse FFT of the data takes about 1.3 sec. This is what I would expect.

Adding two lines to the second case above, which multiply the FFT coefficients by the weighting function:
1. Forward FFT of the data takes about 1.3 sec.
2. Inverse FFT of the data takes about 6 sec. Why would the simple act of multiplying the coefficients by the weights cause the time to change from 1.3 sec to 6 sec?

When using the same code on a much smaller (1024x1024) set of data, the forward and inverse FFT times, with weight multiplication, are the same.

MKL calls used: DftiComputeForward and DftiComputeBackward
Computer: Win7 64bit
Intel Xeon 5160 3GHz (2 processors, 4 CPUs)
Memory 32GB

Dmitry_B_Intel · ‎03-13-2012

Hi J,

I suppose you are doing a complex-to-complex transform.
Did you multiply the coefficients in a separate pass over the data, or by setting DFTI_BACKWARD_SCALE?
If you swap the calls of Forward and Backward (that is, Backward first, then multiplication by the coefficients, and then Forward), does the forward transform slow down instead?
Maybe after multiplication by the coefficients some values become too small (denormalized), and this slows down the computation.
What version of MKL do you use?
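
One way to test the denormal hypothesis above is to count subnormal values in the buffer just before the backward transform. The following is only a sketch: `count_subnormals` is a hypothetical helper, not part of MKL, and it assumes the weighted coefficients are stored as interleaved single-precision real and imaginary parts, so a complex array of n elements is scanned as 2*n floats.

```c
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: returns how many of the n floats in data are
   subnormal (denormalized), i.e. nonzero but smaller in magnitude
   than the smallest normal float. */
size_t count_subnormals(const float *data, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(data[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

A large count after the weighting pass, compared with a count near zero before it, would point to denormals as the cause of the slowdown.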

Thanks
Dima

James_K_2 · ‎03-14-2012

Yes, I do the multiplication with a separate pass, not included in timing, because I am implementing a type of band pass filter so it's not a single value.

The version of MKL is 10.3 patch 4. I am in the process of updating to latest 10.3.

Some additional information since I made the post.

Almost all the values are 0, ranging to -1. I have done more testing and it's not the multiplication, it's related to the values being used. When I use the same code but (at the last second) reset all weights to 1 I don't get the slow down.

I'm a little confused why having all values close to zero would cause an issue. If I wanted to filter out all but a few frequencies I would set almost all values to zero. This would be the same issue. I could understand a slight slow down (ie. 1.3 sec to 1.5 sec) but not the very large slow down.

Thanks very much,

Jim K.

Dmitry_B_Intel · ‎03-14-2012

Jim,
Floating-point operations on very small (denormalized) numbers are expensive.
You can easily check whether this is the issue by setting the flush-to-zero bit before calling DftiComputeBackward:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

The macro is defined in
#include <xmmintrin.h>

Alternatively, if you use the Intel compiler, flush-to-zero can be enabled by adding the -Qftz flag when compiling the main() function.
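
The effect of the flush-to-zero setting can be seen in isolation with a small sketch like the one below. `multiply_with_ftz` is a hypothetical helper written for illustration; it assumes an x86-64 build where float arithmetic is done with SSE instructions, and it restores the caller's mode before returning. The volatile variables are only there to keep the compiler from folding the multiplication at compile time.

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE */

/* Hypothetical helper: computes a * b with flush-to-zero switched on or
   off, then restores whatever mode the caller had before. */
float multiply_with_ftz(float a, float b, int ftz_on)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(ftz_on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);

    /* volatile forces the multiply to happen at run time, under the
       mode selected above */
    volatile float va = a;
    volatile float vb = b;
    volatile float product = va * vb;

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return product;
}
```

With flush-to-zero off, multiply_with_ftz(1e-30f, 1e-10f, 0) yields a tiny denormal result; with it on, the same product is returned as exactly zero. Note that this bit only affects results: denormal values that are already present as inputs are governed by the separate denormals-are-zero (DAZ) bit.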

Thanks
Dima

James_K_2 · ‎03-14-2012

That was the cause.

Thanks,
Now I just need to figure out how this may impact the rest of the calculations in my system.

Jim K.

SergeyKostrov · ‎03-15-2012

Quoting James_K_2

...I'm a little confused why having all values close to zero would cause an issue...

Take a look at the state of FP operations (see the _control87 CRT function or the SSE macro _MM_GET_EXCEPTION_STATE).
There is a possibility that some FP exceptions, like Inexact Value (see _EM_INEXACT), are causing the performance problem.
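
As a rough illustration of inspecting the SSE exception state mentioned above, the sketch below clears the sticky exception flags, performs one multiplication, and reports which flags were raised. `exception_flags_for_product` is a hypothetical helper; it assumes an x86-64 build using SSE arithmetic with the default (masked) exception settings.

```c
#include <xmmintrin.h>  /* _MM_GET_EXCEPTION_STATE, _MM_SET_EXCEPTION_STATE */

/* Hypothetical helper: returns the SSE exception flags raised by a * b.
   Flags that were already set on entry are preserved on exit. */
unsigned int exception_flags_for_product(float a, float b)
{
    unsigned int saved = _MM_GET_EXCEPTION_STATE();
    _MM_SET_EXCEPTION_STATE(0);  /* clear the sticky flags */

    /* volatile keeps the multiply from being folded at compile time */
    volatile float va = a;
    volatile float vb = b;
    volatile float product = va * vb;
    (void)product;

    unsigned int raised = _MM_GET_EXCEPTION_STATE();
    _MM_SET_EXCEPTION_STATE(saved | raised);
    return raised;
}
```

The returned value can be tested against masks such as _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_DENORM and _MM_EXCEPT_INEXACT. For example, a product that lands in the denormal range is expected to raise the underflow flag, and a denormal input raises the denormal-operand flag.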