James_K_2
Beginner
81 Views

Major slow down for inverse 2D FFT

I have been tracking down the cause of a major slowdown related to FFT and filtering using the MKL libraries.

It occurs while filtering a large 2D data set (6427x6427) of floats (not doubles).

What I am seeing:

Forward FFT the data takes about 1.3 sec.

Multiply the output coefficients of the FFT by a weighting function

Inverse FFT the data takes about 6 sec (FFT only, not the time to apply weights).

Taking 6 seconds on the inverse FFT makes no sense, so during testing I tried the following. (These are only the results I think are relevant, not all the testing I did.)

After many experiments I get:

  1. Not applying the weighting of FFT output coefficients:
    1. Forward FFT the data takes about 1.3 sec.
    2. Inverse FFT the data takes about 1.3 sec. This is what I would expect.
  2. Going through the motions of applying the weights, just not applying them (i.e. everything except the final multiply):
    1. Forward FFT the data takes about 1.3 sec.
    2. Inverse FFT the data takes about 1.3 sec. This is what I would expect.
  3. Adding two lines to #2 above which multiply the FFT coefficients by the weighting function
    1. Forward FFT the data takes about 1.3 sec.
    2. Inverse FFT the data takes about 6 sec. Why would the simple act of multiplying the coefficients by the weights cause the time to change from 1.3 sec to 6 sec?
  4. When using the same code on a much smaller (1024x1024) set of data, the forward and inverse FFT, with weight multiplication, take the same time.

Data calls used: DftiComputeForward and DftiComputeBackward
Computer: Win7 64bit
Intel Xeon 5160 3GHz (2 Processors, 4 CPUs)
Memory 32GB

5 Replies
Dmitry_B_Intel
Employee

Hi J,

I suppose you are doing a complex-to-complex transform.
Did you multiply the coefficients in a separate pass over the data, or by setting DFTI_BACKWARD_SCALE?
If you swap the Forward and Backward calls (that is, Backward first, then the multiplication by the coefficients, and then Forward), does the forward transform slow down instead?
Maybe after the multiplication some values become too small (denormalized), and this slows down the computation.
What version of MKL are you using?

Thanks
Dima


James_K_2
Beginner

Yes, I do the multiplication with a separate pass, not included in timing, because I am implementing a type of band pass filter so it's not a single value.

The version of MKL is 10.3 patch 4. I am in the process of updating to the latest 10.3.

Some additional information since I made the post.

Almost all the values are 0, ranging to -1. I have done more testing and it's not the multiplication, it's related to the values being used. When I use the same code but (at the last second) reset all weights to 1 I don't get the slow down.

I'm a little confused about why having all values close to zero would cause an issue. If I wanted to filter out all but a few frequencies, I would set almost all values to zero, and that would run into the same issue. I could understand a slight slowdown (i.e. 1.3 sec to 1.5 sec) but not such a large one.


Thanks very much,

Jim K.

Dmitry_B_Intel
Employee

Jim,
Floating-point operations on very small (denormalized) numbers are expensive.
You can easily check whether this is the issue by setting the flush-to-zero bit before calling DftiComputeBackward:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

The macro is defined in xmmintrin.h:
#include <xmmintrin.h>

Alternatively, if you use the Intel compiler, flush-to-zero can be enabled by adding the -Qftz flag when compiling the main() function.

Thanks
Dima

James_K_2
Beginner

That was the cause.

Thanks,
Now I just need to figure out how this may impact the rest of the calculations in my system.

Jim K.

SergeyKostrov
Valued Contributor II

>>...I'm a little confused why having all values close to zero would cause an issue...


Take a look at the state of the FP operations (see the _control87 CRT function or the SSE macro _MM_GET_EXCEPTION_STATE).
It is possible that some FP exceptions, such as Inexact Value (see _EM_INEXACT), are causing the performance problem.
