
I have been tracking down the cause of a major slowdown related to FFT and filtering using the MKL libraries. It occurs during filtering of a large 2D data set (6427x6427) stored as floats (not doubles).

What I am seeing:

Forward FFT of the data takes about 1.3 sec.

Multiplying the FFT output coefficients by a weighting function.

Inverse FFT of the data takes about 6 sec (FFT only, not including the time to apply the weights).

Taking 6 seconds for the inverse FFT makes no sense, so I ran further tests. (These are only the results I think are relevant, not every test I did.)

After many experiments, I get the following:

1. Not applying the weights to the FFT output coefficients:
   - Forward FFT of the data takes about 1.3 sec.
   - Inverse FFT of the data takes about 1.3 sec. This is what I would expect.

2. Going through the motions of applying the weights, just not applying them (i.e., everything except the final multiply):
   - Forward FFT of the data takes about 1.3 sec.
   - Inverse FFT of the data takes about 1.3 sec. This is what I would expect.

3. Adding two lines to case 2 above which multiply the FFT coefficients by the weighting function:
   - Forward FFT of the data takes about 1.3 sec.
   - Inverse FFT of the data takes about 6 sec. Why would the simple act of multiplying the coefficients by the weights change the time from 1.3 sec to 6 sec?

4. When using *the same code* on a much smaller (1024x1024) data set, the forward and inverse FFT times, with weight multiplication, are the same.

MKL calls used: DftiComputeForward and DftiComputeBackward

Computer: Win7 64bit

Intel Xeon 5160 3GHz (2 processors, 4 CPUs)

Memory 32GB


I suppose you are doing a complex-to-complex transform.

Did you multiply the coefficients in a separate pass over the data, or by setting DFTI_BACKWARD_SCALE?

If you swap the calls of Forward and Backward (that is, Backward first, then multiplication with the coefficients, and then Forward), does the forward transform slow down then?

Maybe, after multiplication with the coefficients, some values become too small (denormalized) and this slows down the computation.
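One quick way to check this hypothesis is to count how many of the weighted coefficients are subnormal just before the backward transform. The C sketch below assumes the coefficients can be viewed as a plain `float` buffer (interleaved real/imaginary parts for complex data); `count_subnormals` is a hypothetical helper, not an MKL function, and relies only on C99 `fpclassify`.

```c
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>  /* size_t */

/* Count entries of buf[0..n-1] that are subnormal (denormalized).
   For complex data stored as interleaved float pairs, pass 2 * (number of
   complex values) as n so both real and imaginary parts are checked. */
size_t count_subnormals(const float *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL)
            ++count;
    }
    return count;
}
```

For example, a coefficient of 1e-20f times a weight of 1e-20f gives roughly 1e-40, which is below FLT_MIN and is counted, while a weight of exactly 0.0f gives an exact zero and is not. A large count would support the denormal explanation; subnormals can also arise inside the transform itself, so a low count does not rule it out.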

What version of MKL do you use?

Thanks

Dima


Yes, I do the multiplication in a separate pass, not included in the timing, because I am implementing a type of band-pass filter, so it's not a single value.

The version of MKL is 10.3 patch 4. I am in the process of updating to the latest 10.3.
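A separate weighting pass of this kind might look like the C sketch below. It assumes the 2D complex-to-complex output can be viewed as nx*ny C99 `float complex` values stored row-major in the usual FFT order (index k above n/2 standing for frequency k - n). `apply_weights` and the ideal radial band-pass mask `make_bandpass` are hypothetical illustrations, not the poster's actual filter.

```c
#include <complex.h>
#include <stddef.h>

/* Multiply each complex coefficient by its real weight, in place.
   coef and w both hold n = nx * ny entries in the same order. */
void apply_weights(float complex *coef, const float *w, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        coef[i] *= w[i];
}

/* Map FFT index k (0 <= k < n) to a signed frequency index. */
static double signed_freq(size_t k, size_t n)
{
    return (k <= n / 2) ? (double)k : (double)k - (double)n;
}

/* Hypothetical ideal band-pass mask: weight 1 where the radial frequency r
   satisfies r_lo <= r <= r_hi, exactly 0 elsewhere. r^2 is compared with
   r_lo^2 and r_hi^2 so no sqrt call is needed. */
void make_bandpass(float *w, size_t nx, size_t ny, double r_lo, double r_hi)
{
    for (size_t i = 0; i < nx; ++i) {
        double fi = signed_freq(i, nx);
        for (size_t j = 0; j < ny; ++j) {
            double fj = signed_freq(j, ny);
            double r2 = fi * fi + fj * fj;
            w[i * ny + j] = (r2 >= r_lo * r_lo && r2 <= r_hi * r_hi) ? 1.0f : 0.0f;
        }
    }
}
```

In the pipeline described in this thread, `apply_weights` would run between DftiComputeForward and DftiComputeBackward. A hard 0/1 mask like this one produces exact zeros; smooth tapers with tiny but nonzero tails are a different case.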

Some additional information since I made the post.

Almost all the weight values are 0, ranging down to -1. I have done more testing, and it's not the multiplication itself; it's related to the values being used. When I use the same code but (at the last second) reset all weights to 1, I don't get the slowdown.

I'm a little confused why having almost all values close to zero would cause an issue. If I wanted to filter out all but a few frequencies, I would set almost all weights to zero, which would cause the same issue. I could understand a slight slowdown (e.g., 1.3 sec to 1.5 sec), but not this very large one.

Thanks very much,

Jim K.


Jim,

Floating point operations with very small (denormalized) numbers are expensive.
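For reference, the IEEE 754 single-precision (float) thresholds involved are summarized below. These are standard format facts; the magnitudes in the example line are illustrative, not measured from Jim's data. Note that an exact zero weight yields an exact zero, which is not subnormal; it is tiny nonzero values that land in the subnormal range.

$$
\begin{aligned}
\text{smallest normal float:}\quad & 2^{-126} \approx 1.18\times10^{-38}\\
\text{smallest subnormal float:}\quad & 2^{-149} \approx 1.40\times10^{-45}\\
\text{subnormal range:}\quad & 0 < |x| < 2^{-126}\\
\text{example:}\quad & |c| = 10^{-30},\ |w| = 10^{-10} \;\Rightarrow\; |w\,c| = 10^{-40}\ \text{(subnormal)}\\
& w = 0 \;\Rightarrow\; w\,c = 0\ \text{(exact zero, not subnormal)}
\end{aligned}
$$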

You can easily check whether this is the issue by setting the flush-to-zero bit before DftiComputeBackward:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

The macro is defined in

#include

Alternatively, if you use the Intel compiler, flush-to-zero can be enabled by compiling the file containing the main() function with the -Qftz flag.
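To see what the flush-to-zero bit changes, here is a small C sketch for x86/x86-64 with SSE math (it will not build on other architectures). `mul_default` and `mul_with_ftz` are hypothetical helper names; the `volatile` qualifiers only force the multiply to happen at run time, because a product folded by the compiler would never see the MXCSR setting.

```c
#include <xmmintrin.h>  /* _MM_GET/SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */

/* Plain float multiply, evaluated at run time. */
float mul_default(float a, float b)
{
    volatile float va = a, vb = b;
    return va * vb;
}

/* Same multiply with flush-to-zero enabled only for this operation:
   a result that would be subnormal is replaced by 0. */
float mul_with_ftz(float a, float b)
{
    volatile float va = a, vb = b, r;
    unsigned int old_mode = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    r = va * vb;
    _MM_SET_FLUSH_ZERO_MODE(old_mode);  /* restore the caller's setting */
    return r;
}
```

With default settings, `mul_default(1e-20f, 1e-20f)` returns a subnormal value near 1e-40, whereas `mul_with_ftz(1e-20f, 1e-20f)` returns 0.0f.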

Thanks

Dima


That was the cause.

Thanks,

Now I just need to figure out how this may impact the rest of the calculations in my system.
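One option for limiting the impact, sketched here in C for x86/x86-64 under the assumption that the inverse FFT can be wrapped in a function, is to enable flush-to-zero and denormals-are-zero (DAZ) only around that call and then restore the saved MXCSR. `transform_fn`, `run_with_ftz_daz` and the toy callback `scale_inplace` are hypothetical names; in real code the callback would wrap the DftiComputeBackward call. MXCSR is a per-thread register, so this only changes the calling thread's mode; whether threads used internally by MKL see the change is not something this sketch establishes.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON */
#include <pmmintrin.h>  /* _MM_DENORMALS_ZERO_ON */
#include <stddef.h>

/* Hypothetical signature for the step to protect (e.g. an inverse FFT). */
typedef void (*transform_fn)(float *data, size_t n, void *ctx);

/* Run fn with FTZ and DAZ set, then restore the caller's MXCSR exactly,
   so code outside this call keeps IEEE gradual underflow. */
void run_with_ftz_daz(transform_fn fn, float *data, size_t n, void *ctx)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | _MM_FLUSH_ZERO_ON | _MM_DENORMALS_ZERO_ON);
    fn(data, n, ctx);
    _mm_setcsr(saved);
}

/* Toy stand-in for the real transform: scales data by *(float *)ctx.
   volatile keeps the multiply at run time, inside the FTZ window. */
void scale_inplace(float *data, size_t n, void *ctx)
{
    volatile float *v = data;
    float s = *(const float *)ctx;
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] * s;
}
```

Restoring the whole saved register, rather than just clearing the two bits, also puts back the caller's exception flags and rounding mode unchanged.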

Jim K.


*...I'm a little confused why having all values close to zero would cause an issue...*

Take a look at the state of **FP** operations (see the **_control87** CRT function or the **SSE** macro **_MM_GET_EXCEPTION_STATE**).

There is a possibility that some **FP** exceptions, like Inexact Value (see **_EM_INEXACT**), are causing the performance problem.
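Following that suggestion, the sticky SSE exception flags can be cleared before a computation and read afterwards. The C sketch below (x86/x86-64 only; `sse_flags_for_mul` is a hypothetical helper) clears the flags with `_MM_SET_EXCEPTION_STATE`, performs one multiply at run time, and returns what `_MM_GET_EXCEPTION_STATE` reports. In practice one would bracket the DftiComputeBackward call instead of a single multiply, keeping in mind that these flags are per thread.

```c
#include <xmmintrin.h>  /* _MM_GET/SET_EXCEPTION_STATE, _MM_EXCEPT_* */

/* Clear the sticky SSE exception flags, multiply a * b at run time, and
   return the flags raised by that multiply (volatile prevents folding). */
unsigned int sse_flags_for_mul(float a, float b)
{
    volatile float va = a, vb = b, r;
    _MM_SET_EXCEPTION_STATE(0);
    r = va * vb;
    return _MM_GET_EXCEPTION_STATE();
}
```

For example, `sse_flags_for_mul(1e-40f, 2.0f)` includes `_MM_EXCEPT_DENORM` because 1e-40f is a subnormal operand, while `sse_flags_for_mul(1.0f, 2.0f)` reports no flags.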
