Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
- How to find optmized FFT dimension size

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

missing__zlw

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-30-2010
09:30 PM

92 Views

How to find optmized FFT dimension size

I would like to pad my array/Matrix to get best FFT performance.

There is some guideline in the manual saying the dimension should not be multiple of 2048 (for 2D FFT) and aligned.

I can align my 1D and 2D arrays, but I wonder how do I really get the optmized size. If I would like to generate an optmized size table, how it will look like?

In FFTW, they usually use multiple of small prime numbers. Does this apply to MKL?

Please provide some guideline on how to generate a table of optimized dimension number. This apply to both 1D and 2D.

Also for 2D, the row and col will follow the same rule, right?

To summarize, if I have 1D array, say 457 numbers, and I run FFT on it. What is the optimized dimension I should pad my array to?

for 2D, if it is 457x671, what is the optmized dimension?

Thank you.

7 Replies

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-01-2010
12:04 AM

92 Views

For the best performance please provide 16-byte alignment of data.

As to optimized radices: MKL User Guide states (see

You can improve the performance of the Intel Math Kernel Library (Intel MKL) FFT if the length of your data vector permits factorization into powers of optimized radices.

In Intel MKL, the optimized radices are 2, 3, 5, 7, and 11.

missing__zlw

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-01-2010
10:39 AM

92 Views

I also read in the MKL manual which says:

leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16

for two-dimensional arrays, leading dimension values divisible by 2048 are avoided

For the C-style FFT, the distance L between arrays that represent real and imaginary

parts is not divisible by 64. The best case is when L=k*64 + 16

Leading dimension values, in bytes (n*element_size), of two-dimensional arrays are

not power of two.

could you give some explanation here on leading dimension? If I have array like Real_1, Imag_1, Real_2, Imag_2, ... is this an optimized FFT data?

Frank_M

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-05-2010
06:25 AM

92 Views

Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.

John_S_18

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-06-2010
06:57 AM

92 Views

Quoting Frank_M

- FUNCTIONFGET_BUFFER_SIZE(datalen)
- IMPLICITNONE
- INTEGER::FGET_BUFFER_SIZE
- INTEGER,INTENT(IN)::datalen
- REAL(8)::ln2,ln3,ln5,ln7,ln11,ln13
- REAL(8)::lndata
- ln2=DLOG(2.0_8)
- ln3=DLOG(3.0_8)
- ln5=DLOG(5.0_8)
- ln7=DLOG(7.0_8)
- ln11=DLOG(11.0_8)
- ln13=DLOG(13.0_8)
- lndata=DLOG(DBLE(datalen))
- IF(MOD(datalen,2)>=1)THEN
- FGET_BUFFER_SIZE=ODD_BUFFER()
- ELSE
- FGET_BUFFER_SIZE=EVEN_BUFFER()
- ENDIF
- CONTAINS
- FUNCTIONODD_BUFFER()
- INTEGER::ODD_BUFFER
- INTEGER::buffsize,buffest
- REAL(8)::cnt3,cnt5,cnt7,cnt11,cnt13,N
- REAL(8)::tmp3,tmp5,tmp7,tmp11,tmp13
- buffsize=HUGE(buffsize)
- N=DLOG(DBLE(buffsize))
- cnt3=0.0_8
- cnt5=0.0_8
- cnt7=0.0_8
- cnt11=0.0_8
- cnt13=0.0_8
- DOWHILE(cnt13*ln13
- tmp13=cnt13*ln13
- DOWHILE(cnt11*ln11+tmp13
- tmp11=cnt11*ln11+tmp13
- DOWHILE(cnt7*ln7+tmp11
- tmp7=cnt7*ln7+tmp11
- DOWHILE(cnt5*ln5+tmp7
- tmp5=cnt5*ln5+tmp7
- cnt3=DNINT((lndata-tmp5)/ln3+0.45_8)
- DOWHILE(cnt3*ln3+tmp5
- tmp3=cnt3*ln3+tmp5
- buffest=NINT(DEXP(tmp3))
- IF(buffest.GE.datalen)THEN
- N=tmp3
- buffsize=buffest
- IF(buffsize==datalen)GOTO200
- ENDIF
- cnt3=cnt3+1.0_8
- ENDDO
- cnt3=0.0_8
- cnt5=cnt5+1.0_8
- ENDDO
- cnt5=0.0_8
- cnt7=cnt7+1.0_8
- ENDDO
- cnt7=0.0_8
- cnt11=cnt11+1.0_8
- ENDDO
- cnt11=0.0_8
- cnt13=cnt13+1.0_8
- ENDDO
- 200CONTINUE
- ODD_BUFFER=buffsize
- ENDFUNCTIONODD_BUFFER
- FUNCTIONEVEN_BUFFER()
- INTEGER::EVEN_BUFFER
- INTEGER::buffsize,buffest
- REAL(8)::cnt2,cnt3,cnt5,cnt7,cnt11,cnt13,N
- REAL(8)::tmp2,tmp3,tmp5,tmp7,tmp11,tmp13
- buffsize=HUGE(buffsize)
- N=DLOG(DBLE(buffsize))
- cnt3=0.0_8
- cnt5=0.0_8
- cnt7=0.0_8
- cnt11=0.0_8
- cnt13=0.0_8
- DOWHILE(cnt13*ln13
- tmp13=cnt13*ln13
- DOWHILE(ln2+cnt11*ln11+tmp13
- tmp11=ln2+cnt11*ln11+tmp13
- DOWHILE(cnt7*ln7+tmp11
- tmp7=cnt7*ln7+tmp11
- DOWHILE(cnt5*ln5+tmp7
- tmp5=cnt5*ln5+tmp7
- DOWHILE(cnt3*ln3+tmp5
- tmp3=cnt3*ln3+tmp5-ln2
- cnt2=DNINT((lndata-tmp3)/ln2+0.45_8)
- If(cnt2<1.0_8)cnt2=1.0_8
- DOWHILE(cnt2*ln2+tmp3
- tmp2=cnt2*ln2+tmp3
- buffest=NINT(DEXP(tmp2))
- IF(buffest.GE.datalen)THEN
- N=tmp2
- buffsize=buffest
- IF(buffsize==datalen)GOTO300
- ENDIF
- cnt2=cnt2+1.0_8
- ENDDO
- cnt2=1.0_8
- cnt3=cnt3+1.0_8
- ENDDO
- cnt3=0.0_8
- cnt5=cnt5+1.0_8
- ENDDO
- cnt5=0.0_8
- cnt7=cnt7+1.0_8
- ENDDO
- cnt7=0.0_8
- cnt11=cnt11+1.0_8
- ENDDO
- cnt11=0.0_8
- cnt13=cnt13+1.0_8
- ENDDO
- 300CONTINUE
- EVEN_BUFFER=buffsize
- ENDFUNCTIONEVEN_BUFFER
- ENDFUNCTIONFGET_BUFFER_SIZE

Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.

Thanks much,

Ozzer

Frank_M

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-06-2010
08:37 PM

92 Views

dated 12 August 2010

Intel Visual Fortran Compiler Professional Edition 11.1 for Windows

Installation Guide and Release Notes

Performance improvements

FFTS

Enhanced performance for transforms which are multiple of 8 of 13

Optimized 1d complex cluster FFTs for non-power-of-2 vector lengths

John_S_18

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-07-2010
02:12 PM

92 Views

-Ozzer

Dmitry_B_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-14-2010
07:17 AM

92 Views

There are some considerations about selecting optimized dimensions and data layout for multidimensional transforms recently published in MKL Knowledge Base article FFT length and layout advisor. This may not be exactly what you are looking for, but it may help.

Thanks

Dima

For more complete information about compiler optimizations, see our Optimization Notice.