Community
cancel
Showing results for
Did you mean:
Beginner
92 Views

## How to find optmized FFT dimension size

I am using MKL DFTI to do FFT on 1D and 2D.
I would like to pad my array/Matrix to get best FFT performance.
There is some guideline in the manual saying the dimension should not be multiple of 2048 (for 2D FFT) and aligned.
I can align my 1D and 2D arrays, but I wonder how do I really get the optmized size. If I would like to generate an optmized size table, how it will look like?

In FFTW, they usually use multiple of small prime numbers. Does this apply to MKL?

Please provide some guideline on how to generate a table of optimized dimension number. This apply to both 1D and 2D.

Also for 2D, the row and col will follow the same rule, right?

To summarize, if I have 1D array, say 457 numbers, and I run FFT on it. What is the optimized dimension I should pad my array to?

for 2D, if it is 457x671, what is the optmized dimension?

Thank you.
7 Replies
Valued Contributor I
92 Views
Hi,

For the best performance please provide 16-byte alignment of data.

As to optimized radices: MKL User Guide states (seeTips and Techniques to Improve Performance)

You can improve the performance of the Intel Math Kernel Library (Intel MKL) FFT if the length of your data vector permits factorization into powers of optimized radices.

In Intel MKL, the optimized radices are 2, 3, 5, 7, and 11.

Beginner
92 Views
Thanks for the info.
I also read in the MKL manual which says:

leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16

for two-dimensional arrays, leading dimension values divisible by 2048 are avoided

For the C-style FFT, the distance L between arrays that represent real and imaginary

parts is not divisible by 64. The best case is when L=k*64 + 16

Leading dimension values, in bytes (n*element_size), of two-dimensional arrays are

not power of two.

could you give some explanation here on leading dimension? If I have array like Real_1, Imag_1, Real_2, Imag_2, ... is this an optimized FFT data?

Beginner
92 Views
```[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

IMPLICIT NONE

INTEGER :: FGET_BUFFER_SIZE
INTEGER, INTENT(IN) :: datalen
REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
REAL(8) :: lndata

ln2 = DLOG(2.0_8)
ln3 = DLOG(3.0_8)
ln5 = DLOG(5.0_8)
ln7 = DLOG(7.0_8)
ln11 = DLOG(11.0_8)
ln13 = DLOG(13.0_8)
lndata = DLOG(DBLE(datalen))

IF (MOD(datalen,2) >= 1) THEN
FGET_BUFFER_SIZE = ODD_BUFFER()
ELSE
FGET_BUFFER_SIZE = EVEN_BUFFER()
END IF

CONTAINS

FUNCTION ODD_BUFFER()
INTEGER :: ODD_BUFFER
INTEGER :: buffsize, buffest
REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13

buffsize = HUGE(buffsize)
N = DLOG(DBLE(buffsize))
cnt3 = 0.0_8
cnt5 = 0.0_8
cnt7 = 0.0_8
cnt11 = 0.0_8
cnt13 = 0.0_8
DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.```
Beginner
92 Views
Quoting Frank_M
1. FUNCTIONFGET_BUFFER_SIZE(datalen)
2. IMPLICITNONE
3. INTEGER::FGET_BUFFER_SIZE
4. INTEGER,INTENT(IN)::datalen
5. REAL(8)::ln2,ln3,ln5,ln7,ln11,ln13
6. REAL(8)::lndata
7. ln2=DLOG(2.0_8)
8. ln3=DLOG(3.0_8)
9. ln5=DLOG(5.0_8)
10. ln7=DLOG(7.0_8)
11. ln11=DLOG(11.0_8)
12. ln13=DLOG(13.0_8)
13. lndata=DLOG(DBLE(datalen))
14. IF(MOD(datalen,2)>=1)THEN
15. FGET_BUFFER_SIZE=ODD_BUFFER()
16. ELSE
17. FGET_BUFFER_SIZE=EVEN_BUFFER()
18. ENDIF
19. CONTAINS
20. FUNCTIONODD_BUFFER()
21. INTEGER::ODD_BUFFER
22. INTEGER::buffsize,buffest
23. REAL(8)::cnt3,cnt5,cnt7,cnt11,cnt13,N
24. REAL(8)::tmp3,tmp5,tmp7,tmp11,tmp13
25. buffsize=HUGE(buffsize)
26. N=DLOG(DBLE(buffsize))
27. cnt3=0.0_8
28. cnt5=0.0_8
29. cnt7=0.0_8
30. cnt11=0.0_8
31. cnt13=0.0_8
32. DOWHILE(cnt13*ln13
33. tmp13=cnt13*ln13
34. DOWHILE(cnt11*ln11+tmp13
35. tmp11=cnt11*ln11+tmp13
36. DOWHILE(cnt7*ln7+tmp11
37. tmp7=cnt7*ln7+tmp11
38. DOWHILE(cnt5*ln5+tmp7
39. tmp5=cnt5*ln5+tmp7
40. cnt3=DNINT((lndata-tmp5)/ln3+0.45_8)
41. DOWHILE(cnt3*ln3+tmp5
42. tmp3=cnt3*ln3+tmp5
43. buffest=NINT(DEXP(tmp3))
44. IF(buffest.GE.datalen)THEN
45. N=tmp3
46. buffsize=buffest
47. IF(buffsize==datalen)GOTO200
48. ENDIF
49. cnt3=cnt3+1.0_8
50. ENDDO
51. cnt3=0.0_8
52. cnt5=cnt5+1.0_8
53. ENDDO
54. cnt5=0.0_8
55. cnt7=cnt7+1.0_8
56. ENDDO
57. cnt7=0.0_8
58. cnt11=cnt11+1.0_8
59. ENDDO
60. cnt11=0.0_8
61. cnt13=cnt13+1.0_8
62. ENDDO
63. 200CONTINUE
64. ODD_BUFFER=buffsize
65. ENDFUNCTIONODD_BUFFER
66. FUNCTIONEVEN_BUFFER()
67. INTEGER::EVEN_BUFFER
68. INTEGER::buffsize,buffest
69. REAL(8)::cnt2,cnt3,cnt5,cnt7,cnt11,cnt13,N
70. REAL(8)::tmp2,tmp3,tmp5,tmp7,tmp11,tmp13
71. buffsize=HUGE(buffsize)
72. N=DLOG(DBLE(buffsize))
73. cnt3=0.0_8
74. cnt5=0.0_8
75. cnt7=0.0_8
76. cnt11=0.0_8
77. cnt13=0.0_8
78. DOWHILE(cnt13*ln13
79. tmp13=cnt13*ln13
80. DOWHILE(ln2+cnt11*ln11+tmp13
81. tmp11=ln2+cnt11*ln11+tmp13
82. DOWHILE(cnt7*ln7+tmp11
83. tmp7=cnt7*ln7+tmp11
84. DOWHILE(cnt5*ln5+tmp7
85. tmp5=cnt5*ln5+tmp7
86. DOWHILE(cnt3*ln3+tmp5
87. tmp3=cnt3*ln3+tmp5-ln2
88. cnt2=DNINT((lndata-tmp3)/ln2+0.45_8)
89. If(cnt2<1.0_8)cnt2=1.0_8
90. DOWHILE(cnt2*ln2+tmp3
91. tmp2=cnt2*ln2+tmp3
92. buffest=NINT(DEXP(tmp2))
93. IF(buffest.GE.datalen)THEN
94. N=tmp2
95. buffsize=buffest
96. IF(buffsize==datalen)GOTO300
97. ENDIF
98. cnt2=cnt2+1.0_8
99. ENDDO
100. cnt2=1.0_8
101. cnt3=cnt3+1.0_8
102. ENDDO
103. cnt3=0.0_8
104. cnt5=cnt5+1.0_8
105. ENDDO
106. cnt5=0.0_8
107. cnt7=cnt7+1.0_8
108. ENDDO
109. cnt7=0.0_8
110. cnt11=cnt11+1.0_8
111. ENDDO
112. cnt11=0.0_8
113. cnt13=cnt13+1.0_8
114. ENDDO
115. 300CONTINUE
116. EVEN_BUFFER=buffsize
117. ENDFUNCTIONEVEN_BUFFER
118. ENDFUNCTIONFGET_BUFFER_SIZE
```[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

IMPLICIT NONE

INTEGER :: FGET_BUFFER_SIZE
INTEGER, INTENT(IN) :: datalen
REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
REAL(8) :: lndata

ln2 = DLOG(2.0_8)
ln3 = DLOG(3.0_8)
ln5 = DLOG(5.0_8)
ln7 = DLOG(7.0_8)
ln11 = DLOG(11.0_8)
ln13 = DLOG(13.0_8)
lndata = DLOG(DBLE(datalen))

IF (MOD(datalen,2) >= 1) THEN
FGET_BUFFER_SIZE = ODD_BUFFER()
ELSE
FGET_BUFFER_SIZE = EVEN_BUFFER()
END IF

CONTAINS

FUNCTION ODD_BUFFER()
INTEGER :: ODD_BUFFER
INTEGER :: buffsize, buffest
REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13

buffsize = HUGE(buffsize)
N = DLOG(DBLE(buffsize))
cnt3 = 0.0_8
cnt5 = 0.0_8
cnt7 = 0.0_8
cnt11 = 0.0_8
cnt13 = 0.0_8
DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.```

@Frank_M: can you clarify why you used a radix value of 13, when the MKL recommendation has a maximum radix of 11? You refer to Intel expanding it to 13; in which version of MKL? I can't seem to find documentation for this change.

Thanks much,
Ozzer
Beginner
92 Views
from page 22 of Document 321417-002US
dated 12 August 2010

Intel Visual Fortran Compiler Professional Edition 11.1 for Windows
Installation Guide and Release Notes

Performance improvements

FFTS
Enhanced performance for transforms which are multiple of 8 of 13

Optimized 1d complex cluster FFTs for non-power-of-2 vector lengths

Beginner
92 Views
@Frank_M -- Thanks much. With a little sleuthing, I tracked this to the Release Notes associated with MKL Release 10.2, Update 4.

-Ozzer
Employee
92 Views
Hi,

There are some considerations about selecting optimized dimensions and data layout for multidimensional transforms recently published in MKL Knowledge Base article FFT length and layout advisor. This may not be exactly what you are looking for, but it may help.

Thanks
Dima