Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6956 Discussions

How to find optmized FFT dimension size

missing__zlw
Beginner
968 Views
I am using MKL DFTI to do FFT on 1D and 2D.
I would like to pad my array/Matrix to get best FFT performance.
There is some guideline in the manual saying the dimension should not be multiple of 2048 (for 2D FFT) and aligned.
I can align my 1D and 2D arrays, but I wonder how do I really get the optmized size. If I would like to generate an optmized size table, how it will look like?

In FFTW, they usually use multiple of small prime numbers. Does this apply to MKL?

Please provide some guideline on how to generate a table of optimized dimension number. This apply to both 1D and 2D.

Also for 2D, the row and col will follow the same rule, right?

To summarize, if I have 1D array, say 457 numbers, and I run FFT on it. What is the optimized dimension I should pad my array to?

for 2D, if it is 457x671, what is the optmized dimension?

Thank you.
0 Kudos
7 Replies
barragan_villanueva_
Valued Contributor I
968 Views
Hi,

For the best performance please provide 16-byte alignment of data.

As to optimized radices: MKL User Guide states (seeTips and Techniques to Improve Performance)

FFT Optimized Radices

You can improve the performance of the Intel Math Kernel Library (Intel MKL) FFT if the length of your data vector permits factorization into powers of optimized radices.

In Intel MKL, the optimized radices are 2, 3, 5, 7, and 11.

0 Kudos
missing__zlw
Beginner
968 Views
Thanks for the info.
I also read in the MKL manual which says:


leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16

for two-dimensional arrays, leading dimension values divisible by 2048 are avoided

For the C-style FFT, the distance L between arrays that represent real and imaginary

parts is not divisible by 64. The best case is when L=k*64 + 16

Leading dimension values, in bytes (n*element_size), of two-dimensional arrays are

not power of two.

could you give some explanation here on leading dimension? If I have array like Real_1, Imag_1, Real_2, Imag_2, ... is this an optimized FFT data?

0 Kudos
Frank_M
Beginner
968 Views
[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

    IMPLICIT NONE
   
    INTEGER :: FGET_BUFFER_SIZE
    INTEGER, INTENT(IN) :: datalen
    REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
    REAL(8) :: lndata   

    ln2 = DLOG(2.0_8)
    ln3 = DLOG(3.0_8)
    ln5 = DLOG(5.0_8)
    ln7 = DLOG(7.0_8)
    ln11 = DLOG(11.0_8)
    ln13 = DLOG(13.0_8)
    lndata = DLOG(DBLE(datalen))
    
    IF (MOD(datalen,2) >= 1) THEN 
        FGET_BUFFER_SIZE = ODD_BUFFER()
    ELSE
        FGET_BUFFER_SIZE = EVEN_BUFFER()
    END IF
    
CONTAINS

FUNCTION ODD_BUFFER()
    INTEGER :: ODD_BUFFER
    INTEGER :: buffsize, buffest
    REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
    REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13
    
    buffsize = HUGE(buffsize)
    N = DLOG(DBLE(buffsize))
    cnt3 = 0.0_8
    cnt5 = 0.0_8
    cnt7 = 0.0_8
    cnt11 = 0.0_8 
    cnt13 = 0.0_8
    DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.


0 Kudos
John_S_18
Beginner
968 Views
Quoting Frank_M
  1. FUNCTIONFGET_BUFFER_SIZE(datalen)
  2. IMPLICITNONE
  3. INTEGER::FGET_BUFFER_SIZE
  4. INTEGER,INTENT(IN)::datalen
  5. REAL(8)::ln2,ln3,ln5,ln7,ln11,ln13
  6. REAL(8)::lndata
  7. ln2=DLOG(2.0_8)
  8. ln3=DLOG(3.0_8)
  9. ln5=DLOG(5.0_8)
  10. ln7=DLOG(7.0_8)
  11. ln11=DLOG(11.0_8)
  12. ln13=DLOG(13.0_8)
  13. lndata=DLOG(DBLE(datalen))
  14. IF(MOD(datalen,2)>=1)THEN
  15. FGET_BUFFER_SIZE=ODD_BUFFER()
  16. ELSE
  17. FGET_BUFFER_SIZE=EVEN_BUFFER()
  18. ENDIF
  19. CONTAINS
  20. FUNCTIONODD_BUFFER()
  21. INTEGER::ODD_BUFFER
  22. INTEGER::buffsize,buffest
  23. REAL(8)::cnt3,cnt5,cnt7,cnt11,cnt13,N
  24. REAL(8)::tmp3,tmp5,tmp7,tmp11,tmp13
  25. buffsize=HUGE(buffsize)
  26. N=DLOG(DBLE(buffsize))
  27. cnt3=0.0_8
  28. cnt5=0.0_8
  29. cnt7=0.0_8
  30. cnt11=0.0_8
  31. cnt13=0.0_8
  32. DOWHILE(cnt13*ln13
  33. tmp13=cnt13*ln13
  34. DOWHILE(cnt11*ln11+tmp13
  35. tmp11=cnt11*ln11+tmp13
  36. DOWHILE(cnt7*ln7+tmp11
  37. tmp7=cnt7*ln7+tmp11
  38. DOWHILE(cnt5*ln5+tmp7
  39. tmp5=cnt5*ln5+tmp7
  40. cnt3=DNINT((lndata-tmp5)/ln3+0.45_8)
  41. DOWHILE(cnt3*ln3+tmp5
  42. tmp3=cnt3*ln3+tmp5
  43. buffest=NINT(DEXP(tmp3))
  44. IF(buffest.GE.datalen)THEN
  45. N=tmp3
  46. buffsize=buffest
  47. IF(buffsize==datalen)GOTO200
  48. ENDIF
  49. cnt3=cnt3+1.0_8
  50. ENDDO
  51. cnt3=0.0_8
  52. cnt5=cnt5+1.0_8
  53. ENDDO
  54. cnt5=0.0_8
  55. cnt7=cnt7+1.0_8
  56. ENDDO
  57. cnt7=0.0_8
  58. cnt11=cnt11+1.0_8
  59. ENDDO
  60. cnt11=0.0_8
  61. cnt13=cnt13+1.0_8
  62. ENDDO
  63. 200CONTINUE
  64. ODD_BUFFER=buffsize
  65. ENDFUNCTIONODD_BUFFER
  66. FUNCTIONEVEN_BUFFER()
  67. INTEGER::EVEN_BUFFER
  68. INTEGER::buffsize,buffest
  69. REAL(8)::cnt2,cnt3,cnt5,cnt7,cnt11,cnt13,N
  70. REAL(8)::tmp2,tmp3,tmp5,tmp7,tmp11,tmp13
  71. buffsize=HUGE(buffsize)
  72. N=DLOG(DBLE(buffsize))
  73. cnt3=0.0_8
  74. cnt5=0.0_8
  75. cnt7=0.0_8
  76. cnt11=0.0_8
  77. cnt13=0.0_8
  78. DOWHILE(cnt13*ln13
  79. tmp13=cnt13*ln13
  80. DOWHILE(ln2+cnt11*ln11+tmp13
  81. tmp11=ln2+cnt11*ln11+tmp13
  82. DOWHILE(cnt7*ln7+tmp11
  83. tmp7=cnt7*ln7+tmp11
  84. DOWHILE(cnt5*ln5+tmp7
  85. tmp5=cnt5*ln5+tmp7
  86. DOWHILE(cnt3*ln3+tmp5
  87. tmp3=cnt3*ln3+tmp5-ln2
  88. cnt2=DNINT((lndata-tmp3)/ln2+0.45_8)
  89. If(cnt2<1.0_8)cnt2=1.0_8
  90. DOWHILE(cnt2*ln2+tmp3
  91. tmp2=cnt2*ln2+tmp3
  92. buffest=NINT(DEXP(tmp2))
  93. IF(buffest.GE.datalen)THEN
  94. N=tmp2
  95. buffsize=buffest
  96. IF(buffsize==datalen)GOTO300
  97. ENDIF
  98. cnt2=cnt2+1.0_8
  99. ENDDO
  100. cnt2=1.0_8
  101. cnt3=cnt3+1.0_8
  102. ENDDO
  103. cnt3=0.0_8
  104. cnt5=cnt5+1.0_8
  105. ENDDO
  106. cnt5=0.0_8
  107. cnt7=cnt7+1.0_8
  108. ENDDO
  109. cnt7=0.0_8
  110. cnt11=cnt11+1.0_8
  111. ENDDO
  112. cnt11=0.0_8
  113. cnt13=cnt13+1.0_8
  114. ENDDO
  115. 300CONTINUE
  116. EVEN_BUFFER=buffsize
  117. ENDFUNCTIONEVEN_BUFFER
  118. ENDFUNCTIONFGET_BUFFER_SIZE
[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

    IMPLICIT NONE
   
    INTEGER :: FGET_BUFFER_SIZE
    INTEGER, INTENT(IN) :: datalen
    REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
    REAL(8) :: lndata   

    ln2 = DLOG(2.0_8)
    ln3 = DLOG(3.0_8)
    ln5 = DLOG(5.0_8)
    ln7 = DLOG(7.0_8)
    ln11 = DLOG(11.0_8)
    ln13 = DLOG(13.0_8)
    lndata = DLOG(DBLE(datalen))
    
    IF (MOD(datalen,2) >= 1) THEN 
        FGET_BUFFER_SIZE = ODD_BUFFER()
    ELSE
        FGET_BUFFER_SIZE = EVEN_BUFFER()
    END IF
    
CONTAINS

FUNCTION ODD_BUFFER()
    INTEGER :: ODD_BUFFER
    INTEGER :: buffsize, buffest
    REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
    REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13
    
    buffsize = HUGE(buffsize)
    N = DLOG(DBLE(buffsize))
    cnt3 = 0.0_8
    cnt5 = 0.0_8
    cnt7 = 0.0_8
    cnt11 = 0.0_8 
    cnt13 = 0.0_8
    DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.


@Frank_M: can you clarify why you used a radix value of 13, when the MKL recommendation has a maximum radix of 11? You refer to Intel expanding it to 13; in which version of MKL? I can't seem to find documentation for this change.

Thanks much,
Ozzer
0 Kudos
Frank_M
Beginner
968 Views
from page 22 of Document 321417-002US
dated 12 August 2010

Intel Visual Fortran Compiler Professional Edition 11.1 for Windows
Installation Guide and Release Notes

Performance improvements

FFTS
Enhanced performance for transforms which are multiple of 8 of 13

Optimized 1d complex cluster FFTs for non-power-of-2 vector lengths


0 Kudos
John_S_18
Beginner
968 Views
@Frank_M -- Thanks much. With a little sleuthing, I tracked this to the Release Notes associated with MKL Release 10.2, Update 4.

-Ozzer
0 Kudos
Dmitry_B_Intel
Employee
968 Views
Hi,

There are some considerations about selecting optimized dimensions and data layout for multidimensional transforms recently published in MKL Knowledge Base article FFT length and layout advisor. This may not be exactly what you are looking for, but it may help.

Thanks
Dima
0 Kudos
Reply