FFT Optimized Radices

missing__zlw · ‎11-30-2010

I am using MKL DFTI to do FFT on 1D and 2D.
I would like to pad my array/Matrix to get best FFT performance.
There is some guideline in the manual saying the dimension should not be multiple of 2048 (for 2D FFT) and aligned.
I can align my 1D and 2D arrays, but I wonder how do I really get the optmized size. If I would like to generate an optmized size table, how it will look like?

In FFTW, they usually use multiple of small prime numbers. Does this apply to MKL?

Please provide some guideline on how to generate a table of optimized dimension number. This apply to both 1D and 2D.

Also for 2D, the row and col will follow the same rule, right?

To summarize, if I have 1D array, say 457 numbers, and I run FFT on it. What is the optimized dimension I should pad my array to?

for 2D, if it is 457x671, what is the optmized dimension?

Thank you.

barragan_villanueva_ · ‎12-01-2010

Hi,

For the best performance please provide 16-byte alignment of data.

As to optimized radices: MKL User Guide states (seeTips and Techniques to Improve Performance)

FFT Optimized Radices

You can improve the performance of the Intel Math Kernel Library (Intel MKL) FFT if the length of your data vector permits factorization into powers of optimized radices.

In Intel MKL, the optimized radices are 2, 3, 5, 7, and 11.

missing__zlw · ‎12-01-2010

Thanks for the info.
I also read in the MKL manual which says:

leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16

for two-dimensional arrays, leading dimension values divisible by 2048 are avoided

For the C-style FFT, the distance L between arrays that represent real and imaginary

parts is not divisible by 64. The best case is when L=k*64 + 16

Leading dimension values, in bytes (n*element_size), of two-dimensional arrays are

not power of two.

could you give some explanation here on leading dimension? If I have array like Real_1, Imag_1, Real_2, Imag_2, ... is this an optimized FFT data?

Frank_M · ‎12-05-2010

[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

    IMPLICIT NONE
   
    INTEGER :: FGET_BUFFER_SIZE
    INTEGER, INTENT(IN) :: datalen
    REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
    REAL(8) :: lndata   

    ln2 = DLOG(2.0_8)
    ln3 = DLOG(3.0_8)
    ln5 = DLOG(5.0_8)
    ln7 = DLOG(7.0_8)
    ln11 = DLOG(11.0_8)
    ln13 = DLOG(13.0_8)
    lndata = DLOG(DBLE(datalen))
    
    IF (MOD(datalen,2) >= 1) THEN 
        FGET_BUFFER_SIZE = ODD_BUFFER()
    ELSE
        FGET_BUFFER_SIZE = EVEN_BUFFER()
    END IF
    
CONTAINS

FUNCTION ODD_BUFFER()
    INTEGER :: ODD_BUFFER
    INTEGER :: buffsize, buffest
    REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
    REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13
    
    buffsize = HUGE(buffsize)
    N = DLOG(DBLE(buffsize))
    cnt3 = 0.0_8
    cnt5 = 0.0_8
    cnt7 = 0.0_8
    cnt11 = 0.0_8 
    cnt13 = 0.0_8
    DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.

John_S_18 · ‎12-06-2010

Quoting Frank_M

- collapse source view plain copy to clipboard print ?
FUNCTIONFGET_BUFFER_SIZE(datalen)
IMPLICITNONE
INTEGER::FGET_BUFFER_SIZE
INTEGER,INTENT(IN)::datalen
REAL(8)::ln2,ln3,ln5,ln7,ln11,ln13
REAL(8)::lndata
ln2=DLOG(2.0_8)
ln3=DLOG(3.0_8)
ln5=DLOG(5.0_8)
ln7=DLOG(7.0_8)
ln11=DLOG(11.0_8)
ln13=DLOG(13.0_8)
lndata=DLOG(DBLE(datalen))
IF(MOD(datalen,2)>=1)THEN
FGET_BUFFER_SIZE=ODD_BUFFER()
ELSE
FGET_BUFFER_SIZE=EVEN_BUFFER()
ENDIF
CONTAINS
FUNCTIONODD_BUFFER()
INTEGER::ODD_BUFFER
INTEGER::buffsize,buffest
REAL(8)::cnt3,cnt5,cnt7,cnt11,cnt13,N
REAL(8)::tmp3,tmp5,tmp7,tmp11,tmp13
buffsize=HUGE(buffsize)
N=DLOG(DBLE(buffsize))
cnt3=0.0_8
cnt5=0.0_8
cnt7=0.0_8
cnt11=0.0_8
cnt13=0.0_8
DOWHILE(cnt13*ln13
tmp13=cnt13*ln13
DOWHILE(cnt11*ln11+tmp13
tmp11=cnt11*ln11+tmp13
DOWHILE(cnt7*ln7+tmp11
tmp7=cnt7*ln7+tmp11
DOWHILE(cnt5*ln5+tmp7
tmp5=cnt5*ln5+tmp7
cnt3=DNINT((lndata-tmp5)/ln3+0.45_8)
DOWHILE(cnt3*ln3+tmp5
tmp3=cnt3*ln3+tmp5
buffest=NINT(DEXP(tmp3))
IF(buffest.GE.datalen)THEN
N=tmp3
buffsize=buffest
IF(buffsize==datalen)GOTO200
ENDIF
cnt3=cnt3+1.0_8
ENDDO
cnt3=0.0_8
cnt5=cnt5+1.0_8
ENDDO
cnt5=0.0_8
cnt7=cnt7+1.0_8
ENDDO
cnt7=0.0_8
cnt11=cnt11+1.0_8
ENDDO
cnt11=0.0_8
cnt13=cnt13+1.0_8
ENDDO
200CONTINUE
ODD_BUFFER=buffsize
ENDFUNCTIONODD_BUFFER
FUNCTIONEVEN_BUFFER()
INTEGER::EVEN_BUFFER
INTEGER::buffsize,buffest
REAL(8)::cnt2,cnt3,cnt5,cnt7,cnt11,cnt13,N
REAL(8)::tmp2,tmp3,tmp5,tmp7,tmp11,tmp13
buffsize=HUGE(buffsize)
N=DLOG(DBLE(buffsize))
cnt3=0.0_8
cnt5=0.0_8
cnt7=0.0_8
cnt11=0.0_8
cnt13=0.0_8
DOWHILE(cnt13*ln13
tmp13=cnt13*ln13
DOWHILE(ln2+cnt11*ln11+tmp13
tmp11=ln2+cnt11*ln11+tmp13
DOWHILE(cnt7*ln7+tmp11
tmp7=cnt7*ln7+tmp11
DOWHILE(cnt5*ln5+tmp7
tmp5=cnt5*ln5+tmp7
DOWHILE(cnt3*ln3+tmp5
tmp3=cnt3*ln3+tmp5-ln2
cnt2=DNINT((lndata-tmp3)/ln2+0.45_8)
If(cnt2<1.0_8)cnt2=1.0_8
DOWHILE(cnt2*ln2+tmp3
tmp2=cnt2*ln2+tmp3
buffest=NINT(DEXP(tmp2))
IF(buffest.GE.datalen)THEN
N=tmp2
buffsize=buffest
IF(buffsize==datalen)GOTO300
ENDIF
cnt2=cnt2+1.0_8
ENDDO
cnt2=1.0_8
cnt3=cnt3+1.0_8
ENDDO
cnt3=0.0_8
cnt5=cnt5+1.0_8
ENDDO
cnt5=0.0_8
cnt7=cnt7+1.0_8
ENDDO
cnt7=0.0_8
cnt11=cnt11+1.0_8
ENDDO
cnt11=0.0_8
cnt13=cnt13+1.0_8
ENDDO
300CONTINUE
EVEN_BUFFER=buffsize
ENDFUNCTIONEVEN_BUFFER
ENDFUNCTIONFGET_BUFFER_SIZE

[fortran]FUNCTION FGET_BUFFER_SIZE(datalen)

    IMPLICIT NONE
   
    INTEGER :: FGET_BUFFER_SIZE
    INTEGER, INTENT(IN) :: datalen
    REAL(8) :: ln2, ln3, ln5, ln7, ln11, ln13
    REAL(8) :: lndata   

    ln2 = DLOG(2.0_8)
    ln3 = DLOG(3.0_8)
    ln5 = DLOG(5.0_8)
    ln7 = DLOG(7.0_8)
    ln11 = DLOG(11.0_8)
    ln13 = DLOG(13.0_8)
    lndata = DLOG(DBLE(datalen))
    
    IF (MOD(datalen,2) >= 1) THEN 
        FGET_BUFFER_SIZE = ODD_BUFFER()
    ELSE
        FGET_BUFFER_SIZE = EVEN_BUFFER()
    END IF
    
CONTAINS

FUNCTION ODD_BUFFER()
    INTEGER :: ODD_BUFFER
    INTEGER :: buffsize, buffest
    REAL(8) :: cnt3 ,cnt5, cnt7, cnt11, cnt13, N
    REAL(8) :: tmp3, tmp5, tmp7, tmp11, tmp13
    
    buffsize = HUGE(buffsize)
    N = DLOG(DBLE(buffsize))
    cnt3 = 0.0_8
    cnt5 = 0.0_8
    cnt7 = 0.0_8
    cnt11 = 0.0_8 
    cnt13 = 0.0_8
    DO WHILE(cnt13*ln13Here is how I find the optimal buffer size for MKL fft. The number returned is your data length + symetric padding. The code does some rather unusual things on the surface. It starts its search on the small end of the scale. This is because there is no "known" way to determine if you have found the best solution except to check all the values between your best estimate and thelength of your data. I did some testing on the original (before MKL added 13 as an optimized prime) and found it took on average less then 100 iterations of the inner loop to find an odd buffer size and about 300 inner loop runs to find an even buffer size. Most likely there are faster methods or improvements that could be made to this code. If you find them please post.

@Frank_M: can you clarify why you used a radix value of 13, when the MKL recommendation has a maximum radix of 11? You refer to Intel expanding it to 13; in which version of MKL? I can't seem to find documentation for this change.

Thanks much,
Ozzer

Frank_M · ‎12-06-2010

from page 22 of Document 321417-002US
dated 12 August 2010

Intel Visual Fortran Compiler Professional Edition 11.1 for Windows
Installation Guide and Release Notes

Performance improvements

FFTS
Enhanced performance for transforms which are multiple of 8 of 13

Optimized 1d complex cluster FFTs for non-power-of-2 vector lengths

John_S_18 · ‎12-07-2010

@Frank_M -- Thanks much. With a little sleuthing, I tracked this to the Release Notes associated with MKL Release 10.2, Update 4.

-Ozzer

Dmitry_B_Intel · ‎12-14-2010

Hi,

There are some considerations about selecting optimized dimensions and data layout for multidimensional transforms recently published in MKL Knowledge Base article FFT length and layout advisor. This may not be exactly what you are looking for, but it may help.

Thanks
Dima

How to find optmized FFT dimension size

FFT Optimized Radices