1D DFT real to complex with non unit stride

mguy44 · ‎02-26-2010

Hello,

in order to parallelize completely a F90 application, I need to compute 2D FFT with 1D transforms (they will be distributed among OpenMP threads). These are real to complex transformations and it doesn't seem to work (values are wrong). First, I wrote a sample program before implementing the all thing in the real application. To write it, I looked in the MKL's documentation, in particular, the C-23 example (reference manual, p 3268, version -29 08/2008).

The computation in the first dimension works well, the problem appears when the transformation takes place in the second dimension : so values are wrong, and I do not understand why. I take trivial values for the entry signal (U). Here is the small example :

program dft1D
use mkl_dfti
implicit none
integer, parameter :: nx = 8, ny = 16
real(8), dimension(0:nx-1,0:ny-1) :: U
real(8), dimension(0:nx-1,0:ny-1) :: V
complex(8), dimension(0:nx-1,0:ny/2) :: F
!
real(8) :: pi, usy, xk, errg
integer :: i, k
!
TYPE(DFTI_DESCRIPTOR), POINTER :: desc
INTEGER :: precision
INTEGER :: domain, status
INTEGER :: dim, length
INTEGER, DIMENSION(2) :: input_strides, output_strides
!
pi = acos (-1.0_8)
usy = 1.0_8 / real(ny,8)

DO k = 0, ny-1
xk = 2.0_8 * pi * real(k,8) * usy
U(0,k) = 1.0_8
U(1,k) = cos (xk)
U(2,k) = 2.0_8
U(3,k) = sin (xk)
U(4,k) = 3.0_8
U(5,k) = cos (xk+xk)
U(6,k) = 4.0_8
U(7,k) = sin (xk+xk)
DO i = 0, 7
V(i,k) = 0.0_8
END DO
END DO
!
DO k = 0, ny/2
do i = 0, nx-1
F(i,k) = CMPLX (0.0_8, 0.0_8, 8)
end do
end do
!
precision = DFTI_DOUBLE
domain = DFTI_REAL
dim = 1
length = ny
status = DftiCreateDescriptor (desc, precision, domain, dim, length)
Status = DftiSetValue (desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE)
Status = DftiSetValue (desc, DFTI_FORWARD_SCALE, usy)
Status = DftiSetValue (desc, DFTI_BACKWARD_SCALE, 1.0_8)
Status = DftiSetValue (desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL)
input_strides = (/0, nx/)
Status = DftiSetValue (desc, DFTI_INPUT_STRIDES, input_strides)
output_strides = (/0, nx/)
Status = DftiSetValue (desc, DFTI_OUTPUT_STRIDES, output_strides)
status = DftiCommitDescriptor (desc)
!
DO i = 0, nx-1
status = dfti_compute_forward_dz (desc, U(i,0), F(i,0) )
end do
DO i = 0, nx-1
status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) )
END DO
!
DO i = 0, nx - 1
errg = 0.0_8
DO k = 0,ny - 1
errg = max(errg, abs( U(i,k) - V(i,k) ) )
END DO
write (6,'(A,I3,A,E11.4)') 'errg (', i, ') = ', errg
write (6,*)
END DO

status = DftiFreeDescriptor (desc)

stop

end program dft1D

I compile the code with :

ifort -check all -warn all -O0 -g -traceback /opt/intel/Compiler/11.1/064/mkl/include/mkl_dfti.f90 small.f90 -o a.out -L /opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm

The output is :

errg ( 0) = 0.0000E+00
errg ( 1) = 0.1707E+01
errg ( 2) = 0.0000E+00
errg ( 3) = 0.1848E+01
errg ( 4) = 0.0000E+00
errg ( 5) = 0.2220E-15
errg ( 6) = 0.0000E+00
errg ( 7) = 0.2220E-15

Thank you for your comments.

Regards,

barragan_villanueva_ · ‎02-26-2010

Hi,

Could you take a look at MKL examples with 2D complex-to-real transforms?

They arelocated in the directory $MKLROOT/examples/dftc/source with names real_2d_cce*

Looks like your output strides are wrong.

mguy44 · ‎02-26-2010

Hi Victor,

thank you for your help.

For parallelization purpose, I use 1D transforms, so my output stride array is a vector with 2 entries as mentioned in the strides section on page 2828.

As you told me, I look at examples about 2D complex -to-real transforms ; I see examples C-24 and C-24-a (just to be sure, are these the ones you're talking about ?). As they are 2D transforms the stride arrays have 3 entries with corresponding values in a 2D storage and for a 2D transform, that's ok for me, but as it is not my case, I'm sorry, I do see where and how it can help me. Maybe I do not find the right examples ?

barragan_villanueva_ · ‎02-26-2010

Hi,

I mean examples inside of MKLwhich are arelocated in the directory $MKLROOT/examples/dftc/source with names real_*d_cce*

barragan_villanueva_ · ‎02-26-2010

As to 1D real-to-complex transform thereare used packed formats: CCS, PERM, PACK.

CCS, by default. But it requires N+2 elements.

barragan_villanueva_ · ‎02-26-2010

FORTRAN 90 examples are in $MKLROOT/examples/dftf

mguy44 · ‎02-26-2010

thank you for these informations.

I take a particular look at the "Real-to-complex and complex-to-real multiple rows transform not inplace for double precision data which are allocated in two-dimension array" example, e.g. real_1d_cce_double_ex4.f90

I notice two differences with my example :

- transforms are computed by groups

- 2D arrays are made EQUIVALENT to 1D arrays before calling the transform routine with the last ones.

In my case, I will be interested by the first one in the OpenMP application for better performance.

As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Guy.

barragan_villanueva_ · ‎02-26-2010

< In my case, I will be interested by the first one in the OpenMP application for better performance.

< As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second

< one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Hi Guy,

Youmay implement OpenMP paralelization of 2D transforms but MKL can paralelize 2D on thread layer.

It would be interesting to compare performance results ofsuch cases.

mguy44 · ‎03-01-2010

This program is a 2D simulation.

So computing loops are parallelized with OpenMP and when 2D DFT occur, I want to write them as loops of 1D transforms and distribue them among the threads. I do not want to have openmp parallel regions on one side and MKL 2D parallel DFT on the other. In my opinion, changing parallel context lots of time inside the code is not a good point to achieve good parallel performances.

barragan_villanueva_ · ‎03-09-2010

< As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second

< one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Hi Guy,

Using equivalenceto pass arguments just guaranteesthat FORTRAN compiler doesn't do copy of arrays and DFTI strides will work correctly. But it's OK to pass arrays(see below)

Dima Baksheev and I corrected your testcase as follows (also see comments there):

program dft1D
use mkl_dfti
implicit none
integer, parameter :: nx = 8, ny = 16
real(8), dimension(0:nx-1,0:ny-1) :: U
real(8), dimension(0:nx-1,0:ny-1) :: V
complex(8), dimension(0:nx-1,0:ny/2) :: F
!
real(8) :: pi, usy, xk, errg
integer :: i, k
!
TYPE(DFTI_DESCRIPTOR), POINTER :: desc
INTEGER :: precision
INTEGER :: domain, status
INTEGER :: dim, length
INTEGER, DIMENSION(2) :: input_strides, output_strides
!
pi = acos (-1.0_8)
usy = 1.0_8 / real(ny,8)

DO k = 0, ny-1
xk = 2.0_8 * pi * real(k,8) * usy
U(0,k) = 1.0_8
U(1,k) = cos (xk)
U(2,k) = 2.0_8
U(3,k) = sin (xk)
U(4,k) = 3.0_8
U(5,k) = cos (xk+xk)
U(6,k) = 4.0_8
U(7,k) = sin (xk+xk)
DO i = 0, 7
V(i,k) = 0.0_8
END DO
END DO
!
DO k = 0, ny/2
do i = 0, nx-1
F(i,k) = CMPLX (0.0_8, 0.0_8, 8)
end do
end do
!
precision = DFTI_DOUBLE
domain = DFTI_REAL
dim = 1
length = ny

status = DftiCreateDescriptor (desc, precision, domain, dim, length)
Status = DftiSetValue (desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE)
Status = DftiSetValue (desc, DFTI_FORWARD_SCALE, usy)
Status = DftiSetValue (desc, DFTI_BACKWARD_SCALE, 1.0_8)
Status = DftiSetValue (desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX) ! should be DFTI_COMPLEX_COMPLEX not DFTI_COMPLEX_REAL
input_strides = (/0, nx/)
Status = DftiSetValue (desc, DFTI_INPUT_STRIDES, input_strides)
output_strides = (/0, nx/)
Status = DftiSetValue (desc, DFTI_OUTPUT_STRIDES, output_strides)
status = DftiCommitDescriptor (desc)

DO i = 0, nx - 1
!status = DftiComputeForward (desc, U(i,0:), F(i:,0) ) ! incorrect
!!status = DftiComputeForward (desc, U(i,0), F(i,0) ) ! incorrect because DFTI needs array arguments
status = DftiComputeForward (desc, U(i:,0), F(i:,0) ) ! correct usage
!!status = dfti_compute_forward_dz (desc, U(i,0), F(i,0) ) ! correct too but uses internal DFTI functions
end do

DO i = 0, nx - 1
!status = DftiComputeBackward (desc, U(i,0:), F(i:,0) ) ! incorrect
!!status = DftiComputeBackward (desc, F(i,0), V(i,0) ) ! incorrect because DFTI needs array arguments
status = DftiComputeBackward (desc, F(i:,0), V(i:,0) ) ! correct usage
!!status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) ) ! correct too but uses internal DFTI functions
END DO

!
DO i = 0, nx - 1
errg = 0.0_8
DO k = 0, ny - 1
errg = max(errg, abs( U(i,k) - V(i,k) ) )
END DO
write (6,'(A,I3,A,E11.4)') 'errg (', i, ') = ', errg
write (6,*)
END DO

status = DftiFreeDescriptor (desc)

stop

end program dft1D

and now expected results:

errg ( 0) = 0.0000E+00

errg ( 1) = 0.1110E-15

errg ( 2) = 0.0000E+00

errg ( 3) = 0.1110E-15

errg ( 4) = 0.0000E+00

errg ( 5) = 0.2220E-15

errg ( 6) = 0.0000E+00

errg ( 7) = 0.2220E-15

mguy44 · ‎03-09-2010

Hi Victor (and Dima)

Thank you very much ; I do really appreciate your help, from you and Dima.

Your corrections work well but maybe I do not express correctly my needs, so I'm going to explain them next (and I apologize for this if it's redundant and a bit too long). For example, the position of the ":" in U(i:,0) is not what I think it should be, but I can misunderstand your answer !

I have a real array nx rows, ny columns, called U, stored in fortran mode. Inside OpenMP parallel regions, I have to do a DFT of this 2D array. I am interested by the transform of the rows (for the columns; I understand how to do).

So I write an openmp parallel loop of 1D DFT (so nx real to complex 1D transforms), it gives something like that :

!$OMP DO

do i = 0,nx-1

transform U(i,0:ny-1) --> F(i,0:ny/2)

end do

!$OMP END DO

That means the transform of the vector ( U(i,0), U(i,1),U(i,2),...,U(i,ny-1) ) into ( F(i,0), F(i,1), F(i,2), ..., F(i,ny/2) ) for each i in 0 to nx-1.

U is a real array, F is an array of complex values. (I did not catch your point about DFTI_CONJUGATE_EVEN_STORAGE, I did not see any definition or explanation in the Reference manual). What do DFTI_COMPLEX_COMPLEX and DFTI_COMPLEX_REAL mean and what are their differences ? I took one of them (U is real ?).

As U is real, I only compute half of the Fourier coefficients which are stored in F from index 0 to ny/2.

When you write :

status = DftiComputeForward (desc, U(i:,0), F(i:,0) ) ! correct usage

I think (but I can be wrong) that it is not what I try to do in the previous loop.

I think It should be something like

status = DftiComputeForward (desc, U(i,0:), F(i,0:) )

And in that case, input_strides and output_strides should be both (/0, 1/) because I construct by myself the vector of values to be transformed. When I run this, the runtime library of the compiler tells me (and I do perfectly understand why)

forrtl: warning (402): fort: (1): In call to DFTI_COMPUTE_FORWARD_DZ, an array temporary was created for argument #2

And this is exactly what I do not want to do ! I do not want to create a temporary array, I do not want the system spend time to allocate and deallocate it, copy data to and from it. I want to tell mkl that there is a non unit stride between two consecutive values of the row to be transformed in a 2D array ...

(sorry for my bad english !)

barragan_villanueva_ · ‎03-09-2010

Hi,

Here are some brief comments:

1) About the position of the ":" in two dimensional array

DFTI FORTRAN compute functions require arrays as input/output arguments. Therefore, it was incorrect to use:

status = DftiComputeBackward (desc, F(i,0), V(i,0) ) ! incorrect because DFTI needs array arguments

because ofcompiler error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]

But in order to prevent copying of arrays by compiler (or using temporary arrays) in your testcase, the colon was added for the first dimentionfor passing correct referenceof the first element (because added colon just related to the shape of the array, not to its reference).
Also, using equivalence is OK as it is in MKL FORTRAN examples.

Nevertheless, it would OK to use (but maybe not portable after upgrading to newer MKL version):

status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) ) ! correct too but uses internal DFTI functions

You know, DFTI does not check size of passed array and outputs all elements (where problemsize was set in DftiCreateDescriptor) doing namely what you want:

transform U(i,0:ny-1) --> F(i,0:ny/2)

2) About openmp parallel loop of 1D DFT (so nx real to complex 1D transforms)

If so, you need to set DFTI_NUMBER_OF_USER_THREADS. E.g.:

Status = DftiSetValue (desc, DFTI_NUMBER_OF_USER_THREADS, 8)

IDZ_A_Intel · ‎03-10-2010

for 1., it's a good "trick" and when I read it, I didn't understand how tricky it was. Now,it's ok for me. I try it and it works for "constant" dimensionned arrays like :

integer, parameter :: nx = 8, ny = 16

real(8), dimension(0:nx-1,0:ny-1) :: U

but If you use dynamically allocated arrays, like :

real(8), dimension(:,:), allocatable :: U
...
read(5,*) nx,ny

...

allocate ( U(0:nx-1,0:ny-1) )

First, you get the following message at runtime :

forrtl: warning (402): fort: (1): In call to DFTI_COMPUTE_BACKWARD_ZD, an array temporary was created for argument #2

Next, there are Nan values in the output array.

One way to recover from this situation is to slightly modify the calling sequence into :

status = DftiComputeForward (desc, U(i:nx-1,0), F(i:nx-1,0) )

I mean, changing "(i:,0)" in "(i:nx-1,0)"

barragan_villanueva_ · ‎03-11-2010

Thanks Guy,

I can see that

status = DftiComputeForward (desc, U(i:i,0), F(i:i,0) ) ! correct usage

and

status = DftiComputeBackward (desc, F(i:i,0), V(i:i,0) ) ! correct usage

work OK for both static and dynamic arrays.

So, this is just to pass referenceto the first element (what is similar to C-pointer) but like array needed for DFTI FORTRAN interface.

IDZ_A_Intel · ‎03-12-2010

OK, now I have written my 2D FFT as loops of 1D FFT on the rows and on the columns. It seems to work.

In order to check it, I'd like to call a 2D FFT and compare the values after forward and backward transfoms.

So I use :

precision = DFTI_DOUBLE
domain = DFTI_REAL
dim = 2
length2 = (/nx, ny/)

status = DftiCreateDescriptor (descUH, precision, domain, dim, length2)
Status = DftiSetValue (descUH, DFTI_PLACEMENT, DFTI_NOT_INPLACE)
Status = DftiSetValue (descUH, DFTI_FORWARD_SCALE, usx*usy)
Status = DftiSetValue (descUH, DFTI_BACKWARD_SCALE, 1.0_8)
Status = DftiSetValue (descUH, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)
input_strides2 = (/0, 1, nx/)
Status = DftiSetValue (descUH, DFTI_INPUT_STRIDES, input_strides)
output_strides2 = (/0, 1, pnx+1/)
Status = DftiSetValue (descUH, DFTI_OUTPUT_STRIDES, output_strides)

status = DftiCommitDescriptor (descUH)
status = DftiComputeForward (descUH, U (0:0,0), H (0:0,0) )

with the allocation of the real array U, and complex array H :

allocate ( U(0:nx-1,0:ny-1), H(0:nx/2,0:ny-1) )

1. If I do not explicitly put "(0:0,0)", the compilation fails :

error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]
It's strange : U and H are arrays ?

2. After this, the execution fails, the program hangs (H remains identically zero, and the status is zero too), I have to kill it. Nevertheless, I follow the example real_2d_ccs_double_ex2.f90 which seems the more suitable one.

3. One more point, I was surprised to read in this example that one has to change the DFTI_INPUT_STRIDES and DFTI_OUTPUT_STRIDES for the backward transform (exchange in fact) because the shapes of the arrays are different.

barragan_villanueva_ · ‎03-15-2010

Hi,

As to your question #1 in previous post:

DFTI FORTRAN interface for compute functions expects arrayswith dimension 1 so it must be H(0:0,0)

As to #2/#3

Please look atreal_2d_cce_double_ex*.f90 whereinput and output strides are to be exchenged because of different shapes for forward and backward transforms.

mguy44 · ‎03-15-2010

I found my bug, it's working well now.

Thank you for your help.

Now, I will work on the "real" application, try to put mkl's fft inside.

Guy.