Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- 1D DFT real to complex with non unit stride

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
02:47 AM

98 Views

1D DFT real to complex with non unit stride

Hello,

in order to parallelize completely a F90 application, I need to compute 2D FFT with 1D transforms (they will be distributed among OpenMP threads). These are real to complex transformations and it doesn't seem to work (values are wrong). First, I wrote a sample program before implementing the all thing in the real application. To write it, I looked in the MKL's documentation, in particular, the C-23 example (reference manual, p 3268, version -29 08/2008).

The computation in the first dimension works well, the problem appears when the transformation takes place in the second dimension : so values are wrong, and I do not understand why. I take trivial values for the entry signal (U). Here is the small example :

program dft1D

use mkl_dfti

implicit none

integer, parameter :: nx = 8, ny = 16

real(8), dimension(0:nx-1,0:ny-1) :: U

real(8), dimension(0:nx-1,0:ny-1) :: V

complex(8), dimension(0:nx-1,0:ny/2) :: F

!

real(8) :: pi, usy, xk, errg

integer :: i, k

!

TYPE(DFTI_DESCRIPTOR), POINTER :: desc

INTEGER :: precision

INTEGER :: domain, status

INTEGER :: dim, length

INTEGER, DIMENSION(2) :: input_strides, output_strides

!

pi = acos (-1.0_8)

usy = 1.0_8 / real(ny,8)

DO k = 0, ny-1

xk = 2.0_8 * pi * real(k,8) * usy

U(0,k) = 1.0_8

U(1,k) = cos (xk)

U(2,k) = 2.0_8

U(3,k) = sin (xk)

U(4,k) = 3.0_8

U(5,k) = cos (xk+xk)

U(6,k) = 4.0_8

U(7,k) = sin (xk+xk)

DO i = 0, 7

V(i,k) = 0.0_8

END DO

END DO

!

DO k = 0, ny/2

do i = 0, nx-1

F(i,k) = CMPLX (0.0_8, 0.0_8, 8)

end do

end do

!

precision = DFTI_DOUBLE

domain = DFTI_REAL

dim = 1

length = ny

status = DftiCreateDescriptor (desc, precision, domain, dim, length)

Status = DftiSetValue (desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiSetValue (desc, DFTI_FORWARD_SCALE, usy)

Status = DftiSetValue (desc, DFTI_BACKWARD_SCALE, 1.0_8)

Status = DftiSetValue (desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL)

input_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_INPUT_STRIDES, input_strides)

output_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_OUTPUT_STRIDES, output_strides)

status = DftiCommitDescriptor (desc)

!

DO i = 0, nx-1

status = dfti_compute_forward_dz (desc, U(i,0), F(i,0) )

end do

DO i = 0, nx-1

status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) )

END DO

!

DO i = 0, nx - 1

errg = 0.0_8

DO k = 0,ny - 1

errg = max(errg, abs( U(i,k) - V(i,k) ) )

END DO

write (6,'(A,I3,A,E11.4)') 'errg (', i, ') = ', errg

write (6,*)

END DO

status = DftiFreeDescriptor (desc)

stop

end program dft1D

I compile the code with :

ifort -check all -warn all -O0 -g -traceback /opt/intel/Compiler/11.1/064/mkl/include/mkl_dfti.f90 small.f90 -o a.out -L /opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm

The output is :

errg ( 0) = 0.0000E+00

errg ( 1) = 0.1707E+01

errg ( 2) = 0.0000E+00

errg ( 3) = 0.1848E+01

errg ( 4) = 0.0000E+00

errg ( 5) = 0.2220E-15

errg ( 6) = 0.0000E+00

errg ( 7) = 0.2220E-15

Thank you for your comments.

Regards,

Link Copied

16 Replies

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
03:04 AM

98 Views

Hi,

Could you take a look at MKL examples with 2D complex-to-real transforms?

They arelocated in the directory **$MKLROOT/examples/dftc/source** with names **real_2d_cce***

Looks like your output strides are wrong.

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
03:58 AM

98 Views

Hi Victor,

thank you for your help.

For parallelization purpose, I use 1D transforms, so my output stride array is a vector with 2 entries as mentioned in the strides section on page 2828.

As you told me, I look at examples about 2D complex -to-real transforms ; I see examples C-24 and C-24-a (just to be sure, are these the ones you're talking about ?). As they are 2D transforms the stride arrays have 3 entries with corresponding values in a 2D storage and for a 2D transform, that's ok for me, but as it is not my case, I'm sorry, I do see where and how it can help me. Maybe I do not find the right examples ?

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
04:26 AM

98 Views

Hi,

I mean examples inside of MKLwhich are arelocated in the directory **$MKLROOT/examples/dftc/source** with names **real_*d_cce***

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
04:36 AM

98 Views

As to 1D real-to-complex transform thereare used packed formats: CCS, PERM, PACK.

CCS, by default. But it requires N+2 elements.

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
04:43 AM

98 Views

FORTRAN 90 examples are in **$MKLROOT/examples/dftf**

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
06:16 AM

98 Views

thank you for these informations.

I take a particular look at the "Real-to-complex and complex-to-real multiple rows transform not inplace for double precision data which are allocated in two-dimension array" example, e.g. real_1d_cce_double_ex4.f90

I notice two differences with my example :

- transforms are computed by groups

- 2D arrays are made EQUIVALENT to 1D arrays before calling the transform routine with the last ones.

In my case, I will be interested by the first one in the OpenMP application for better performance.

As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Guy.

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-26-2010
10:43 PM

98 Views

< In my case, I will be interested by the first one in the OpenMP application for better performance.

< As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second

< one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Hi Guy,

Youmay implement OpenMP paralelization of 2D transforms but MKL can paralelize 2D on thread layer.

It would be interesting to compare performance results ofsuch cases.

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-01-2010
03:00 AM

98 Views

This program is a 2D simulation.

So computing loops are parallelized with OpenMP and when 2D DFT occur, I want to write them as loops of 1D transforms and distribue them among the threads. I do not want to have openmp parallel regions on one side and MKL 2D parallel DFT on the other. In my opinion, changing parallel context lots of time inside the code is not a good point to achieve good parallel performances.

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-09-2010
12:42 AM

98 Views

< As I give directly the right entry point to the routine, U(i,0) and F(i,0), I thing I do not need the second

< one, do I ? As this is linked to the stride array, do you think I should use the equivalent form ?

Hi Guy,

Using equivalenceto pass arguments just guaranteesthat FORTRAN compiler doesn't do copy of arrays and DFTI strides will work correctly. But it's OK to pass arrays(see below)

Dima Baksheev and I corrected your testcase as follows (also see comments there):

program dft1D

use mkl_dfti

implicit none

integer, parameter :: nx = 8, ny = 16

real(8), dimension(0:nx-1,0:ny-1) :: U

real(8), dimension(0:nx-1,0:ny-1) :: V

complex(8), dimension(0:nx-1,0:ny/2) :: F

!

real(8) :: pi, usy, xk, errg

integer :: i, k

!

TYPE(DFTI_DESCRIPTOR), POINTER :: desc

INTEGER :: precision

INTEGER :: domain, status

INTEGER :: dim, length

INTEGER, DIMENSION(2) :: input_strides, output_strides

!

pi = acos (-1.0_8)

usy = 1.0_8 / real(ny,8)

DO k = 0, ny-1

xk = 2.0_8 * pi * real(k,8) * usy

U(0,k) = 1.0_8

U(1,k) = cos (xk)

U(2,k) = 2.0_8

U(3,k) = sin (xk)

U(4,k) = 3.0_8

U(5,k) = cos (xk+xk)

U(6,k) = 4.0_8

U(7,k) = sin (xk+xk)

DO i = 0, 7

V(i,k) = 0.0_8

END DO

END DO

!

DO k = 0, ny/2

do i = 0, nx-1

F(i,k) = CMPLX (0.0_8, 0.0_8, 8)

end do

end do

!

precision = DFTI_DOUBLE

domain = DFTI_REAL

dim = 1

length = ny

status = DftiCreateDescriptor (desc, precision, domain, dim, length)

Status = DftiSetValue (desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiSetValue (desc, DFTI_FORWARD_SCALE, usy)

Status = DftiSetValue (desc, DFTI_BACKWARD_SCALE, 1.0_8)

Status = DftiSetValue (desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX) ! should be DFTI_COMPLEX_COMPLEX not DFTI_COMPLEX_REAL

input_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_INPUT_STRIDES, input_strides)

output_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_OUTPUT_STRIDES, output_strides)

status = DftiCommitDescriptor (desc)

DO i = 0, nx - 1

!status = DftiComputeForward (desc, U(i,0:), F(i:,0) ) ! incorrect

!!status = DftiComputeForward (desc, U(i,0), F(i,0) ) ! incorrect because DFTI needs array arguments

status = DftiComputeForward (desc, U(i:,0), F(i:,0) ) ! correct usage

!!status = dfti_compute_forward_dz (desc, U(i,0), F(i,0) ) ! correct too but uses internal DFTI functions

end do

DO i = 0, nx - 1

!status = DftiComputeBackward (desc, U(i,0:), F(i:,0) ) ! incorrect

!!status = DftiComputeBackward (desc, F(i,0), V(i,0) ) ! incorrect because DFTI needs array arguments

status = DftiComputeBackward (desc, F(i:,0), V(i:,0) ) ! correct usage

!!status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) ) ! correct too but uses internal DFTI functions

END DO

!

DO i = 0, nx - 1

errg = 0.0_8

DO k = 0, ny - 1

errg = max(errg, abs( U(i,k) - V(i,k) ) )

END DO

write (6,'(A,I3,A,E11.4)') 'errg (', i, ') = ', errg

write (6,*)

END DO

status = DftiFreeDescriptor (desc)

stop

end program dft1D

**and now expected results:**

errg ( 0) = 0.0000E+00

errg ( 1) = 0.1110E-15

errg ( 2) = 0.0000E+00

errg ( 3) = 0.1110E-15

errg ( 4) = 0.0000E+00

errg ( 5) = 0.2220E-15

errg ( 6) = 0.0000E+00

errg ( 7) = 0.2220E-15

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-09-2010
06:46 AM

98 Views

Hi Victor (and Dima)

Thank you very much ; I do really appreciate your help, from you and Dima.

Your corrections work well but maybe I do not express correctly my needs, so I'm going to explain them next (and I apologize for this if it's redundant and a bit too long). For example, the position of the ":" in U(i:,0) is not what I think it should be, but I can misunderstand your answer !

I have a real array nx rows, ny columns, called U, stored in fortran mode. Inside OpenMP parallel regions, I have to do a DFT of this 2D array. I am interested by the transform of the rows (for the columns; I understand how to do).

So I write an openmp parallel loop of 1D DFT (so nx real to complex 1D transforms), it gives something like that :

!$OMP DO

do i = 0,nx-1

transform U(i,0:ny-1) --> F(i,0:ny/2)

end do

!$OMP END DO

That means the transform of the vector ( U(i,0), U(i,1),U(i,2),...,U(i,ny-1) ) into ( F(i,0), F(i,1), F(i,2), ..., F(i,ny/2) ) for each i in 0 to nx-1.

U is a real array, F is an array of complex values. (I did not catch your point about DFTI_CONJUGATE_EVEN_STORAGE, I did not see any definition or explanation in the Reference manual). What do DFTI_COMPLEX_COMPLEX and DFTI_COMPLEX_REAL mean and what are their differences ? I took one of them (U is real ?).

As U is real, I only compute half of the Fourier coefficients which are stored in F from index 0 to ny/2.

When you write :

status = DftiComputeForward (desc, U(i:,0), F(i:,0) ) ! correct usage

I think (but I can be wrong) that it is not what I try to do in the previous loop.

I think It should be something like

status = DftiComputeForward (desc, U(i,0:), F(i,0:) )

And in that case, input_strides and output_strides should be both (/0, 1/) because I construct by myself the vector of values to be transformed. When I run this, the runtime library of the compiler tells me (and I do perfectly understand why)

forrtl: warning (402): fort: (1): In call to DFTI_COMPUTE_FORWARD_DZ, an array temporary was created for argument #2

And this is exactly what I do not want to do ! I do not want to create a temporary array, I do not want the system spend time to allocate and deallocate it, copy data to and from it. I want to tell mkl that there is a non unit stride between two consecutive values of the row to be transformed in a 2D array ...

(sorry for my bad english !)

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-09-2010
11:55 PM

98 Views

Hi,

Here are some brief comments:

1) About the position of the ":" in two dimensional array

DFTI FORTRAN compute functions require arrays as input/output arguments. Therefore, it was incorrect to use:

status = DftiComputeBackward (desc, F(i,0), V(i,0) ) ! incorrect because DFTI needs array arguments

because ofcompiler error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]

But in order to prevent copying of arrays by compiler (or using temporary arrays) in your testcase, the colon was added for the first dimentionfor passing correct referenceof the first element (because added colon just related to the shape of the array, not to its reference).

Also, using equivalence is OK as it is in MKL FORTRAN examples.

Nevertheless, it would OK to use (but maybe not portable after upgrading to newer MKL version):

status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) ) ! correct too but uses internal DFTI functions

You know, DFTI does not check size of passed array and outputs all elements (where problemsize was set in DftiCreateDescriptor) doing namely what you want:

transform U(i,0:ny-1) --> F(i,0:ny/2)

2) About openmp parallel loop of 1D DFT (so nx real to complex 1D transforms)

If so, you need to set DFTI_NUMBER_OF_USER_THREADS. E.g.:

Status = DftiSetValue (desc, DFTI_NUMBER_OF_USER_THREADS, 8)

IDZ_A_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-10-2010
08:41 AM

98 Views

for 1., it's a good "trick" and when I read it, I didn't understand how tricky it was. Now,it's ok for me. I try it and it works for "constant" dimensionned arrays like :

integer, parameter :: nx = 8, ny = 16

real(8), dimension(0:nx-1,0:ny-1) :: U

but If you use dynamically allocated arrays, like :

real(8), dimension(:,:), allocatable :: U

...

read(5,*) nx,ny

...

allocate ( U(0:nx-1,0:ny-1) )

First, you get the following message at runtime :

forrtl: warning (402): fort: (1): In call to DFTI_COMPUTE_BACKWARD_ZD, an array temporary was created for argument #2

Next, there are Nan values in the output array.

One way to recover from this situation is to slightly modify the calling sequence into :

status = DftiComputeForward (desc, U(i:nx-1,0), F(i:nx-1,0) )

I mean, changing "(i:,0)" in "(i:nx-1,0)"

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-11-2010
12:19 AM

98 Views

Thanks Guy,

I can see that

status = DftiComputeForward (desc, U(i:i,0), F(i:i,0) ) ! correct usage

and

status = DftiComputeBackward (desc, F(i:i,0), V(i:i,0) ) ! correct usage

work OK for both static and dynamic arrays.

So, this is just to pass referenceto the first element (what is similar to C-pointer) but like array needed for DFTI FORTRAN interface.

IDZ_A_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-12-2010
08:51 AM

98 Views

OK, now I have written my 2D FFT as loops of 1D FFT on the rows and on the columns. It seems to work.

In order to check it, I'd like to call a 2D FFT and compare the values after forward and backward transfoms.

So I use :

precision = DFTI_DOUBLE

domain = DFTI_REAL

dim = 2

length2 = (/nx, ny/)

status = DftiCreateDescriptor (descUH, precision, domain, dim, length2)

Status = DftiSetValue (descUH, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiSetValue (descUH, DFTI_FORWARD_SCALE, usx*usy)

Status = DftiSetValue (descUH, DFTI_BACKWARD_SCALE, 1.0_8)

Status = DftiSetValue (descUH, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)

input_strides2 = (/0, 1, nx/)

Status = DftiSetValue (descUH, DFTI_INPUT_STRIDES, input_strides)

output_strides2 = (/0, 1, pnx+1/)

Status = DftiSetValue (descUH, DFTI_OUTPUT_STRIDES, output_strides)

status = DftiCommitDescriptor (descUH)

status = DftiComputeForward (descUH, U (0:0,0), H (0:0,0) )

with the allocation of the real array U, and complex array H :

allocate ( U(0:nx-1,0:ny-1), H(0:nx/2,0:ny-1) )

1. If I do not explicitly put "(0:0,0)", the compilation fails :

error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]

It's strange : U and H are arrays ?

2. After this, the execution fails, the program hangs (H remains identically zero, and the status is zero too), I have to kill it. Nevertheless, I follow the example real_2d_ccs_double_ex2.f90 which seems the more suitable one.

3. One more point, I was surprised to read in this example that one has to change the DFTI_INPUT_STRIDES and DFTI_OUTPUT_STRIDES for the backward transform (exchange in fact) because the shapes of the arrays are different.

barragan_villanueva_

Valued Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-15-2010
12:28 AM

98 Views

Hi,

As to your question #1 in previous post:

DFTI FORTRAN interface for compute functions expects arrayswith dimension 1 so it must be H(0:0,0)

As to #2/#3

Please look atreal_2d_cce_double_ex*.f90 whereinput and output strides are to be exchenged because of different shapes for forward and backward transforms.

mguy44

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-15-2010
09:44 AM

98 Views

I found my bug, it's working well now.

Thank you for your help.

Now, I will work on the "real" application, try to put mkl's fft inside.

Guy.

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.