Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

parallel FFT function with OpenMP

Raemon
Beginner
1,144 Views

Hi,

I follow this article and try to use the method (2) to parallel DFTI funtion by OpenMP.

My main code is like this (fortran)

Equivalence(tmp,tmp1D)

CALL omp_set_num_threads(2)
CALL mkl_set_num_threads(1)

lengths(1)=n
lengths(2)=n
!$OMP PARALLEL
MYID=omp_get_thread_num()
Status = DftiCreateDescriptor(fftjob,DFTI_DOUBLE,DFTI_COMPLEX,2,lengths)
Status = DftiSetValue(fftjob,DFTI_NUMBER_OF_TRANSFORMS,2)
Status = DftiSetValue(fftjob,DFTI_INPUT_DISTANCE,(n**2)/2)
Status = DftiCommitDescriptor(fftjob)
Status = DftiComputeBackward(fftjob,tmp1D(((n**2)/2)*MYID+1:((n**2)/2)*(MYID+1)))
Status = DftiFreeDescriptor(fftjob)

!$OMP END PARALLEL

However, this error message pops while executing my program:

*** glibc detected *** ./ITPCARTINTPFFT2DII: double free or corruption (top): 0x00002b09d0006ef0 ***...

The examples in that article is written in C (I am not familiar with C)

Can someone tells me what's wrong?

0 Kudos
8 Replies
barragan_villanueva_
Valued Contributor I
1,144 Views

Hi,

I added some comments to used by you arcticle about FORTRAN examples.

Also that example in C is to be corrected.

In your example: memory was corrupted :(

I can see several problems there

1)Why DFTI_NUMBER_OF_TRANSFORMS equal 2 is used in every thread?

2) DFTI_INPUT_DISTANCE should be not in elemets of array but as real offset: not (n**2)/2

Original C-example is to compute 2d 50x100 in parallel as different subarrays in common 200x100 array.

But in your test you compute tmp1D(((n**2)/2)*MYID+1:((n**2)/2)*(MYID+1)) data in parallel.

What isthe idea of your test?

0 Kudos
Raemon
Beginner
1,144 Views

Hi, Victor:

1) Actually I am NOT SURE whether this setting is correct or not. All that I think is the setting should be after the "CreateDescriptor". In addition, I run this test on my notebook, which has a dual-core intel CPU so that I set the value "DFTI_NUMBER_OF_TRANSFORMS" equals to 2.

2) Well, my original code is threaded by MKL internal threading. It uses "Equivalence" to let an 2D array be regarded as an 1D array. (In my example, tmp is an 2D n by n array, tmp1D is an 1D n**2 array) This technique is commonly used (eg. /mkl/examples/dftf/source/complex_2d_double_ex1.f90)

and I think PERHAPS that technique can be worked here. Since here (In my example) the data size=n**2 and the number of cores=2, half of the original data would be calculated at each core ( Specify DFTI_INPUT_DISTANCE=(n**2)/2 )

In addition, the reason I write tmp1D(((n**2)/2)*MYID+1:((n**2)/2)*(MYID+1)) is:

I have two core: One is MYID=0 and another is MYID=1

I'd like to specify core 1(MYID=0) to compute tmp1D(1:n**2/2)

and specify core 2(MYID=1) to compute the another half:tmp1D(n**2/2+1:n**2)

I hope some fortran examples(FFT parallelization with OpenMP) will be included in the MKL in the future.

Best Regards,

Raemon

0 Kudos
Raemon
Beginner
1,144 Views

By the way, I think the example (Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region) in that article tells you to set DFTI_INPUT_DISTANCE = 5000 is due to that: In that example, each thread calculates real FFT for matrix (50*100=5000). Since it assums one has four threads, I think DFTI_NUMBER_OF_TRANSFORMS = 4 should be specified in that example instead of specifying DFTI_NUMBER_OF_TRANSFORMS = n (n is the number of cores)

Am I right?

Best Regards,

Raemon

0 Kudos
barragan_villanueva_
Valued Contributor I
1,144 Views

Raemon,

Asking you about the idea behind your test I wanted to know what computations should be done in n*n array. Please explain what work is to be done as a whole, what is to be done in every thread.

You know, DFTI_NUMBER_OF_TRANSFORMS > 1means that DftiCompute function will compute multiple DFTs and DFTI_INPUT_DISTANCE/DFTI_OUTPUT_DISTANCE say about how to calculate next input/output arrays of sucn series (please see MKL docs)

In your case 2D array you devided on two parts but every thread computes 2 series. As a result you corrupted the memory because DISTANCE is tobe offset for input/output arrays and NUMBER_OF_TRANSFORMS = 2

0 Kudos
Raemon
Beginner
1,144 Views

Hi, Victor:

Thanks for your correction. All I want to do is parallel FFT with OpenMP

and turn off the MKL internal threading. So that I don't need to

1) set DFTI_NUMBER_OF_TRANSFORMS = number of cores

2) set DFTI_INPUT_DISTANCE = input_distance

Since this way parallels FFT with MKL


However, I am confused in the provided C example:

[fortran]#include "mkl_dfti.h"

#include "omp.h"
void main () {
float _Complex x[200][100];

MKL_LONG len[2];

//...put input data into x 0<=j<=199, 0<=k<=99

len[0] = 50; len[1] = 100;
// each thread calculates real FFT for matrix (50*100)

#pragma omp parallel {

DFTI_DESCRIPTOR_HANDLE my_desc_handle;

MKL_LONG myStatus;

int myID = omp_get_thread_num ();

myStatus = DftiCreateDescriptor (&my_desc_handle, DFTI_SINGLE,

DFTI_COMPLEX, 2, len);

myStatus = DftiCommitDescriptor (my_desc_handle);

myStatus = DftiComputeForward (my_desc_handle, &x [myID * len[0]] [0] );
myStatus = DftiFreeDescriptor (&my_desc_handle);

} /* End OpenMP parallel region */

}[/fortran]
What does line 31 do? How to express this line in FORTRAN?

Regards,

Raemon

0 Kudos
Raemon
Beginner
1,144 Views

This is my current scheme:

[fxfortran]	Equivalence(tmp,tmp1D)			!# tmp(n,n), tmp1d(n**2)

	 CALL omp_set_num_threads(2) 
	 CALL mkl_set_num_threads(1) 
	lengths(1)=n/2 
	lengths(2)=n 

	!$OMP PARALLEL
	MYID=omp_get_thread_num()		!# MYID=0 for Core1, MYID=1 for Core2 
	write(*,*)((n**2)/2)*MYID+1,((n**2)/2)*(MYID+1) 
	Status=DftiCreateDescriptor(fftjob,DFTI_DOUBLE,DFTI_COMPLEX,2,lengths) 
	Status=DftiSetValue(fftjob,DFTI_NUMBER_OF_USER_THREADS,1) 
	Status=DftiCommitDescriptor(fftjob) 
	Status=DftiComputeForward(fftjob,tmp1D(((n**2)/2)*MYID+1:((n**2)/2)*(MYID+1))) 
	!#Core1 computes tmp1D(1:n**2/2) ;          
	!#Core2 computes tmp1D(n**2/2+1:n**2) 
	Status=DftiFreeDescriptor(fftjob)
	!$OMP END PARALLEL[/fxfortran]

Error still emerges while execution. I think the provided C example cannot be translated into FORTRAN. Does anyone parallel FFT in FORTRAN with OpenMP threading?
0 Kudos
barragan_villanueva_
Valued Contributor I
1,144 Views

Hi,Raemon

This C-example computes 2d FFT problems 50x100 in parallel when all data are located in 200x100 array.

So that each thread (from 0..3) computes FFT of his own data what is presented in line #31.

0 Kudos
barragan_villanueva_
Valued Contributor I
1,144 Views
Quoting Raemon

This is my current scheme:

  1. Equivalence(tmp,tmp1D)!#tmp(n,n),tmp1d(n**2)
  2. CALLomp_set_num_threads(2)
  3. CALLmkl_set_num_threads(1)
  4. lengths(1)=n/2
  5. lengths(2)=n
  6. !$OMPPARALLEL
  7. MYID=omp_get_thread_num()!#MYID=0forCore1,MYID=1forCore2
  8. write(*,*)((n**2)/2)*MYID+1,((n**2)/2)*(MYID+1)
  9. Status=DftiCreateDescriptor(fftjob,DFTI_DOUBLE,DFTI_COMPLEX,2,lengths)
  10. Status=DftiSetValue(fftjob,DFTI_NUMBER_OF_USER_THREADS,1)
  11. Status=DftiCommitDescriptor(fftjob)
  12. Status=DftiComputeForward(fftjob,tmp1D(((n**2)/2)*MYID+1:((n**2)/2)*(MYID+1)))
  13. !#Core1computestmp1D(1:n**2/2);
  14. !#Core2computestmp1D(n**2/2+1:n**2)
  15. Status=DftiFreeDescriptor(fftjob)
  16. !$OMPENDPARALLEL
Error still emerges while execution. I think the provided C example cannot be translated into FORTRAN. Does anyone parallel FFT in FORTRAN with OpenMP threading?

Well, suppose that n is even in your test (see line #5)

You want to have 2 threads but no MKL parallelization according to lines #3-4.

In line #11 you want to calculate inplace 2D FFT of the problem (n/2) x n on each thread.

Each thread creates fftjob as DFTI DESRIPTOR. To be correct it should be private.

Line #12 is not needed already because DFTI_NUMBER_OF_USER_THREADS == 1 is by default.

But if you want to share fftjob as DFTI DESRIPTOR than the value of DFTI_NUMBER_OF_USER_THREADS should be equal to number of threads.

On line #14 you should set input address to calculate FFT for each thread.

However, the length of these data must be always(n/2)* n but in your case they depend of MYID :(

Therefore,it seemsyour test calculates two different 2d FFTs (of the same problem) in n x n array, is it so?

0 Kudos
Reply