clodxp
Beginner

OPENMP and MKL FFT: strange behavior

Hi all!
I've put together an example that shows some very strange behavior: an FFT inside an OpenMP loop, run with OMP_NUM_THREADS=1, seems to be about 10 times faster than the serial version!!

Essentially, the code is a loop in which one FFT is performed per iteration. I would like each thread to compute a share of the M FFTs.


SERIAL LOOP -------------------------

do xx=1,M

   ! initialize the data to be transformed
   data_fft=xx+imag*xx*2.

   ! perform the FFT
   Status=DftiComputeForward(Desc_Handle,data_fft)

   ! sum the elements
   sum_vect(xx)=sum(data_fft)

end do
!----------------------------------------------------------



I've parallelized this loop in two versions: a correct one and an incorrect one.

In the first, correct version (see the attached FFT_2D_openmp.f90), the loop is parallelized as follows:

PARALLEL LOOP -----------------------------
!$omp parallel
!$omp do private(data_fft) schedule(static,num_it_schedule)
do xx=1,M

   ! initialize the data to be transformed (each thread has its own data_fft)
   data_fft=xx+imag*xx*2.

   ! perform the FFT; Desc_Handle is shared, so all threads use the same committed descriptor
   Status=DftiComputeForward(Desc_Handle,data_fft)

   ! sum the elements
   sum_vect(xx)=sum(data_fft)

end do
!$omp end do
!$omp end parallel
--------------------------------------------------------------
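To make the split of the M FFTs concrete: with schedule(static,num_it_schedule), OpenMP cuts the iterations into chunks of num_it_schedule consecutive values of xx and deals them out round-robin in thread-number order. The small function below is a hypothetical helper, not part of the attached files; it assumes the loop starts at xx=1 and that threads are numbered from 0 (as omp_get_thread_num does), and it uses no OpenMP or MKL calls.

```fortran
module static_schedule_sketch
  implicit none
contains
  ! Hypothetical helper: 0-based number of the thread that executes iteration xx
  ! of "do xx=1,M" under schedule(static, chunk) with nthreads threads.
  ! Chunks of "chunk" consecutive iterations are assigned round-robin.
  pure integer function owner_thread(xx, chunk, nthreads)
    integer, intent(in) :: xx, chunk, nthreads
    owner_thread = mod((xx - 1) / chunk, nthreads)
  end function owner_thread
end module static_schedule_sketch
```

For example, with M=100, num_it_schedule=10 and 4 threads, thread 0 would get xx=1..10, 41..50 and 81..90, and thread 1 would get xx=11..20, 51..60 and 91..100.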


To make it work correctly, DFTI_NUMBER_OF_USER_THREADS must be set to the number of running threads.

OK! I arrived at the correct version after the wrong one (see FFT_2D_openmp_wrong.f90), which shows the strange behavior.
In the wrong version I made a mistake: I also declared the FFT status and handle as private in the loop:

PARALLEL LOOP (WRONG!!) -----------------------------------------------------
!$omp parallel
!$omp do private(data_fft,Status,Desc_Handle) schedule(static,num_it_schedule)
do xx=1,M

   ! initialize the data to be transformed
   data_fft=xx+imag*xx*2.

   ! perform the FFT
   Status=DftiComputeForward(Desc_Handle,data_fft)

   ! sum the elements
   sum_vect(xx)=sum(data_fft)

end do
!$omp end do
!$omp end parallel
---------------------------------------------------------------------------------------------------


Strangely, the result (the sum of the elements of sum_vect) is correct when compared with the serial version, and the time is about 10 times lower!!!

This is the output of FFT_2D_openmp_wrong.f90 on my machine (8-core Mac Pro):

---------- S t a r t
+ Matrix size Nx,Ny = 300.0000 300.0000
+ Cycle over M = 100.0000
--> SERIAL
+ Serial execution time = 0.120011000006343
+ Serial result (sum) = (4.5450000E+08,9.0900000E+08)
--> PARALLEL
+ Number of threads = 1
+ Parallel execution time = 9.684999997261912E-003
+ Parallel result (sum) = (4.5450000E+08,9.0900000E+08)
--> SPEEDUP (ideal = nthread) = 12.3914300506218
--> EFFICIENCY (ideal =1) = 12.3914300506218
---------- S t o p
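As a sanity check, the printed sums can be reproduced by hand. Assuming imag is the imaginary unit (0,1) (its declaration isn't shown) and the forward transform is unscaled (MKL's default forward scale is 1), every element of data_fft equals the constant c = xx + 2i*xx, and both that constant array and its forward DFT sum to c*Nx*Ny. The function below is a hypothetical helper written only for this check.

```fortran
module expected_checksum_sketch
  implicit none
contains
  ! Hypothetical helper: expected sum(sum_vect) for the loop above, assuming
  ! imag = (0,1) and an unscaled forward FFT. Each data_fft is the constant
  ! c = xx + 2i*xx over nx*ny elements, so its sum is c*nx*ny.
  pure function expected_sum(nx, ny, m) result(total)
    integer, intent(in) :: nx, ny, m
    complex(8) :: total
    integer :: xx
    total = (0d0, 0d0)
    do xx = 1, m
      total = total + cmplx(xx, 2*xx, kind=8) * real(nx, 8) * real(ny, 8)
    end do
  end function expected_sum
end module expected_checksum_sketch
```

With Nx = Ny = 300 and M = 100 this gives 5050*90000 = 4.545E+08 for the real part and twice that, 9.09E+08, for the imaginary part, the same value printed for both the serial and the parallel run.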


The parallel execution time is about 12 times lower than the serial one, whereas FFT_2D_openmp.f90 behaves correctly.

Can someone explain this??!
And could someone please confirm the correct way to use the MKL FFT for my needs??

Thanks

Clodxp




1 Solution
Evgueni_P_Intel
Employee

Hi Clodxp,

The initial value of a private variable inside an OpenMP construct is undefined (Fortran OpenMP 2.0 http://www.openmp.org/mp-documents/fspec20.pdf, p. 35; OpenMP 3.0 http://www.openmp.org/mp-documents/spec30.pdf, p. 90).
Hence each call to DftiComputeForward in the parallel part of FFT_2D_openmp_wrong.f90 returns DFTI_BAD_DESCRIPTOR and doesn't change the input data.
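A minimal, MKL-free sketch of the difference follows. The integer handle, committed_handle and fake_compute are hypothetical stand-ins for Desc_Handle and DftiComputeForward, not MKL API. With firstprivate, each thread's copy starts from the value set before the loop (with private it would start undefined), and testing the returned status in every iteration reports a bad handle instead of silently keeping the untransformed data. For the real descriptor, leaving Desc_Handle shared, as in FFT_2D_openmp.f90, is the straightforward fix.

```fortran
module handle_scope_sketch
  implicit none
  ! Hypothetical value standing in for a created and committed Desc_Handle.
  integer, parameter :: committed_handle = 42
contains
  ! Hypothetical stand-in for DftiComputeForward: returns 0 for the committed
  ! handle and a nonzero error code for any other value.
  pure integer function fake_compute(handle)
    integer, intent(in) :: handle
    if (handle == committed_handle) then
      fake_compute = 0
    else
      fake_compute = 1
    end if
  end function fake_compute

  ! Runs an m-iteration loop and returns how many calls reported an error.
  integer function count_failures(m)
    integer, intent(in) :: m
    integer :: handle, status, xx, failures
    handle = committed_handle   ! set before the parallel region
    failures = 0
    ! firstprivate: each thread's copy of handle starts with the value above.
    ! private(handle) would leave the copy undefined, as in FFT_2D_openmp_wrong.f90.
    !$omp parallel do firstprivate(handle) private(status) reduction(+:failures)
    do xx = 1, m
      status = fake_compute(handle)
      if (status /= 0) failures = failures + 1   ! checking status exposes a bad handle
    end do
    !$omp end parallel do
    count_failures = failures
  end function count_failures
end module handle_scope_sketch
```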

Ironically, we can't catch this error by checking sums, as FFT_2D_openmp_wrong.f90 does, for a constant signal v = (v[1], v[2], ..., v[N]) where v[i] = v[j] = c for all i and j, because in this case FFT(v) = (c*N, 0, 0, ..., 0).
If DftiComputeForward succeeds, then v is replaced with FFT(v) and we get sum(FFT(v)) = c*N.
If DftiComputeForward fails, then v isn't changed and we get sum(v) = c*N.
Hence you see the same sum in the sequential and "parallel" case...
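The same effect can be checked numerically with a tiny naive DFT, used here as a hypothetical stand-in for DftiComputeForward (O(N**2), fine only for short test vectors). Summing any unnormalized forward DFT gives N times the first input sample, sum over k of X(k) = N*x(1); for a constant signal that equals sum(x), so the checksum can't tell whether the transform ran, while a non-constant input such as a ramp generally breaks the tie.

```fortran
module dft_checksum_sketch
  implicit none
  real(8), parameter :: pi = 3.14159265358979323846d0
contains
  ! Naive, unnormalized forward DFT (stand-in for an FFT on tiny inputs):
  ! X(k) = sum_n x(n) * exp(-2*pi*i*(k-1)*(n-1)/N), k = 1..N
  function naive_dft(x) result(y)
    complex(8), intent(in) :: x(:)
    complex(8) :: y(size(x))
    integer :: k, n, nn
    nn = size(x)
    do k = 1, nn
      y(k) = (0d0, 0d0)
      do n = 1, nn
        y(k) = y(k) + x(n) * exp(cmplx(0d0, -2d0*pi*(k-1)*(n-1)/nn, kind=8))
      end do
    end do
  end function naive_dft
end module dft_checksum_sketch
```

For an 8-element constant vector c = 3 + 6i, both sum(v) and sum(naive_dft(v)) come out as 24 + 48i; for the ramp 1, 2, ..., 8 the raw sum is 36 while the sum of its DFT is 8*1 = 8.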

You may find the following Knowledge Base article about parallelizing (2D) FFTs useful: http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/
Given only FFT_2D_openmp_wrong.f90 and FFT_2D_openmp.f90, it's hard to tell what your needs are and what the correct use of MKL FFT would be for you.


2 Replies

clodxp
Beginner
Thank you very much!!!
Private variables are not initialized!! I always forget about it!

Clodxp