clodxp (Beginner), 05-10-2010 09:25 AM

OPENMP and MKL FFT: strange behavior

I've put together an example that shows VERY STRANGE BEHAVIOR: an FFT inside an OpenMP loop, run with OMP_NUM_THREADS=1, seems to be about 10 times faster than the serial version!! The serial loop is:

```fortran
do xx=1,M
   ! initialize data to be transformed
   ! (a scalar is assigned to the whole array, so every element gets the same value;
   !  imag is presumably the imaginary unit (0.,1.))
   data_fft=xx+imag*xx*2.
   ! perform FFT
   Status=DftiComputeForward(Desc_Handle,data_fft)
   ! perform sum of the elements
   sum_vect(xx)=sum(data_fft)
end do
```

I've parallelized this loop in two versions: a correct one and an incorrect one.

In the first, correct version (see the attached FFT_2D_openmp.f90), the loop is parallelized as follows:

```fortran
!$omp parallel
!$omp do private(data_fft) schedule(static,num_it_schedule)
do xx=1,M
   ! initialize data to be transformed
   data_fft=xx+imag*xx*2.
   ! perform FFT with the shared, already committed descriptor
   ! (note: Status is shared here, so all threads overwrite it concurrently;
   !  it could be private, since each iteration assigns it before reading it)
   Status=DftiComputeForward(Desc_Handle,data_fft)
   ! perform sum of the elements
   sum_vect(xx)=sum(data_fft)
end do
!$omp end do
!$omp end parallel
```

For this to work correctly, DFTI_NUMBER_OF_USER_THREADS must be set on the descriptor to the number of running threads.

I only arrived at the correct version after writing a wrong one (see FFT_2D_openmp_wrong.f90), which shows the strange behavior.

In the wrong version I made an error: I also declared the FFT status and the descriptor handle as private in the loop:

```fortran
!$omp parallel
!$omp do private(data_fft,Status,Desc_Handle) schedule(static,num_it_schedule)
do xx=1,M
   ! initialize data to be transformed
   data_fft=xx+imag*xx*2.
   ! perform FFT
   ! (Desc_Handle is private here, so each thread's copy starts undefined)
   Status=DftiComputeForward(Desc_Handle,data_fft)
   ! perform sum of the elements
   sum_vect(xx)=sum(data_fft)
end do
!$omp end do
!$omp end parallel
```

Strangely, the result (the sum of the elements of sum_vect) is correct compared with the serial version, and the time is about 10 times lower!!!

This is the output of FFT_2D_openmp_wrong.f90 on my machine (8-core Mac Pro):

```text
+ Matrix size Nx,Ny = 300.0000 300.0000
+ Cycle over M = 100.0000
--> SERIAL
+ Serial execution time = 0.120011000006343
+ Serial result (sum) = (4.5450000E+08,9.0900000E+08)
--> PARALLEL
+ Number of threads = 1
+ Parallel execution time = 9.684999997261912E-003
+ Parallel result (sum) = (4.5450000E+08,9.0900000E+08)
--> SPEEDUP (ideal = nthread) = 12.3914300506218
--> EFFICIENCY (ideal =1) = 12.3914300506218
---------- S t o p
```

The parallel execution time is about 12 times lower than the serial one, while FFT_2D_openmp.f90 works correctly.
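The SPEEDUP and EFFICIENCY lines can be reproduced from the two timings. The program source isn't shown, so the formulas here are an assumption inferred from the labels (ideal speedup = number of threads, ideal efficiency = 1): speedup = serial time / parallel time, and efficiency = speedup / number of threads. With one thread the two coincide, and an efficiency well above 1 is itself a hint that the "parallel" loop is not doing the same work as the serial one.

```fortran
module speedup_metrics
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Speedup: serial wall time divided by parallel wall time (assumed definition).
  pure function speedup(t_serial, t_parallel) result(s)
    real(dp), intent(in) :: t_serial, t_parallel
    real(dp) :: s
    s = t_serial / t_parallel
  end function speedup

  ! Efficiency: speedup divided by the number of threads; the ideal value is 1.
  pure function efficiency(t_serial, t_parallel, nthreads) result(e)
    real(dp), intent(in) :: t_serial, t_parallel
    integer, intent(in) :: nthreads
    real(dp) :: e
    e = speedup(t_serial, t_parallel) / real(nthreads, dp)
  end function efficiency
end module speedup_metrics
```

For the run above, speedup(0.120011000006343_dp, 9.684999997261912e-3_dp) evaluates to about 12.39, matching the printed value.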

Can someone explain this??!

And could someone please confirm that I'm using the MKL FFT correctly for my needs??

Thanks

Clodxp

Accepted solution

Evgueni_P_Intel (Employee), 05-11-2010 02:02 AM

The initial value of a private variable in an OpenMP region is undefined (Fortran OpenMP 2.0 specification, http://www.openmp.org/mp-documents/fspec20.pdf, p. 35; OpenMP 3.0 specification, http://www.openmp.org/mp-documents/spec30.pdf, p. 90).

Hence each call to DftiComputeForward in the parallel part of FFT_2D_openmp_wrong.f90 returns DFTI_BAD_DESCRIPTOR and doesn't change the input data.
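A minimal sketch of the private/firstprivate difference, using a plain integer `h` as a hypothetical stand-in for the descriptor handle (this is not MKL code). With `private(h)`, each thread's copy of `h` would start undefined, exactly as `Desc_Handle` did; `firstprivate(h)` initializes every copy from the value set before the region. Leaving the handle shared, as FFT_2D_openmp.f90 does, also works, since the loop only reads it.

```fortran
module handle_demo
  implicit none
contains
  ! Fill out(i) = handle + i inside an OpenMP loop.
  ! 'handle' is a hypothetical stand-in for a descriptor created before the loop.
  subroutine fill_with_handle(handle, out)
    integer, intent(in)  :: handle
    integer, intent(out) :: out(:)
    integer :: i, h
    h = handle
    ! firstprivate: each thread's copy of h starts with the value assigned above.
    ! With private(h) instead, each copy would start undefined (the bug in
    ! FFT_2D_openmp_wrong.f90). The loop index i is private automatically.
    !$omp parallel do firstprivate(h)
    do i = 1, size(out)
      out(i) = h + i
    end do
    !$omp end parallel do
  end subroutine fill_with_handle
end module handle_demo
```

The module compiles with or without OpenMP enabled; without it the `!$omp` lines are plain comments and the loop runs serially with the same result.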

Ironically, we can't catch this error by checking sums, as FFT_2D_openmp_wrong.f90 does, for a *constant* signal v = (v[1], v[2], ..., v[N]) with v[i] = v[j] = c for all i and j, because FFT(v) = (c*N, 0, 0, ..., 0) in this case.
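The claim can be checked with a short derivation, assuming the unnormalized forward transform (forward scale 1) and assuming `imag` in the posted code is the imaginary unit. Summing all DFT outputs leaves only N times the first input element, for any signal; for a constant signal that equals the plain sum. Applied to the posted run (Nx = Ny = 300, M = 100, c = xx(1 + 2i)), it reproduces the printed result exactly. The same identity holds in 2D with N = Nx·Ny.

```latex
% Unnormalized forward DFT of length N (0-based indices: v_0 is the forum's v[1])
X_k = \sum_{n=0}^{N-1} v_n \, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1

% Sum over all outputs; the inner sum is N for n = 0 and 0 for n = 1, \dots, N-1
\sum_{k=0}^{N-1} X_k = \sum_{n=0}^{N-1} v_n \sum_{k=0}^{N-1} e^{-2\pi i k n / N} = N \, v_0

% Constant signal v_n = c: transformed and untransformed sums agree
\sum_{k=0}^{N-1} X_k = N c = \sum_{n=0}^{N-1} v_n

% Posted run: N = 300 \cdot 300 = 90000, c = xx\,(1 + 2i), xx = 1, \dots, 100
\sum_{xx=1}^{100} 90000 \, xx \, (1 + 2i) = 90000 \cdot 5050 \, (1 + 2i) = 4.545 \times 10^{8} + 9.09 \times 10^{8} \, i
```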

If DftiComputeForward succeeds, then v is replaced with FFT(v) and we get sum(FFT(v)) = c*N.

If DftiComputeForward fails, then v isn't changed and we get sum(v) = c*N.

Hence you see the same sum in the sequential and "parallel" cases.
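A small numerical check of the same point, using a naive 1D DFT as a stand-in for DftiComputeForward (an assumption for illustration only; the identity carries over to 2D). For a constant signal, the transformed and untransformed sums agree, so summing cannot distinguish a successful transform from a failed one; a non-constant test signal exposes the difference.

```fortran
module dft_sum_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: pi = 3.14159265358979323846_dp
contains
  ! Naive O(N^2) unnormalized forward DFT: X(k) = sum_n x(n) * exp(-2*pi*i*k*n/N).
  ! Illustrative stand-in for DftiComputeForward; not MKL code.
  function naive_dft(x) result(xk)
    complex(dp), intent(in) :: x(:)
    complex(dp) :: xk(size(x))
    integer :: k, n, nn
    nn = size(x)
    do k = 0, nn - 1
      xk(k + 1) = (0.0_dp, 0.0_dp)
      do n = 0, nn - 1
        xk(k + 1) = xk(k + 1) + x(n + 1) * &
             exp(cmplx(0.0_dp, -2.0_dp * pi * real(k * n, dp) / real(nn, dp), kind=dp))
      end do
    end do
  end function naive_dft
end module dft_sum_check
```

In practice, a non-constant input, or a check of the returned Status after each call, would have flagged the DFTI_BAD_DESCRIPTOR errors.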

You may find the following Knowledge Base article about parallelizing (2D) FFTs useful: http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/

Given only FFT_2D_openmp_wrong.f90 and FFT_2D_openmp.f90, it's hard to tell what your needs are and what the correct use of MKL FFT would be for you.

clodxp (Beginner), 05-11-2010 02:50 AM

Thank you very much!!!

Private variables are not initialized!! I always forget about it!

Clodxp
