AndrewC
New Contributor I
154 Views

MKL FFT inside OpenMP loop (MKL 2018)

 

I have an OpenMP loop:

#pragma omp parallel for
for (int i = 0; i < n; i++) {
    // routine that calls MKL FFT
}

The threaded performance is abysmal: on an 8-core machine, just over one core is being used.

What is surprising is that Intel VTune Amplifier shows that the time is spent in DftiCommitDescriptor, not in the actual computation.

Function / Call Stack    CPU Time    Module    Function (Full)    Source File    Start Address
DftiCommitDescriptor    83.7%    mkl_rt.dll    DftiCommitDescriptor    [Unknown]    0x180a45b68

.....
DftiComputeForward    0.5%    mkl_rt.dll    DftiComputeForward    [Unknown]    0x180a45f10

Are there any suggested best practices here? Typically the FFT function will be called with the same data length, say 10K-20K.

 

4 Replies
Ying_H_Intel
Employee

Hi vasci_ 

How do you link MKL, and is the FFT 1D or 2D? With the Intel compiler and OpenMP, the MKL code inside the parallel loop is supposed to run serially.

Since you say "typically the FFT function will be called with the same data length", you may try moving DftiCommitDescriptor out of the OpenMP for loop and see whether there is any improvement.
Or, if needed, please submit a reproducer to the Online Service Center: https://supporttickets.intel.com/?lang=en-US

Moreover, the MKL documentation has several sample codes that use FFT in OpenMP parallel regions, for your reference:
https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-using-openmp-threading-for-ff...
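
The suggestion to move the commit step out of the loop could look roughly like the sketch below. It is not MKL code: `DftPlan` and `transform_batch` are hypothetical stand-ins, and the transform is a naive O(n^2) DFT rather than an FFT, so that the example stays self-contained. In a real MKL program the constructor would correspond to DftiCreateDescriptor plus DftiCommitDescriptor, and `forward` to DftiComputeForward. The point is only the structure: setup happens once per data length, and the loop body does nothing but compute.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for a committed DFTI descriptor: it owns the
// twiddle factors for one transform length. Not MKL code.
struct DftPlan {
    std::size_t n;
    std::vector<cplx> twiddle;  // twiddle[k] = exp(-2*pi*i*k/n)

    // One-time setup; plays the role of DftiCreateDescriptor + DftiCommitDescriptor.
    explicit DftPlan(std::size_t len) : n(len), twiddle(len) {
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k)
            twiddle[k] = std::polar(1.0, -2.0 * pi * double(k) / double(n));
    }

    // Per-call work; plays the role of DftiComputeForward. It only reads
    // the plan and writes to `out`, so threads can call it concurrently
    // as long as their output buffers do not overlap.
    void forward(const cplx* in, cplx* out) const {
        for (std::size_t k = 0; k < n; ++k) {
            cplx sum(0.0, 0.0);
            for (std::size_t j = 0; j < n; ++j)
                sum += in[j] * twiddle[(j * k) % n];
            out[k] = sum;
        }
    }
};

// Transforms in.size() / n signals of length n stored back to back in `in`.
std::vector<cplx> transform_batch(const std::vector<cplx>& in, std::size_t n) {
    const long count = static_cast<long>(in.size() / n);
    std::vector<cplx> out(in.size());
    const DftPlan plan(n);  // hoisted: built once, before the parallel loop
    #pragma omp parallel for
    for (long i = 0; i < count; ++i)
        plan.forward(&in[i * n], &out[i * n]);
    return out;
}
```

Calling `transform_batch(signals, 4)` on a vector holding several length-4 signals builds one plan and reuses it for every signal. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.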

Best Regards,

Ying

AndrewC
New Contributor I

This is a 1D computation. After changing the code to serial, DftiCommitDescriptor was still the bottleneck. Clearly, moving DftiCommitDescriptor outside of the loop would help; it is just surprising that DftiCommitDescriptor is so 'expensive'.
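
If sharing a single descriptor across threads is not wanted, another way to keep an expensive setup step out of the loop body is to build one plan per thread, inside the parallel region but before the work-sharing loop. The sketch below illustrates that structure only. It is not MKL code: `Plan`, `plans_built` and `transform_batch_per_thread` are hypothetical names, and the transform is a naive O(n^2) DFT standing in for the real FFT call. The counter exists purely to show that setup runs once per thread, not once per iteration.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

std::atomic<int> plans_built{0};  // illustration only: counts setup calls

// Hypothetical stand-in for a committed descriptor. Not MKL code.
struct Plan {
    std::size_t n;
    std::vector<cplx> twiddle;  // twiddle[k] = exp(-2*pi*i*k/n)

    // Setup step; stands in for descriptor creation and commit.
    explicit Plan(std::size_t len) : n(len), twiddle(len) {
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k)
            twiddle[k] = std::polar(1.0, -2.0 * pi * double(k) / double(n));
        ++plans_built;
    }

    // Compute step; stands in for the forward transform call.
    void forward(const cplx* in, cplx* out) const {
        for (std::size_t k = 0; k < n; ++k) {
            cplx sum(0.0, 0.0);
            for (std::size_t j = 0; j < n; ++j)
                sum += in[j] * twiddle[(j * k) % n];
            out[k] = sum;
        }
    }
};

// Transforms in.size() / n signals of length n stored back to back in `in`.
std::vector<cplx> transform_batch_per_thread(const std::vector<cplx>& in,
                                             std::size_t n) {
    const long count = static_cast<long>(in.size() / n);
    std::vector<cplx> out(in.size());
    #pragma omp parallel
    {
        const Plan plan(n);  // one plan per thread, built before the loop
        #pragma omp for
        for (long i = 0; i < count; ++i)
            plan.forward(&in[i * n], &out[i * n]);
    }  // each thread's plan is destroyed here
    return out;
}
```

After a call, `plans_built` equals the number of threads that entered the parallel region (1 when OpenMP is disabled), regardless of how many signals were transformed. The trade-off against a single shared plan is extra setup time and memory per thread in exchange for no shared state between threads.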

AndrewC
New Contributor I

Related to this, I have found that after updating to MKL 2018 Update 2, calling a 1-D FFT inside an OpenMP parallel for loop causes a memory access exception.

The crash is deep inside mkl_avx.dll.

Removing the OpenMP directives makes the crash go away.

AndrewC
New Contributor I

Following up on my previous post: this is a typical crash that occurs in Update 2 but not in Update 1.

Basically I have to remove all FFT calls within OpenMP parallel regions to avoid these crashes.

 

CS = 0033   FS = 0053   GS = 002b

Stack Trace (from fault):
[  0] 0x000007fed1e21b2a   mkl_avx.dll+09181994 mkl_dft_avx_dft_zdscal+00000842
[  1] 0x000007fed1fbcd9f   mkl_avx.dll+10866079 mkl_sparse_d_csr_ctd_sv_ker_i8_avx+00578415
[  2] 0x000007fed1e234c8   mkl_avx.dll+09188552 mkl_dft_avx_dfti_create_node+00000488
[  3] 0x000007fed1e23af9   mkl_avx.dll+09190137 mkl_dft_avx_dfti_create_sr1d+00000073
[  4] 0x000007fee03d75d2    mkl_rt.dll+10909138 fftwf_sprint_plan+00001134
[  5] 0x000007fee03bfe9a    mkl_rt.dll+10813082 DftiCreateDescriptor_s_1d+00000366
....
[  8] 0x000007fee5330ecc libiomp5md.dll+00593612 _kmp_invoke_microtask+00000140
[  9] 0x000007fee52fc37d libiomp5md.dll+00377725 _kmp_acquire_nested_drdpa_lock+00037421
[ 10] 0x000007fee52fb494 libiomp5md.dll+00373908 _kmp_acquire_nested_drdpa_lock+00033604
[ 11] 0x000007fee5332e87 libiomp5md.dll+00601735 _kmp_launch_worker+00000407
[ 12] 0x00000000773859cd                   C:\Windows\system32\kernel32.dll+00088525 BaseThreadInitThunk+00000013
[ 13] 0x00000000775ba561                      C:\Windows\SYSTEM32\ntdll.dll+00173409 RtlUserThreadStart+00000033
