I have an OpenMP loop:
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    // routine that calls MKL FFT
}
The threading performance is poor: on an 8-core machine, just over 1 core is in use.
Surprisingly, Intel Amplifier shows that most of the time is spent in DftiCommitDescriptor, not in the actual computation.
Function / Call Stack   CPU Time  Module      Function (Full)       Source File  Start Address
DftiCommitDescriptor    83.7%     mkl_rt.dll  DftiCommitDescriptor  [Unknown]    0x180a45b68
.....
DftiComputeForward      0.5%      mkl_rt.dll  DftiComputeForward    [Unknown]    0x180a45f10
Are there any suggested best practices here? Typically the FFT function will be called with the same data length, say 10K-20K.
Hi vasci_
How do you link MKL, and is the FFT 1D or 2D? With the Intel compiler and OpenMP, each MKL call made inside the parallel loop is expected to run serially.
Since you say that "typically the FFT function will be called with the same data length", you could try moving DftiCommitDescriptor out of the OpenMP for loop and see whether that improves things.
Or, if needed, please submit a reproducer to the Online Service Center: https://supporttickets.intel.com/?lang=en-US
Moreover, the MKL documentation has several samples of using FFT in OpenMP parallel code for your reference:
https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-using-openmp-threading-for-fft-computation
Best Regards,
Ying
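For illustration, here is a minimal sketch of the commit-once pattern suggested above. It deliberately does not call MKL, so that it builds anywhere: `plan_t`, `plan_commit`, `plan_forward`, `plan_free`, and `transform_all` are hypothetical names, with `plan_commit` standing in for the create-and-commit step and `plan_forward` for DftiComputeForward. It assumes that the committed plan is only read during the transform; whether a real committed descriptor may be shared across threads this way should be checked against the OpenMP FFT examples in the MKL documentation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a committed descriptor: a table of
   twiddle factors w[k] = exp(-2*pi*i*k/n), built once per length. */
typedef struct {
    int n;
    double *re, *im;
} plan_t;

/* cos(x) and sin(x) by power series so the sketch needs no -lm;
   accurate enough for 0 <= x <= 2*pi. */
static void cos_sin(double x, double *c, double *s) {
    double term = 1.0; /* holds x^k / k! */
    *c = 0.0;
    *s = 0.0;
    for (int k = 0; k < 40; k++) {
        if (k % 2 == 0) *c += (k % 4 == 0) ? term : -term;
        else            *s += (k % 4 == 1) ? term : -term;
        term *= x / (k + 1);
    }
}

/* One-time setup; plays the role of creating and committing a descriptor. */
plan_t *plan_commit(int n) {
    assert(n > 0);
    plan_t *p = malloc(sizeof *p);
    assert(p != NULL);
    p->n = n;
    p->re = malloc(n * sizeof *p->re);
    p->im = malloc(n * sizeof *p->im);
    assert(p->re != NULL && p->im != NULL);
    for (int k = 0; k < n; k++) {
        double c, s;
        cos_sin(2.0 * 3.14159265358979323846 * k / n, &c, &s);
        p->re[k] = c;
        p->im[k] = -s;
    }
    return p;
}

/* Naive O(n^2) forward DFT. It only reads the plan, so threads can share it. */
void plan_forward(const plan_t *p, const double *in_re, const double *in_im,
                  double *out_re, double *out_im) {
    for (int k = 0; k < p->n; k++) {
        double sr = 0.0, si = 0.0;
        for (int j = 0; j < p->n; j++) {
            int t = (int)(((long long)j * k) % p->n);
            sr += in_re[j] * p->re[t] - in_im[j] * p->im[t];
            si += in_re[j] * p->im[t] + in_im[j] * p->re[t];
        }
        out_re[k] = sr;
        out_im[k] = si;
    }
}

void plan_free(plan_t *p) {
    free(p->re);
    free(p->im);
    free(p);
}

/* Transform `count` signals of length n stored back to back. */
void transform_all(int count, int n, const double *in_re, const double *in_im,
                   double *out_re, double *out_im) {
    plan_t *p = plan_commit(n); /* setup happens once, outside the loop */

    /* The pragma is ignored if the compiler is not run with OpenMP enabled. */
    #pragma omp parallel for
    for (int i = 0; i < count; i++) {
        size_t off = (size_t)i * n;
        plan_forward(p, in_re + off, in_im + off, out_re + off, out_im + off);
    }

    plan_free(p);
}
```

With real MKL the same shape would presumably apply: create and commit the descriptor once before the loop, call only DftiComputeForward inside it, and free the descriptor after the loop.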
This is a 1D computation. After changing the code to serial, DftiCommitDescriptor was still the bottleneck. Clearly, moving DftiCommitDescriptor outside of the loop would help; it is just surprising that DftiCommitDescriptor is so 'expensive'.
Related to this: after updating to MKL 2018 Update 2, I get a memory access exception when a 1D FFT is called inside an OpenMP parallel for loop.
The crash is deep inside mkl_avx.dll.
Removing the OpenMP directives makes the problem go away.
Following up on my previous post: this is a typical crash, which occurs in Update 2 but not in Update 1.
Basically, I have to remove all FFT calls from OpenMP parallel regions to avoid these crashes.
CS = 0033 FS = 0053 GS = 002b
Stack Trace (from fault):
[ 0] 0x000007fed1e21b2a mkl_avx.dll+09181994 mkl_dft_avx_dft_zdscal+00000842
[ 1] 0x000007fed1fbcd9f mkl_avx.dll+10866079 mkl_sparse_d_csr_ctd_sv_ker_i8_avx+00578415
[ 2] 0x000007fed1e234c8 mkl_avx.dll+09188552 mkl_dft_avx_dfti_create_node+00000488
[ 3] 0x000007fed1e23af9 mkl_avx.dll+09190137 mkl_dft_avx_dfti_create_sr1d+00000073
[ 4] 0x000007fee03d75d2 mkl_rt.dll+10909138 fftwf_sprint_plan+00001134
[ 5] 0x000007fee03bfe9a mkl_rt.dll+10813082 DftiCreateDescriptor_s_1d+00000366
....
[ 8] 0x000007fee5330ecc libiomp5md.dll+00593612 _kmp_invoke_microtask+00000140
[ 9] 0x000007fee52fc37d libiomp5md.dll+00377725 _kmp_acquire_nested_drdpa_lock+00037421
[ 10] 0x000007fee52fb494 libiomp5md.dll+00373908 _kmp_acquire_nested_drdpa_lock+00033604
[ 11] 0x000007fee5332e87 libiomp5md.dll+00601735 _kmp_launch_worker+00000407
[ 12] 0x00000000773859cd C:\Windows\system32\kernel32.dll+00088525 BaseThreadInitThunk+00000013
[ 13] 0x00000000775ba561 C:\Windows\SYSTEM32\ntdll.dll+00173409 RtlUserThreadStart+00000033
