Intel® oneAPI Math Kernel Library

Poor scaling for real-to-real FFT with OpenMP

Jyotirmoy_B_

In the attached file I use MKL to compute a real-to-real FFT using OpenMP for multithreading.

The code is compiled with

icpc -o bench-fft -Wall -O3 -g -march=native -fopenmp bench-fft.cxx -mkl

The machine has 4 cores.

It seems that the code does not scale well with the number of threads.

When run with

OMP_NUM_THREADS=1 ./bench-fft 4194304

the total time taken is 0.1640 user, 0.0440 sys while with

OMP_NUM_THREADS=2 ./bench-fft 4194304

the total time taken is 0.3000 user, 0.0560 sys. So there seems to be a large synchronization overhead since the total CPU time almost doubles.

Is this to be expected, or am I doing something wrong in my code?

SergeyKostrov
>>...Is this to be expected or am I doing something wrong in my code?

Try setting KMP_AFFINITY to scatter or compact, and use more OpenMP threads. On Linux, use the htop utility to verify how threads are pinned to cores.
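A rough sketch of how that suggestion might be tried from bash. It assumes the Intel OpenMP runtime (the one that reads KMP_AFFINITY) and the ./bench-fft binary from the original post; the thread counts in the loop and the `verbose` modifier are illustrative choices, not something taken from the thread.

```shell
# Sketch only: pin OpenMP threads and rerun the benchmark at several thread counts.
# Assumption: the Intel OpenMP runtime is in use, so KMP_AFFINITY is honored.
# "verbose" asks the runtime to print the thread-to-core binding at startup.
export KMP_AFFINITY=verbose,scatter

for n in 1 2 4; do
    export OMP_NUM_THREADS=$n
    echo "OMP_NUM_THREADS=$n"
    # Skip the run if the benchmark binary from the post is not present.
    if [ -x ./bench-fft ]; then
        # bash's `time` prints real (wall-clock), user, and sys;
        # user and sys are summed over all threads of the process.
        time ./bench-fft 4194304
    fi
done
```

Swapping `scatter` for `compact` gives the other placement mentioned above, and the binding printed by `verbose` can be cross-checked against what htop shows.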