In the attached file I use MKL to compute a real-to-real FFT using OpenMP for multithreading.
The code is compiled with
icpc -o bench-fft -Wall -O3 -g -march=native -fopenmp bench-fft.cxx -mkl
The machine has 4 cores.
It seems that the code does not scale well with the number of threads.
When run with
OMP_NUM_THREADS=1 ./bench-fft 4194304
the total time taken is 0.1640 user, 0.0440 sys, while with
OMP_NUM_THREADS=2 ./bench-fft 4194304
the total time taken is 0.3000 user, 0.0560 sys. The total CPU time almost doubles, which suggests a large synchronization overhead.
Is this to be expected, or am I doing something wrong in my code?
1 Reply
>>...Is this to be expected or am I doing something wrong in my code.
Try setting KMP_AFFINITY to scatter or compact, and use more OpenMP threads. On Linux, use the htop utility to verify how threads are pinned to cores.
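As a rough sketch of that suggestion, the bash loop below runs the benchmark from the question under both affinity settings with 1, 2 and 4 threads. The thread counts, the BENCH and SIZE variables, and the run_matrix function name are assumptions for illustration only. The binary name and problem size are taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try each KMP_AFFINITY setting with several thread counts.
# BENCH and SIZE default to the binary and problem size quoted in the question.
BENCH="${BENCH:-./bench-fft}"
SIZE="${SIZE:-4194304}"

run_matrix() {
  local affinity threads
  for affinity in scatter compact; do
    for threads in 1 2 4; do          # assumed counts for a 4-core machine
      echo "KMP_AFFINITY=$affinity OMP_NUM_THREADS=$threads"
      if [ -x "$BENCH" ]; then
        # `time` reports real (wall-clock), user and sys time on stderr
        time env KMP_AFFINITY="$affinity" OMP_NUM_THREADS="$threads" "$BENCH" "$SIZE"
      fi
    done
  done
}

run_matrix
```

When comparing the runs, note that user time is summed over all threads, so the "real" figure is the one that should drop if the FFT scales. Adding the verbose modifier (for example KMP_AFFINITY=verbose,scatter) should make the Intel OpenMP runtime print the bindings it chose, which can be cross-checked against htop.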