I have to develop the following project:
My data is divided into 12 2D complex float matrices.
Each complex matrix is then separated into 2 float matrices: Real and imaginary.
The reason for the split: the signal processing also includes a transpose, and there is no IPP transpose for complex float.
So I have 24 float matrices. Each matrix is stored row by row, so every row is contiguous in RAM.
On each matrix I have to run an FFT on every row, multiply every row by a constant vector (sample by sample), and then run an FFT on every column.
Before the column FFT I will have to transpose the matrix, because an FFT that walks down the columns directly accesses memory with a large stride and will be slower due to cache misses.
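To make the per-matrix steps concrete, here is a minimal C++ sketch of the sequence described above. It is only an illustration under assumptions: `dft_row` is a naive O(n^2) DFT standing in for the IPP FFT call, the names `PlanarMatrix`, `transpose` and `process` are placeholders of my own, and I assume the constant vector is complex and stored as two float arrays like the data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One complex matrix kept as two row-major float planes (real, imaginary).
struct PlanarMatrix {
    std::size_t rows, cols;
    std::vector<float> re, im;  // each holds rows * cols values
};

// Naive O(n^2) DFT of one row, in place. Placeholder for the real FFT call.
void dft_row(float* re, float* im, std::size_t n) {
    const double two_pi = 6.283185307179586;
    std::vector<float> out_re(n), out_im(n);
    for (std::size_t k = 0; k < n; ++k) {
        double sr = 0.0, si = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double a = -two_pi * double((k * t) % n) / double(n);
            const double c = std::cos(a), s = std::sin(a);
            sr += re[t] * c - im[t] * s;
            si += re[t] * s + im[t] * c;
        }
        out_re[k] = float(sr);
        out_im[k] = float(si);
    }
    for (std::size_t k = 0; k < n; ++k) { re[k] = out_re[k]; im[k] = out_im[k]; }
}

// Out-of-place transpose of one row-major float plane.
std::vector<float> transpose(const std::vector<float>& src,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
    return dst;
}

// Row transform, per-sample complex multiply, transpose, row transform again.
void process(PlanarMatrix& m, const std::vector<float>& w_re,
             const std::vector<float>& w_im) {
    assert(w_re.size() == m.cols && w_im.size() == m.cols);
    for (std::size_t r = 0; r < m.rows; ++r) {
        float* re = &m.re[r * m.cols];
        float* im = &m.im[r * m.cols];
        dft_row(re, im, m.cols);
        for (std::size_t c = 0; c < m.cols; ++c) {
            const float a = re[c], b = im[c];
            re[c] = a * w_re[c] - b * w_im[c];  // real part of (a+ib)(w_re+i*w_im)
            im[c] = a * w_im[c] + b * w_re[c];  // imaginary part
        }
    }
    m.re = transpose(m.re, m.rows, m.cols);
    m.im = transpose(m.im, m.rows, m.cols);
    std::swap(m.rows, m.cols);  // the old columns are now contiguous rows
    for (std::size_t r = 0; r < m.rows; ++r)
        dft_row(&m.re[r * m.cols], &m.im[r * m.cols], m.cols);
}
```

Note that in this sketch the result is left in transposed layout, so a second transpose would be needed if the original orientation matters.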
Now to my question:
Is it right to use TBB on each matrix in order to use all cores?
I have 24 matrices but only 4 cores in my CPU.
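One scheme I am considering is to treat each matrix as an independent task and let a small pool of workers pull tasks until all 24 are done. The sketch below shows that idea with `std::thread` and a shared atomic counter, purely so that it compiles without TBB; with TBB I would expect a single `tbb::parallel_for` over the 24 matrix indices to play the same role. `for_each_matrix` is a hypothetical helper name, and the `work` callback stands for the per-matrix processing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Calls work(i) exactly once for every i in [0, count), spread over
// num_workers threads. Each thread grabs the next free index, so a worker
// that finishes a matrix early simply picks up another one.
void for_each_matrix(std::size_t count, unsigned num_workers,
                     const std::function<void(std::size_t)>& work) {
    assert(num_workers > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= count) break;
            work(i);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

With 24 matrices and 4 workers each worker would handle about 6 matrices, for example `for_each_matrix(24, 4, [&](std::size_t i) { /* process matrix i */ });`. The callback must only touch data belonging to matrix `i`, otherwise the workers would race on shared data.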