Intel® oneAPI Threading Building Blocks

Why doesn't my TBB program speed up?


I have this part of code in my program:

tbb::parallel_for(tbb::blocked_range<int>(0, NumberY, NumberY/6), [&](const tbb::blocked_range<int> &r) -> void {
    for (int iy = r.begin(); iy < r.end(); iy++) {
        int x_loc = x_left;
        for (int ix = 0; ix < NumberX; ix++) {
            MyFunction(x_loc, intensity_value);
            pfDensity[iy*NumberX+ix] += intensity_value * mvp_idx;
            x_loc += delta_x;
        }
    }
}); // parallel_for

NumberY and NumberX are both around 8000. The single-threaded version runs about 2 times faster than the multithreaded TBB version. I have tried adjusting the grain size and initializing TBB first, but neither helps.
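To make the grain size setting concrete, here is a minimal sketch of how a range of 8000 rows would be subdivided with a grain size of NumberY/6. It assumes the recursive halving rule of tbb::blocked_range (a range is divisible while its size exceeds the grain size) applied all the way down, as tbb::simple_partitioner would do. The helper names split and chunks_for are hypothetical and not part of TBB. With the default auto_partitioner the library may stop splitting earlier, so this shows the finest possible subdivision, not necessarily what actually runs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Recursively halve [begin, end) while the piece is larger than grainsize,
// mirroring the divisibility rule of tbb::blocked_range (size() > grainsize).
// Hypothetical helper, for illustration only.
void split(int begin, int end, int grainsize,
           std::vector<std::pair<int, int>>& out) {
    if (end - begin > grainsize) {
        const int mid = begin + (end - begin) / 2;
        split(begin, mid, grainsize, out);
        split(mid, end, grainsize, out);
    } else {
        out.emplace_back(begin, end);  // indivisible leaf chunk
    }
}

// Return the leaf chunks for the range [0, n) with the given grain size.
std::vector<std::pair<int, int>> chunks_for(int n, int grainsize) {
    assert(grainsize > 0);
    std::vector<std::pair<int, int>> out;
    split(0, n, grainsize, out);
    return out;
}
```

Calling chunks_for(8000, 8000 / 6) yields eight chunks of 1000 rows each rather than six, because 8000 / 6 is 1333 in integer arithmetic and halving continues until a piece is no larger than that.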

The loop applies a function to a matrix; MyFunction is an interpolation function. I suspect it doesn't speed up because the load per task is too small. But this is as far as I can divide the work, and I would still like it to run faster than the single-threaded code.
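Besides the load per task, one thing worth ruling out in the loop above: the snippet does not show where intensity_value is declared. If it lives outside the lambda and MyFunction writes to it through a reference, then the [&] capture makes every worker thread read and write the same variable, which is a data race and can also cause heavy cache-line contention. The sketch below shows the loop body with that temporary made local. It is written under assumptions: the MyFunction signature, the float element type, and the int type of delta_x are guesses, and process_rows is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the interpolation routine. The real MyFunction
// is not shown in the post; it is assumed to write its result by reference.
inline void MyFunction(int x_loc, float& intensity_value) {
    intensity_value = 0.5f * static_cast<float>(x_loc);
}

// Process rows [y_begin, y_end). Every temporary is local to this call,
// so concurrent calls on disjoint row ranges share no mutable scratch state.
void process_rows(std::vector<float>& pfDensity, int NumberX,
                  int y_begin, int y_end,
                  int x_left, int delta_x, float mvp_idx) {
    for (int iy = y_begin; iy < y_end; ++iy) {
        int x_loc = x_left;
        float* row = pfDensity.data() + static_cast<std::size_t>(iy) * NumberX;
        for (int ix = 0; ix < NumberX; ++ix) {
            float intensity_value = 0.0f;  // local, not captured by reference
            MyFunction(x_loc, intensity_value);
            row[ix] += intensity_value * mvp_idx;
            x_loc += delta_x;
        }
    }
}

// With TBB, the body could be invoked like this (requires
// <tbb/parallel_for.h> and <tbb/blocked_range.h>):
//   tbb::parallel_for(tbb::blocked_range<int>(0, NumberY),
//       [&](const tbb::blocked_range<int>& r) {
//           process_rows(pfDensity, NumberX, r.begin(), r.end(),
//                        x_left, delta_x, mvp_idx);
//       });
```

Each row range writes to a disjoint slice of pfDensity, so no synchronization is needed on the output. Whether this changes the timing depends on how intensity_value is actually declared in the original program.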

How can I improve the performance here?

