Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

parallel_for with inexplicable performance

sinedie
Beginner
Hello,
I have tried to implement Dr. Dobb's example with parallel_for. It gives very bad performance: a scaling of 0.03! Can anyone help me fix the problem, please? The file is attached.

bad.parallel_for__sum.cpp

TIA,
-S
2 Replies
RafSchietekat
Valued Contributor III
As written, the implementation uses one task per array element, which is far too much overhead (with simple_partitioner, ranges are split all the way down to the grainsize, which defaults to 1 for blocked_range). Using auto_partitioner (which is not the default merely for historical reasons) with parallel_for, instead of the default simple_partitioner, should provide instant relief.

sinedie
Beginner

I believe you are right there. With a grainsize in the generally accepted range of 10K-100K, the scaling shot up to 0.997. But that's it. In any case, the remaining overhead is probably too small to matter, so I guess this problem is solved. :)
Thanks Raf.