Intel® oneAPI Threading Building Blocks

parallel_for with inexplicable performance

sinedie
Beginner
Hello,
I have tried to implement the Dr. Dobb's example using parallel_for. It performs very poorly, with a scaling factor of only 0.03! Can anyone help me fix the problem, please? The file is attached.

bad.parallel_for__sum.cpp

TIA,
-S
2 Replies
RafSchietekat
Black Belt
Like this, the implementation uses one task per array element, which is obviously too much overhead. Using auto_partitioner (which is not the default merely for historical reasons) with parallel_for, instead of the default simple_partitioner, should provide instant relief.
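
To make the overhead concrete, here is a minimal sketch that counts how many chunks recursive halving produces for a given grainsize. It is a stdlib-only simulation, not TBB code: the function name count_leaf_chunks is hypothetical, and it assumes the range is split at its midpoint until a chunk holds at most grainsize elements, which is roughly how blocked_range behaves under simple_partitioner. In real code the equivalent change would be to pass tbb::auto_partitioner() as the third argument to tbb::parallel_for, or to give blocked_range an explicit grainsize.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of TBB): counts the leaf chunks produced when
// the half-open range [begin, end) is halved recursively until each chunk
// holds at most `grainsize` elements. Every leaf stands for one task body call.
std::size_t count_leaf_chunks(std::size_t begin, std::size_t end,
                              std::size_t grainsize) {
    assert(grainsize > 0);  // a grainsize of 0 would never stop splitting
    assert(begin <= end);

    const std::size_t size = end - begin;
    if (size <= grainsize) {
        return 1;  // small enough: processed as a single chunk
    }

    // Split at the midpoint and count the leaves on both sides.
    const std::size_t middle = begin + size / 2;
    return count_leaf_chunks(begin, middle, grainsize) +
           count_leaf_chunks(middle, end, grainsize);
}
```

Under these assumptions, count_leaf_chunks(0, 1000000, 1) gives 1000000 chunks, one per element, while a grainsize of 10000 gives 128 and a grainsize of 100000 gives 16.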

sinedie
Beginner

I believe you are right. With a grainsize in the generally accepted range of 10K-100K, the scaling shot up to 0.997, but that's as far as it goes. In any case, the work per element is probably too small relative to the overhead, so I guess this problem is solved. :)
Thanks Raf.