sinedie
Beginner

parallel_for with inexplicable performance

Hello,
I have tried to implement Dr. Dobb's example with parallel_for. It performs very badly: a scaling of 0.03! Can anyone help me fix the problem, please? The file is attached.

bad.parallel_for__sum.cpp

TIA,
-S
RafSchietekat
Black Belt

As written, the implementation creates one task per array element, which is far too much overhead. Passing auto_partitioner to parallel_for, instead of relying on the default simple_partitioner, should provide instant relief. (auto_partitioner is not the default merely for historical reasons.)
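To make the "one task per array element" point concrete, here is a rough sketch in plain C++ (no TBB required) of how a range that is halved until each piece is no larger than the grainsize ends up as leaf subranges. It is only a model of the splitting described above, not TBB's actual code; the helper name count_leaf_ranges and the array length used in the note below are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of TBB: counts the leaf subranges produced
// when [begin, end) is split in half repeatedly until a piece holds at most
// `grainsize` elements. Each leaf stands for one task in this simplified model.
std::size_t count_leaf_ranges(std::size_t begin, std::size_t end,
                              std::size_t grainsize) {
    assert(grainsize >= 1);  // a grainsize of 0 would never stop splitting
    const std::size_t size = end - begin;
    if (size <= grainsize) {
        return 1;  // small enough: this piece is processed as one unit
    }
    const std::size_t middle = begin + size / 2;  // split at the midpoint
    return count_leaf_ranges(begin, middle, grainsize) +
           count_leaf_ranges(middle, end, grainsize);
}
```

For an assumed array of 1,000,000 elements, count_leaf_ranges(0, 1000000, 1) returns 1000000 (one leaf per element), while count_leaf_ranges(0, 1000000, 10000) returns 128, so the per-task overhead is spread over thousands of elements instead of one.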

sinedie
Beginner

Quoting - Raf Schietekat
Like this, the implementation uses one task per array element, which is obviously too much overhead. Using auto_partitioner (which is not the default merely for historical reasons) with parallel_for, instead of the default simple_partitioner, should provide instant relief.

I believe you are right there. With a grainsize in the generally accepted range of 10K-100K, the scale-up shot up to 0.997. But that's it. In any case, the overhead is probably too small to matter now, so I guess this problem is solved. :)
Thanks Raf.