parallel_for with inexplicable performance

sinedie · 05-24-2009

Hello,
I have tried to implement Dr. Dobb's example with parallel_for. It performs very badly: a scaling of 0.03! Can anyone help me fix the problem, please? The file is attached.

bad.parallel_for__sum.cpp

TIA,
-S

RafSchietekat · 05-25-2009

Like this, the implementation uses one task per array element, which is obviously too much overhead. Using auto_partitioner (which is not the default merely for historical reasons) with parallel_for, instead of the default simple_partitioner, should provide instant relief.

sinedie · 05-26-2009

Quoting - Raf Schietekat

Like this, the implementation uses one task per array element, which is obviously too much overhead. Using auto_partitioner (which is not the default merely for historical reasons) with parallel_for, instead of the default simple_partitioner, should provide instant relief.

I believe you are right there. With a grainsize in the generally accepted range of 10K-100K, the scaling shot up to 0.997. But that's it. In any case, the overhead is probably small enough now, so I guess this problem is solved. :)
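To see why raising the grainsize helps, here is a simplified model in plain C++ (not TBB code; count_chunks is a hypothetical helper) of how simple_partitioner subdivides a blocked_range: a range keeps being halved while its size exceeds the grainsize, and each undivided piece becomes one task. In TBB itself the grainsize is the optional third constructor argument of blocked_range, which defaults to 1.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of simple_partitioner: split [begin, end) in half while
// its size exceeds grainsize; every undivided leaf counts as one task.
std::size_t count_chunks(std::size_t begin, std::size_t end,
                         std::size_t grainsize) {
    assert(grainsize >= 1);
    // Mirrors blocked_range::is_divisible(), i.e. grainsize < size().
    if (end - begin <= grainsize)
        return 1;
    std::size_t middle = begin + (end - begin) / 2;
    return count_chunks(begin, middle, grainsize) +
           count_chunks(middle, end, grainsize);
}
```

For 1,000,000 elements this model gives 1,000,000 chunks with grainsize 1, 128 chunks with grainsize 10,000, and 16 chunks with grainsize 100,000, which illustrates how much less scheduling overhead the larger grainsize implies.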
Thanks Raf.