golroth
Beginner

Parallel version of a simple sum is 10 times slower

Hello,
I'm currently trying to learn how to use Intel TBB, and I wrote a short, simple program using parallel_reduce.
However, the parallel version is 10 times slower than the sequential one.
I really can't understand why the parallel version is so much slower. Could you help me figure out why?
Here are both of my programs:
I compiled them with g++ under Fedora 12 linux and my computer has a dual core processor.
Thank you for your help.
3 Replies
Dmitry_Vyukov
Valued Contributor I

And what version of TBB?
RafSchietekat
Black Belt

"And what version of TBB?"
Dmitriy probably means (sorry...) that before TBB 2.2 the default partitioner was simple_partitioner. To get acceptable performance with it, you have to set an appropriate grainsize (the third parameter of blocked_range, e.g., 1000); otherwise you must explicitly pass auto_partitioner as the third argument of parallel_reduce() (auto_partitioner has been the default since TBB 2.2).
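To make the grainsize point concrete, here is a rough sketch in plain standard C++ (it does not use TBB). It models how a range is halved recursively until each piece holds at most grainsize elements, which is the splitting rule simple_partitioner follows with blocked_range. The helper name count_leaf_chunks is hypothetical and exists only for this illustration; it counts how many chunks the body would be invoked on.

```cpp
#include <cstddef>

// Hypothetical helper (not part of TBB): counts the leaf chunks produced when
// the half-open range [begin, end) is split in half recursively until each
// piece holds at most `grainsize` elements. This only models the splitting
// rule; it does no parallel work. Assumes grainsize >= 1.
std::size_t count_leaf_chunks(std::size_t begin, std::size_t end,
                              std::size_t grainsize) {
    const std::size_t size = end - begin;
    if (size <= grainsize) {
        return 1;  // Not divisible any further: one chunk goes to the body.
    }
    const std::size_t middle = begin + size / 2;  // Split at the midpoint.
    return count_leaf_chunks(begin, middle, grainsize) +
           count_leaf_chunks(middle, end, grainsize);
}
```

Under this model, a range of 1,000,000 elements with the default grainsize of 1 ends up as 1,000,000 single-element chunks, so the scheduling overhead can easily outweigh one addition per chunk. With a grainsize of 1000 the same range ends up as 1024 chunks. In TBB itself, that would correspond to constructing the range as tbb::blocked_range<size_t>(0, n, 1000), or to passing tbb::auto_partitioner() as the third argument of tbb::parallel_reduce.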
golroth
Beginner

Indeed, I am using version 2.1, and I specified neither a partitioner nor a grainsize.
It works very well now :)
Thank you very much for your help :D