Intel® oneAPI Threading Building Blocks

Parallel version of a simple sum is 10 times slower

golroth
Beginner
308 Views
Hello,
I'm currently trying to learn how to use Intel TBB, and I tried to write a short, simple program using parallel_reduce.
However, the parallel version is 10 times slower than the sequential one...
I really can't understand why the parallel version is so much slower. Could you help me figure out why, please?
Here are both of my programs:
I compiled them with g++ under Fedora 12 Linux, and my computer has a dual-core processor.
Thank you for your help.
3 Replies
Dmitry_Vyukov
Valued Contributor I
And what version of TBB?
RafSchietekat
Valued Contributor III
"And what version of TBB?"
Dmitriy probably means (sorry...) that before TBB 2.2 the default partitioner was simple_partitioner. To get acceptable performance with it, you have to set an appropriate grainsize (the third parameter of blocked_range, e.g., 1000); otherwise you must explicitly pass auto_partitioner, which has been the default since TBB 2.2, as the third argument of parallel_reduce().
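To see why the grainsize matters so much, here is a minimal stand-alone sketch of the splitting rule. It is not TBB code, only a model of it: simple_partitioner keeps halving a blocked_range while the range is larger than the grainsize, and the default grainsize is 1. The helper count_leaf_chunks is a hypothetical name used purely for illustration; it counts how many chunks that splitting produces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of TBB): models recursive halving of the
// half-open index range [begin, end). A range is split at its midpoint while
// it holds more than `grainsize` elements; each range that can no longer be
// split counts as one leaf chunk that a task would process.
std::size_t count_leaf_chunks(std::size_t begin, std::size_t end,
                              std::size_t grainsize) {
    assert(grainsize > 0);  // a grainsize of 0 would never stop splitting
    const std::size_t size = end - begin;
    if (size <= grainsize) {
        return 1;  // not divisible any further: one chunk
    }
    const std::size_t middle = begin + size / 2;
    return count_leaf_chunks(begin, middle, grainsize) +
           count_leaf_chunks(middle, end, grainsize);
}
```

Under this model, a range of 1,000,000 elements ends up as 1,000,000 single-element chunks with a grainsize of 1, but only 1024 chunks with a grainsize of 1000. For an operation as cheap as adding one number, the per-chunk overhead can easily outweigh the useful work. In real code the two fixes would look something like tbb::blocked_range<size_t>(0, n, 1000) for the range, or tbb::auto_partitioner() as the third argument of parallel_reduce().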
golroth
Beginner
Indeed, I am using version 2.1, and I specified neither a partitioner nor a grainsize.
It works very well now :)
Thank you very much for your help :D