Intel® oneAPI Threading Building Blocks

3.0 performance

I'm trying to compare the performance of a clustering code that I wrote with TBB against a widely used OpenMP
version. Currently the OpenMP version appears faster, and I'm trying to figure out why.
As part of this investigation I switched from TBB 2.2 to 3.0 (on the theory that newer is always better).
Oddly, the 3.0 version seems to be slower.
I'm running Vista 32-bit (for reasons that I care not to explain).
With 2.2 the 8-core machine shows 100% CPU busy, while with 3.0 it drops to around 50%.
I tried playing around with grain sizes and such, which moved the CPU utilization up to around 80%, but that is still less than with 2.2.
Does all this mean I should go back to 2.2 because everyone knows 3.0 is not yet tuned for performance, or
(more likely?) did I mess up the install?
Has anyone else seen this performance drop? I'm using the prebuilt binaries, so at least I didn't build it wrong.
Thanks in advance for any help.
1 Reply
Hi, could you provide details on how TBB is used in the application: what types of parallel algorithms are used and if they're nested, how much work does each function object perform and if there's any I/O, synchronization, etc.? In case we're talking about a simple parallel_for, could you provide a reproducer for this 50% CPU utilization case? Thanks.