Is there any documentation available to help with troubleshooting performance issues when using TBB? (I haven't yet got the O'Reilly book, so that might be the short answer.)
I'm trying to use parallel_for and auto_partitioner to parallelise some existing calculation code under Windows XP. The modified code runs fine and Task Manager shows that all cores are being used throughout the calculation, but the runtime is slightly longer than the single-threaded version (the PC has two logical cores: one physical core plus hyperthreading).
Running under a profiler and looking at the wait_for_all function, roughly half the time is being spent inside my body function but half is "inside" the Windows Sleep function, which is being called about 200,000 times compared to 93 calls to my body function.
The range is not completely uniform: it has 252 elements (taking just over a second in total), of which one block of 145 elements takes 95% of the time.
Changing allocator, grainsize, partitioner and debug/release build doesn't seem to affect these relative times much, and DO_TBB_ASSERT doesn't flag any problems. Even running on a four core PC gives no speed-up. I've generated a TBB_TRACE file but I'm not sure how to interpret the output.
Any pointers welcome. I suspect the answer is (a) read the book and (b) build up / reduce to a simple example, but it's frustrating being so close and so I hoped someone might recognise the symptoms and be able to tell me what I've done wrong.
From what you wrote, I would say there is a serious load imbalance in the loop. "Half the time inside the Windows Sleep function" actually means the workers did not succeed in finding work to steal (Sleep(0) is called to yield CPU time if several attempts to steal work had no result). Could it be that in reality 90+% of the time is spent processing 1-5 elements, and is not evenly distributed across the 145?
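To put a number on the load-imbalance argument, here is a minimal sketch in standard C++ (it is not part of TBB). The function name best_case_speedup is made up for illustration, and it assumes that a single iteration always runs on one thread, so the loop can never finish faster than its most expensive iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: upper bound on the speed-up of a parallel loop, given
// the measured cost of every iteration and the number of worker threads.
// Assumption: one iteration cannot be split across threads, so the parallel
// run time is at least max(total / threads, longest single iteration).
double best_case_speedup(const std::vector<double>& iteration_seconds,
                         std::size_t threads) {
    assert(threads > 0);
    if (iteration_seconds.empty()) return 1.0;

    const double total = std::accumulate(iteration_seconds.begin(),
                                         iteration_seconds.end(), 0.0);
    const double longest = *std::max_element(iteration_seconds.begin(),
                                             iteration_seconds.end());

    // Lower bound on the parallel run time under the assumption above.
    const double parallel_lower_bound =
        std::max(total / static_cast<double>(threads), longest);

    return parallel_lower_bound > 0.0 ? total / parallel_lower_bound : 1.0;
}
```

If, say, one element out of 252 accounted for 95% of the run time, this bound would stay close to 1 no matter how many cores are available, which would match the symptoms described. It is only a bound: scheduling overhead and memory effects can make the real result worse.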
There are two things I would do:
Using these two options together, you can get the exact time spent in each iteration. Given the number of iterations, this is feasible to collect, and you can then analyze the load balance.
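As a rough illustration of collecting per-iteration times, here is a sketch in standard C++ that runs the loop body serially and records how long each index takes. It does not use TBB; the names time_each_iteration and share_of_slowest are hypothetical, and body stands in for whatever your real loop body does for one element.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: runs body(i) serially for every i in [begin, end)
// and returns the wall-clock time of each call in seconds.
template <typename Body>
std::vector<double> time_each_iteration(std::size_t begin, std::size_t end,
                                        Body body) {
    assert(begin <= end);
    std::vector<double> seconds;
    seconds.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        const auto start = std::chrono::steady_clock::now();
        body(i);
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Hypothetical helper: fraction of the total time spent in the k slowest
// iterations. Takes the vector by value because it reorders its copy.
double share_of_slowest(std::vector<double> seconds, std::size_t k) {
    if (seconds.empty()) return 0.0;
    k = std::min(k, seconds.size());
    // Move the k largest timings to the front, in descending order.
    std::partial_sort(seconds.begin(), seconds.begin() + k, seconds.end(),
                      std::greater<double>());
    const double total = std::accumulate(seconds.begin(), seconds.end(), 0.0);
    const double top = std::accumulate(seconds.begin(), seconds.begin() + k, 0.0);
    return total > 0.0 ? top / total : 0.0;
}
```

Calling time_each_iteration(0, 252, body) and then share_of_slowest(times, 5) would show whether a handful of elements dominate the run time. With a total of about one second over 252 elements, individual timings may be close to the clock resolution, so repeating the measurement a few times is advisable.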