[cpp]int grainsize = m_numCentroids / numThreads;
grainsize = grainsize > 0 ? grainsize : 1;
// <int> is assumed here; the template argument was lost in the post
parallel_for(blocked_range<int>(0, m_numCentroids, grainsize), destinationRunner, tbb::simple_partitioner());
[/cpp]
[bash]//Value middle = r.my_begin + (r.my_end-r.my_begin)/2u; // blocked_range
Value new_end = r.my_begin + r.my_grainsize; // explicit_range
[/bash]
Obviously this won't work well unless your grainsize is equal to your number of work elements divided by your number of threads, but as I'm controlling both of these parameters explicitly, it should generate suitable speed improvements (and it does).
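To make the difference between the two split rules concrete, here is a minimal sketch in plain C++ (no TBB needed). It only simulates what a simple partitioner does, assuming a range is split for as long as it is larger than the grainsize; `SplitRule`, `split_sizes` and `leaf_sizes` are names made up for this illustration and are not part of TBB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which rule is used when a range is divided in two (illustrative names).
enum class SplitRule { Halve, PeelGrainsize };

// Recursively split [begin, end) while it is larger than grainsize,
// appending the size of every final chunk to `out`.
void split_sizes(std::size_t begin, std::size_t end, std::size_t grainsize,
                 SplitRule rule, std::vector<std::size_t>& out) {
    assert(grainsize > 0);
    if (end - begin <= grainsize) {  // range is no longer divisible
        out.push_back(end - begin);
        return;
    }
    std::size_t cut = (rule == SplitRule::Halve)
        ? begin + (end - begin) / 2  // blocked_range-style: cut in the middle
        : begin + grainsize;         // explicit_range-style: peel off one grainsize
    split_sizes(begin, cut, grainsize, rule, out);
    split_sizes(cut, end, grainsize, rule, out);
}

// Convenience wrapper: sizes of the final chunks for the range [0, n).
std::vector<std::size_t> leaf_sizes(std::size_t n, std::size_t grainsize,
                                    SplitRule rule) {
    std::vector<std::size_t> out;
    split_sizes(0, n, grainsize, rule, out);
    return out;
}
```

With 1000 elements and a grainsize of 250 (four threads), both rules end up with four chunks of 250. With a grainsize of 333 (three threads), halving yields four chunks of 250, while peeling yields 333, 333, 333 and a leftover chunk of 1, which is the mismatch mentioned above when the grainsize does not divide the work evenly.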
Actually, you don't need static partitioning for either of these.
To control the number of participating threads, use task_scheduler_init class.
To ensure TBB scales well with a growing number of threads, create enough parallel slack (i.e. orders of magnitude more tasks / iteration subranges than there are threads).
What we usually do in scalability studies like that is to throttle the number of threads via task_scheduler_init, and just use the default parameters of parallel_for. In many benchmarks it scaled nearly linearly. And when it did not, the overhead of creating parallel slack was rarely the reason (in particular because the default settings of parallel loops in TBB 2.2 are chosen to avoid unnecessary task creation).
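As a rough illustration of why parallel slack helps, here is a small sketch in plain C++. It is not how TBB schedules work (TBB uses work stealing); it is a simplified greedy model in which each chunk goes to whichever thread is currently least loaded. The cost model (iteration i costs i + 1 units) and the names `chunk_cost` and `simulated_makespan` are assumptions made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cost model: iteration i costs i + 1 units, so work is uneven.
long long chunk_cost(std::size_t begin, std::size_t end) {
    long long cost = 0;
    for (std::size_t i = begin; i < end; ++i) {
        cost += static_cast<long long>(i) + 1;
    }
    return cost;
}

// Simplified greedy model: chunks are handed out in order, each one to the
// thread with the smallest accumulated load. Returns the load of the busiest
// thread, i.e. the simulated time at which the whole loop finishes.
long long simulated_makespan(std::size_t n, std::size_t num_threads,
                             std::size_t chunk_size) {
    assert(num_threads > 0 && chunk_size > 0);
    std::vector<long long> load(num_threads, 0);
    for (std::size_t begin = 0; begin < n; begin += chunk_size) {
        std::size_t end = std::min(n, begin + chunk_size);
        *std::min_element(load.begin(), load.end()) += chunk_cost(begin, end);
    }
    return *std::max_element(load.begin(), load.end());
}
```

For 1000 iterations on 4 threads, one chunk per thread (`chunk_size` of 250) finishes at 218875 units in this model, because the last chunk holds the most expensive iterations. A perfect balance would be 125125 units, and a `chunk_size` of 10 (100 chunks) gets much closer to that, since a thread that finishes early can pick up more chunks.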
Well, I guess I was not absolutely correct in saying orders of magnitude more tasks :) The point is that you should allow it, but not create them manually. And you don't even have to do anything for that; the TBB default settings for parallel_for were selected to avoid excessive task creation while keeping good load balance. There are corner cases where it leaves performance on the table, but in most cases it works just fine. In particular, I think you'd be surprised if you counted the Body objects created during the run; there should be far fewer than the number of iterations in the loop.
These explanations are not meant to convert you to my belief :) but rather to let other readers know that what you do is usually unnecessary. The primary idea of TBB is to make its users care *less* about work partitioning, job scheduling, thread pool management etc. - ideally, not care at all - and still get good performance and scalability.