Intel TBB scheduler advantages

Zhe_F_ · 11-06-2007

hi,

After reading through the TBB document, I reached the following conclusion.

"For an application in which all the threads are allocated when it starts and all the threads' lifespans are the same as the process's, TBB will perform no better than native threads."

Is this a correct statement?

I guess I don't fully understand the benefits of task stealing in the TBB scheduler. Why is it better than the static scheduler of Windows?

TBB's scheduler achieves better load balance, but the Windows static (round-robin) scheduler will always run underneath the TBB library's "dynamic" scheduler: the threads in the TBB thread pool will still be served in round-robin fashion, even though the tasks themselves won't be preempted.

Could anybody clarify this issue?

Thanks!!!

TimP · 11-06-2007

You might get a quicker response to questions about TBB on the TBB forum. I agree: it seems unlikely that TBB would offer any scheduling advantage in cases where static scheduling is appropriate.

robert-reed · 11-12-2007

"For an application in which all the threads are allocated when it starts and all the threads' lifespans are the same as the process's, TBB will perform no better than native threads."

There's no single right answer to this question, because the statement will be true or false depending on a large collection of conditions specific to the resulting code. But TBB does provide some infrastructure advantages that amount to more than just the provision of a thread pool.

I will say this: someone skilled in programming multiple threads on an SMP architecture with a hierarchy of latencies in the memory/cache structure can achieve all the gains available via TBB, and more, using only the native threads interface. And there are applications today for which TBB would be inappropriate but native threads would not.

Nevertheless, what TBB does package gives programmers more conveniences for writing efficient threaded code than native threads provide; TBB is more than just a wrapper around native threads.

Start with the thread pool. Windows offers QueueUserWorkItem to dispatch tasks into a thread pool (there is nothing equivalent in portable Pthreads implementations), but there's no control once the work has been scheduled: no way to know when a set of work tasks has completed, and certainly no way to partition the work once it's been scheduled.
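To make the completion-tracking point concrete, here is a minimal sketch in portable C++11, assuming only the standard library. The class `SimpleTaskPool` and its `submit`/`wait_all` methods are hypothetical illustrations, not part of TBB or the Windows thread-pool API; the point is only that the pool counts outstanding tasks, so a caller can block until a whole batch has finished, which a fire-and-forget queue does not offer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool (not a TBB or Windows API). It tracks how many
// submitted tasks have not yet finished so callers can wait for a batch.
class SimpleTaskPool {
public:
    explicit SimpleTaskPool(std::size_t num_threads) {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~SimpleTaskPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_available_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue, then exit
    }

    // Queue a task and count it as outstanding. Tasks are assumed not to throw.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        work_available_.notify_one();
    }

    // Block until every task submitted so far has run to completion.
    void wait_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_available_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) all_done_.notify_all();
            }
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable work_available_;
    std::condition_variable all_done_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

A caller submits a batch with `submit` and calls `wait_all()` before reading the results; once `wait_all()` returns, the tasks' writes are visible to the caller because both sides synchronize on the same mutex.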

Thread scheduling à la TBB offers all that and more. The natural splitting of parallel loops using blocked ranges narrows the range of addresses each piece of the loop works on, which offers opportunities to improve cache utilization, and the same splitting can improve load balance by avoiding situations where one thread is stuck finishing one long task while the rest sit idle. This cache locality has produced instances where TBB code runs faster than the non-TBB version EVEN USING A SINGLE THREAD!
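The recursive splitting idea can be sketched in standard C++. `IndexRange`, `split_sum`, and `parallel_sum` are hypothetical names, and this is plain divide-and-conquer with `std::async`, not TBB's `blocked_range` or its work-stealing scheduler: each range is halved until it is no larger than the grain size, so every leaf sums one contiguous, cache-friendly block of the array. The fixed depth of 3 (at most 7 extra threads) is an arbitrary assumption chosen only to bound thread creation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical half-open range [begin, end) that can be halved until it
// reaches the grain size (an illustration, not TBB's blocked_range).
struct IndexRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grain;
    bool is_divisible() const { return end - begin > grain; }
};

// Split recursively; each leaf sums one contiguous block. While depth > 0
// the left half runs on another thread via std::async.
long long split_sum(const std::vector<int>& data, IndexRange r, int depth) {
    assert(r.grain > 0);
    if (!r.is_divisible()) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(r.begin);
        auto last = data.begin() + static_cast<std::ptrdiff_t>(r.end);
        return std::accumulate(first, last, 0LL);
    }
    std::size_t mid = r.begin + (r.end - r.begin) / 2;
    IndexRange left{r.begin, mid, r.grain};
    IndexRange right{mid, r.end, r.grain};
    if (depth > 0) {
        auto left_sum = std::async(std::launch::async, split_sum, std::cref(data), left, depth - 1);
        long long right_sum = split_sum(data, right, depth - 1);
        return left_sum.get() + right_sum;
    }
    // Below the depth limit, keep splitting but run both halves on this thread.
    return split_sum(data, left, 0) + split_sum(data, right, 0);
}

long long parallel_sum(const std::vector<int>& data, std::size_t grain) {
    return split_sum(data, IndexRange{0, data.size(), grain}, 3);
}
```

Roughly speaking, TBB's scheduler goes further: instead of a fixed depth, idle workers steal not-yet-processed halves from busy ones, which is what lets the same splitting rebalance load at run time.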

What about the possible thrashing of context by the Windows static task scheduler? Well, Windows thread scheduling is preemptive, priority-based, and round-robin AT THE HIGHEST PRIORITY. Moreover, the Windows scheduler "tries to keep a thread on its ideal processor/node to avoid performance degradation of cache/NUMA-memory." The concern about being bounced off a processor at the end of a time slice only matters if you've got more threads than processing elements to run them on. If you stay within those limits, as happens automatically with TBB, the worst you'll suffer is priority degradation for processor-bound threads; but if all your threads are doing it, who's going to be evicted?
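Keeping the worker count at or below the number of hardware threads, and letting workers pull work rather than receive fixed shares, can be sketched as below. `dynamic_for` is a hypothetical helper, not a TBB function, and it uses a shared atomic chunk counter (dynamic self-scheduling), a simpler technique than TBB's work stealing; it only illustrates the two ideas of avoiding oversubscription and not leaving threads idle while one finishes a long chunk.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB API): call body(i) for every i in [0, n)
// using no more threads than the hardware reports. Workers repeatedly claim
// the next chunk of indices from a shared atomic counter, so a thread that
// finishes early simply takes more work.
template <typename Body>
void dynamic_for(std::size_t n, std::size_t chunk, Body body) {
    assert(chunk > 0);
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    std::size_t workers = std::max<std::size_t>(1, hw);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            std::size_t start = next.fetch_add(chunk);
            if (start >= n) return;  // no chunks left
            std::size_t stop = std::min(n, start + chunk);
            for (std::size_t i = start; i < stop; ++i) body(i);
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t t = 1; t < workers; ++t) pool.emplace_back(worker);
    worker();  // the calling thread participates, so total threads == workers
    for (auto& th : pool) th.join();
}
```

`std::thread::hardware_concurrency()` may return 0 when the value cannot be determined, hence the fallback to a single thread. The body must be safe to call concurrently for different indices.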

(As was noted earlier, there is a separate forum for TBB where questions such as these might be addressed more quickly.)