Intel® oneAPI Threading Building Blocks

thread versus task


I plan to start a fixed, finite number of operations. Each operation will involve a lot of computational work and offers opportunities for nested parallelism in for loops and similar constructs. I like TBB because it handles nested parallelism automatically: I can start 2, 4, 8, ... operations, and the nested parallelism can then draw on the remaining threads in the TBB thread pool for work in a parallel for. I am considering two ways to invoke the operations: one is to use a TBB thread for each operation, the other is to create a TBB task for each operation. Which would be better suited? I may have a parallel reduction in the middle of the operations, which should run across ALL TBB threads (not just a subgroup). Which is easier to implement? What else should I be considering, or what factors that I did not list here would affect the decision?

Thanks for any discussion of the merits of one option over the other.

1 Reply

Hi David,

There are several ideas that should always be considered when choosing between a thread-based and a task-based approach.

  • If you have only computational work, without I/O operations (or similar ones) that waste processor time waiting for some external activity, the task-based approach is recommended. The TBB task scheduler creates its own thread pool to utilize the processor as much as possible. A few additional threads are usually not a big issue; however, running too many threads can lead to oversubscription and overall performance degradation.
  • Usually, it is recommended to run as much work as possible at the outer level of parallelism. So if you have to choose between running work at the outer level or at a nested level, it is better to run it at the outer level to reduce the possible overhead of the work-stealing approach used by the TBB task scheduler.
  • Where applicable, it is better to use high-level parallel algorithms such as parallel_for or parallel_reduce, because they use a tree-based approach to create and spawn tasks.
  • If your algorithm cannot be parallelized with the high-level parallel algorithms, it is better to use task_group, parallel_invoke and/or the flow graph instead of the explicit tbb::task interface, which is complex.
  • It is better to run/spawn tasks from many different tasks (not from one or a few) to avoid a producer/consumer pattern, because work stealing may be inefficient in that case.
  • Try to avoid thinking about threads when using the task-based approach. Create as many reasonably sized tasks as possible, or use a high-level parallel algorithm with auto_partitioner (the default one).

I hope these ideas are applicable to your algorithm. Feel free to share the details of the algorithm so that I can suggest a better approach for you.

Regards, Alex
