Intel® oneAPI Threading Building Blocks

Executing a Collection of tbb::tasks in parallel

New Contributor I
Hey all,

Quick question: I have a collection of tbb::task objects that I want to execute in parallel. What is the best method to do this? I'm using tbb::task objects for delayed execution, so when a certain condition is reached I have a large number of tbb::tasks that may be executed in parallel. Should I just write a function that uses the task scheduler to make them all children of a single parent? Wouldn't that affect task stealing, though?


The most efficient way to spawn multiple tasks at once is to use task_list. It's how you would usually spawn several child tasks at once. You should be able to figure out exactly how to do that from the documentation and header files.

There are some restrictions, however, enforced by assertions. In particular, every task in a task list should be of the same depth. Also please be aware that the parent-child relation is set at the moment you allocate a task object, not at the moment you spawn it.

And I agree with your concerns about performance being affected. In fact, in this case performance would be affected by design; you want to spawn a lot of tasks at once, and they all will reside in the same pool (that of the current thread), so stealing is serialized. And if all tasks are allocated as children of the same parent, the parent's reference counter will be a hot spot that every task updates.

Generally with TBB, it is preferable to spawn new tasks as soon as they are ready and safe to execute, and spawn more child tasks from those to build a task tree. If you need to collect the data first, and then start processing per some condition, I would suggest collecting the data in a vector (or concurrent_vector), and when the moment comes, starting a parallel_for loop over the elements of the vector.

New Contributor I
Okay, in this situation I have a collection of tasks which have been blocked by some former condition, and are suddenly ready to execute because that condition has been changed. In my case, this condition is the advance of a virtual clock to the next event time in a simulation.

Do you have any other suggestions for methods to improve performance with the scheduler?
Black Belt
Just make sure the tasks are really about task parallelism (but then why are there so many?) and not data parallelism, which would disable TBB's forte of dividing work in a cache-friendly way, if relevant. Try to organise them as a tree while you are creating them, mainly to avoid an administrative bottleneck in the scheduler, as Alexey indicated; if easily done, have siblings execute unrelated code for better code-related locality, though I have no idea how effective that would be. The advice I've read is to create tasks in a parallel way (which may or may not compete with ensuring code-related locality, however relevant that is). It may also be worth exploring an object that automatically tree-ifies a list, or builds a tree as items are being added, but only if you've got time to spare, and subject to what someone like Alexey might think of that.

Alexey mentioned using "a vector (or concurrent_vector)", but in an earlier topic I edited out a fleeting suggestion of my own to use a std::vector, because there don't seem to be any guarantees of thread safety even for accessing a vector (unlike parallel_for over a concurrent_hash_map, which likewise must not be modified concurrently but can be accessed in parallel). So wouldn't that be very much at your own risk?

New Contributor I
In this case, the application is for the master data structures of YetiSim (a discrete-event simulation framework).

At a high level, there is a master dictionary which maps time values to a list of independent events to execute when the simulation clock reaches that time. Once that time is reached, there is a sudden burst of work to complete.