Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Can't properly enqueue tasks into a task_arena



I need to have multiple task_arenas in my application to partition the available worker threads and make sure that different components make progress independently of each other. I also need some kind of cancellation/waiting functionality to properly shut down the arenas; therefore I more or less need to use task_group in conjunction with task_arena. (See also my earlier post for more context; I created a new one because this post focuses on a different aspect.)

The typical way to enqueue tasks in your arena is something like:

myTaskArena->execute([&] {
    myTaskGroup->run([&] {
        // Do the work of the task
    });
});
The problem I have with this pattern is that sometimes it enqueues the tasks fine, without blocking the caller, and sometimes it blocks the caller, making it execute tasks of the arena.

Here is a test example:

static const int numThreadsInArena = 1;
static const int numTasks = 100;

tbb::task_scheduler_init defInit;
tbb::task_arena* myTaskArena = new tbb::task_arena(numThreadsInArena+1);
tbb::task_group* myTaskGroup = new tbb::task_group;

// From outside of the arena, enqueue tasks into the arena
tbb::parallel_for(tbb::blocked_range<int>(0, numTasks, 1), [&] (const tbb::blocked_range<int>& r) {
    for (int i = r.begin(); i < r.end(); i++) {
        // Enqueue task into the arena
        myTaskArena->execute([&, i] {
            printf("{"); fflush(stdout);
            myTaskGroup->run([&, i] {
                printf("executing task %d\n", i); fflush(stdout);
            });
            printf("}"); fflush(stdout);
        });
    }
});
printf("\n"); fflush(stdout);

// Let the tasks start
std::this_thread::sleep_for(std::chrono::milliseconds(100));

printf("---canceling----------------------------\n"); fflush(stdout);
myTaskArena->execute([&] {
    myTaskGroup->cancel();
    myTaskGroup->wait();
});
printf("---done---------------------------------\n"); fflush(stdout);

// We are done
delete myTaskArena;
delete myTaskGroup;

A good output of this test would be:

executing task 99

However, if I run the test more than once, I get some bad outputs:

{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{{}{}{}{}{}{}{}{}}{}{{}}{{}}{}{{}}{{}{}}{{}}{{}{}executing task 24
}{}{}{}{}{}{}executing task 47
executing task 46
executing task 22
executing task 21
{}executing task 39
executing task 71
{}executing task 45
executing task 0


This basically tells me that the caller (running on worker threads that are not part of the arena) will join the arena to help with its work. Therefore, enqueuing is sometimes not just "adding a task to be executed later on the arena"; it can make the caller join the arena.

This breaks the boundary of the arena. It's hard to partition the workers in such a way that the components do not interfere with each other.

Some observations:

  • If I transform the parallel_for into a regular for loop (enqueuing directly from the main thread), the bad behavior doesn't happen (or at least not as often).
  • If I remove the "+1" from the specification of the arena size (use 1 instead of 2), then none of my tasks start executing (the single slot in the arena is reserved for masters); in the parallel_for case, the main thread will block trying to execute tasks while adding them.
  • Changing the task_arena constructor arguments to (1, 0) makes no difference: parallel_for still blocks the main flow.
  • Changing the task_arena constructor arguments to (4, 0) doesn't help either; in this case, the regular for loop also blocks from time to time.

It feels to me that task_arena behaves strangely when used in conjunction with task_group. Am I missing something?

Thank you very much


Hi Lucian,

Your observations are correct.

Let me clarify properties of task_arena:

task_arena(int max_concurrency = automatic, unsigned reserved_for_masters = 1);

  • max_concurrency is how many threads can participate in this task_arena simultaneously. This limit applies to all threads that try to enter the arena: TBB worker threads and threads that enter through the execute method.
  • reserved_for_masters is how many of those slots are guaranteed for threads entering the task_arena through the execute method. In other words, the number of TBB worker threads that can enter the task_arena implicitly is max_concurrency - reserved_for_masters.

What if the number of threads that try to enter the arena explicitly (through the execute method) is bigger than max_concurrency? In that case, a thread will enqueue its functor as a task into the task_arena and wait for its completion. However, if the thread has enqueued its task and another thread then leaves the arena, the first thread has no guarantee that its task will ever be processed. To prevent an infinite wait, the first thread will try to enter the arena a second time and, if it succeeds, it will start processing tasks. But the thread does not know where its own task is, so it will process any available tasks until the required task is completed (theoretically, it can be completed by some other thread). In other words, a thread that tries to enter the task_arena explicitly can start processing other tasks if it fails to enter on the first attempt.

Hopefully, it explains the observed behavior.



Hello Alex

Thank you very much. This clarifies a lot of what's happening. I was inferring some of this already, but your explanation really gets to the essence.

Moving forward, I can draw the following conclusion: with the existing public TBB API, I cannot properly enqueue cancelable tasks into a task_arena.
With the following clarifications:

  • "properly enqueue" means that the caller thread never joins the arena (using task_arena.execute is not an option, as you just explained)
  • "existing public TBB API" means without deriving from the arena and changing the behavior of enqueue (using protected members), or tricks like this

Is that a correct assessment?

Thank you very much,
