Intel® oneAPI Threading Building Blocks

Unexpected thread distribution when one of several arenas has a long-running task



I'm seeing something unexpected in this scenario:

  • 8 threads are available in the market
  • I have 3 arenas each with 2 + 1 slots, i.e. 3 slots with one reserved for a master thread
  • one arena is busy with a single long-running task
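For reference, the thread budget in this scenario can be sketched as below. It assumes that a `max_allowed_parallelism` limit of 8 counts the master thread (so at most 7 TBB worker threads) and that `task_arena(3, 1)` asks for 3 - 1 = 2 workers. The helper names are hypothetical, not TBB API.

```cpp
#include <cassert>

// Hypothetical helper. Assumption: max_allowed_parallelism counts the master
// thread, so the market runs at most (limit - 1) TBB worker threads.
constexpr int workerThreadsInMarket(int maxAllowedParallelism)
{
    return maxAllowedParallelism - 1;
}

// Hypothetical helper. Assumption: task_arena(maxConcurrency, reservedForMasters)
// asks the market for (maxConcurrency - reservedForMasters) worker threads.
constexpr int workersRequestedByArena(int maxConcurrency, int reservedForMasters)
{
    assert(reservedForMasters <= maxConcurrency);
    return maxConcurrency - reservedForMasters;
}

// True when the market has enough workers to fill every arena's worker slots.
constexpr bool marketCanFillAllArenas(int maxAllowedParallelism, int arenaCount,
                                      int maxConcurrency, int reservedForMasters)
{
    return workerThreadsInMarket(maxAllowedParallelism) >=
           arenaCount * workersRequestedByArena(maxConcurrency, reservedForMasters);
}

// The scenario above: limit 8 and three task_arena(3, 1) -> 7 workers for 3 x 2 = 6 slots.
static_assert(marketCanFillAllArenas(8, 3, 3, 1), "7 workers should cover 6 worker slots");
```

Under these assumptions the market is not oversubscribed, which is why no idle arena should be short of workers.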

In this scenario it seems that the number of available slots in each arena is reduced by 1. Specifically, if I use task_arena::execute() to schedule work into an idle arena, only 2 threads take part (I guess the master thread and one TBB worker thread).
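One way to check the guess in brackets is to record which threads call the comparator and compare them with the thread that called `execute()`. A minimal standard-library sketch follows; `ThreadTally` is a hypothetical helper (not part of TBB) that keeps a mutex-protected set of `std::thread::id` values instead of the `thread_local` marker used in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_set>

// Hypothetical helper: records every distinct thread that calls record(),
// and remembers which thread is the master (the one that called execute()).
class ThreadTally
{
public:
    explicit ThreadTally(std::thread::id master) : m_master(master) {}

    // Call from inside the parallel work, e.g. at the top of the sort comparator.
    void record()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_seen.insert(std::this_thread::get_id());
    }

    // Number of distinct threads that took part, master included.
    std::size_t distinctThreads() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_seen.size();
    }

    // Number of distinct participating threads other than the master.
    std::size_t workerThreads() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_seen.size() - m_seen.count(m_master);
    }

private:
    mutable std::mutex m_mutex;
    std::thread::id m_master;
    std::unordered_set<std::thread::id> m_seen;
};
```

In the reproducer one could construct `ThreadTally tally( std::this_thread::get_id() );` just before `arenas[i].execute(...)`, call `tally.record()` in the comparator, and print `tally.workerThreads()` afterwards. The mutex adds contention to every comparison, so this is only suitable for seeing which threads join, not for timing.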

Below is a short test to reproduce the problem. My environment is:

  • 8-core PC
  • Windows 10
  • Visual Studio 2017
  • TBB 2020 update 3
#include <cstdio>
#include <cstdint>
#include <tbb/parallel_sort.h>
#include <tbb/task_arena.h>
#include <tbb/global_control.h>
#include <atomic>
#include <vector>
#include <numeric>

int main()
{
    // Limit TBB to 8 threads in total (max_allowed_parallelism includes the master thread).
    tbb::global_control globalControl(tbb::global_control::max_allowed_parallelism, 8);

    // Create three arenas, each with 2 slots for TBB worker threads and one slot reserved for a master thread
    std::vector<tbb::task_arena> arenas = { {3, 1}, {3, 1}, {3, 1} };

    // Create some data to be sorted
    std::vector<int> source( 100'000 );
    std::iota( source.begin(), source.end(), 0 );

    std::atomic_bool _workStarted{ false };
    std::atomic_bool _testEnded{ false };

    // Start a long-running task in arena 1; it keeps one worker thread spinning until the test ends
    arenas[1].enqueue( [&]() {
        _workStarted = true;
        while( !_testEnded ) {}
    } );

    // Wait for the long-running task to start
    while( !_workStarted ) {}

    // Execute a parallel sort in each arena in turn, counting the number of distinct threads which participate
    for( uint32_t i = 0; i < arenas.size(); ++i )
    {
        std::atomic_size_t numUsedThreads{ 0 };

        arenas[i].execute( [&]() {
            tbb::parallel_sort( source.begin(), source.end(), [&](int lhs, int rhs)
            {
                // Per thread: the last arena in which this thread called the comparator
                static thread_local uint32_t iLastSeenArena = 99;

                // Is the current thread appearing in the current arena for the first time?
                if( i != iLastSeenArena )
                {
                    iLastSeenArena = i;
                    ++numUsedThreads;
                }

                return lhs < rhs;
            } );
        } );
        printf( "Arena %u used %zu threads\n", static_cast<unsigned>( i ), numUsedThreads.load() );
    }

    // Let the long-running task in arena 1 finish
    _testEnded = true;
    return 0;
}


On my system the output of the test is:

Arena 0 used 2 threads
Arena 1 used 2 threads
Arena 2 used 2 threads

I would expect:

Arena 0 used 3 threads
Arena 1 used 2 threads
Arena 2 used 3 threads
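These expected numbers are the most threads that could call the comparator in each arena: the master that calls `execute()`, plus one thread per worker slot that is not held by the spinning task. A minimal sketch of that count, assuming each `task_arena(3, 1)` has 2 worker slots and that enough workers are available to fill them (the function names are hypothetical, not TBB API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the most threads that can call the comparator in one arena,
// i.e. the master thread that called execute() plus each worker slot whose worker
// is free. A worker stuck in the long-running task only spins and never sorts.
// Assumption: the market supplies enough workers to fill every worker slot.
int maxCountedThreads(int workerSlots, int busyWorkers)
{
    assert(busyWorkers >= 0 && busyWorkers <= workerSlots);
    return 1 + (workerSlots - busyWorkers);
}

// Upper bounds for the three arenas of the test; arena 1 runs the long task.
std::vector<int> maxCountedThreadsPerArena()
{
    const int workerSlots = 3 - 1;                     // task_arena(3, 1)
    const std::vector<int> busyWorkers = { 0, 1, 0 };  // arenas 0, 1, 2
    std::vector<int> result;
    for (int busy : busyWorkers)
        result.push_back(maxCountedThreads(workerSlots, busy));
    return result;
}
```

Whether the sort actually recruits that many workers also depends on how much parallel work `parallel_sort` exposes, so these are upper bounds rather than guarantees.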

Can anyone comment on what's going on here?



Thank you for the reproducer code. I can reproduce the behavior you are seeing; let me check with our development team.


Hi - has the development team got any thoughts about whether this is a bug, or a mistake on my part?