Unexpected thread distribution when one of several arenas has a long-running task

Hi,

I'm seeing something unexpected in this scenario:

  • 8 threads are available in the market
  • I have 3 arenas, each with 3 slots: 2 for TBB worker threads plus 1 reserved for a master thread
  • one arena is busy with a single long-running task

In this scenario it seems that the number of available slots in every arena, not just the busy one, is reduced by 1. Specifically, if I use task_arena::execute() to schedule work into an idle arena, only 2 threads take part (I guess the master thread and one TBB worker thread).

Below is a short test to reproduce the problem. My environment is:

  • 8-core PC
  • Windows 10
  • Visual Studio 2017
  • TBB 2020 update 3
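#include <iostream>
#include <tbb/parallel_sort.h>
#include <tbb/task_arena.h>
#include <tbb/global_control.h>
#include <atomic>
#include <chrono>   // std::chrono::milliseconds
#include <cstdint>  // uint32_t
#include <cstdio>   // printf
#include <thread>   // std::this_thread::sleep_for
#include <vector>
#include <numeric>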

int main()
{
    // Initialise the market with 8 TBB threads.
    tbb::global_control globalControl(tbb::global_control::max_allowed_parallelism, 8);

    // Create three arenas, each with 2 slots for TBB threads and one slot for a master thread
    std::vector<tbb::task_arena> arenas = { {3, 1}, {3, 1}, {3, 1} };

    // Create some data to be sorted
    std::vector<int> source( 100'000 );
    std::iota( source.begin(), source.end(), 0 );

    std::atomic_bool _workStarted{ false };
    std::atomic_bool _testEnded{ false };

    // Start a long-running task in arena 1
    arenas[1].enqueue( [&]() {
        _workStarted = true;
        while( !_testEnded ) {}
    } );

    // Wait for the long-running task to start
    while (!_workStarted)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    // Execute a parallel sort in each arena in turn, counting the number of distinct threads which participate
    for( uint32_t i = 0; i < arenas.size(); ++i )
    {
        std::atomic_size_t numUsedThreads{ 0 };

        arenas[i].execute( [&]() {        
            tbb::parallel_sort( source.begin(), source.end(), [&](int lhs, int rhs)
            {
                static thread_local uint32_t iLastSeenArena = 99;

                // Is the current thread appearing in the current arena for the first time?
                if( i != iLastSeenArena )
                {
                    iLastSeenArena = i;
                    ++numUsedThreads;
                }

                return lhs < rhs;
            } );
        } );
        printf( "Arena %u used %zu threads\n", i, numUsedThreads.load() );
    }

    _testEnded = true;
}

On my system the output of the test is:

Arena 0 used 2 threads
Arena 1 used 2 threads
Arena 2 used 2 threads

I would expect:

Arena 0 used 3 threads
Arena 1 used 2 threads
Arena 2 used 3 threads
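
One way to cross-check counts like these without the thread_local trick is to record the ID of every thread that does some work and count the distinct IDs afterwards. The following is only a minimal sketch: `DistinctThreadCounter` and `countThreadsInPlainRegion` are hypothetical names, not part of TBB, and plain `std::thread`s stand in for an arena so the snippet does not depend on TBB at all.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB class): remembers which threads called note().
class DistinctThreadCounter
{
public:
    // Register the calling thread. Safe to call concurrently and repeatedly;
    // repeated calls from the same thread are counted once.
    void note()
    {
        std::lock_guard<std::mutex> lock( _mutex );
        _ids.insert( std::this_thread::get_id() );
    }

    // Number of distinct threads that have called note() so far.
    std::size_t count() const
    {
        std::lock_guard<std::mutex> lock( _mutex );
        return _ids.size();
    }

private:
    mutable std::mutex _mutex;
    std::set<std::thread::id> _ids;
};

// Stand-in for a parallel region (an assumption for illustration only):
// starts numThreads plain threads, each of which registers itself twice.
inline std::size_t countThreadsInPlainRegion( std::size_t numThreads )
{
    assert( numThreads > 0 );

    DistinctThreadCounter counter;
    std::vector<std::thread> threads;
    threads.reserve( numThreads );

    for( std::size_t i = 0; i < numThreads; ++i )
    {
        threads.emplace_back( [&counter]() {
            counter.note();
            counter.note(); // second call must not change the count
        } );
    }

    // Join only after all threads were started, so every ID stays unique.
    for( auto& t : threads )
    {
        t.join();
    }

    return counter.count();
}
```

In the reproducer, the comparator passed to tbb::parallel_sort could call `counter.note()` on a counter created per arena instead of updating the thread_local variable. The mutex adds contention inside the sort, so this is suited to diagnostics only, not to timing measurements.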

Can anyone comment on what's going on here?

Thanks!

1 Reply
Moderator

Thank you for the reproducer code. I can reproduce the behavior you are seeing. Let me check with our development team.