Intel® oneAPI Threading Building Blocks

Unexpected thread distribution when one of several arenas has a long-running task

oliver_critchley
Beginner

Hi,

I'm seeing something unexpected in this scenario:

  • 8 threads are available in the market
  • I have 3 arenas each with 2 + 1 slots, i.e. 3 slots with one reserved for a master thread
  • one arena is busy with a single long-running task

In this scenario, the number of usable slots in each arena appears to drop by 1. Specifically, if I use task_arena::execute() to run work in an idle arena, only 2 threads take part (I guess the master thread and one TBB worker thread).

Below is a short test to reproduce the problem. My environment is:

  • 8-core PC
  • Windows 10
  • Visual Studio 2017
  • TBB 2020 update 3

#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>
#include <tbb/global_control.h>
#include <tbb/parallel_sort.h>
#include <tbb/task_arena.h>

int main()
{
    // Initialise the market with 8 TBB threads.
    tbb::global_control globalControl(tbb::global_control::max_allowed_parallelism, 8);

    // Create three arenas, each with 2 slots for TBB threads and one slot for a master thread
    std::vector<tbb::task_arena> arenas = { {3, 1}, {3, 1}, {3, 1} };

    // Create some data to be sorted
    std::vector<int> source( 100'000 );
    std::iota( source.begin(), source.end(), 0 );

    std::atomic_bool _workStarted{ false };
    std::atomic_bool _testEnded{ false };

    // Start a long-running task in arena 1
    arenas[1].enqueue( [&]() {
        _workStarted = true;
        while( !_testEnded ) {}
    } );

    // Wait for the long-running task to start
    while (!_workStarted)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    // Execute a parallel sort in each arena in turn, counting the number of distinct threads which participate
    for( uint32_t i = 0; i < arenas.size(); ++i )
    {
        std::atomic_size_t numUsedThreads{ 0 };

        arenas[i].execute( [&]() {        
            tbb::parallel_sort( source.begin(), source.end(), [&](int lhs, int rhs)
            {
                static thread_local uint32_t iLastSeenArena = 99;

                // Is the current thread appearing in the current arena for the first time?
                if( i != iLastSeenArena )
                {
                    iLastSeenArena = i;
                    ++numUsedThreads;
                }

                return lhs < rhs;
            } );
        } );
        // %u matches uint32_t on this platform; %zu matches size_t
        printf( "Arena %u used %zu threads\n", i, numUsedThreads.load() );
    }

    _testEnded = true;
}

 

On my system the output of the test is:

Arena 0 used 2 threads
Arena 1 used 2 threads
Arena 2 used 2 threads

I would expect:

Arena 0 used 3 threads
Arena 1 used 2 threads
Arena 2 used 3 threads
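
One way to cross-check these counts independently of the thread_local bookkeeping in the comparator would be to record the actual thread ids. The sketch below is only an illustration: ThreadTally and tally_demo are hypothetical names, not part of TBB, and it uses plain std::thread so that it builds without the library. The assumption is that, in the reproducer, note_current_thread() would be called from inside the comparator and count() read after execute() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical helper (not a TBB type): remembers the id of every thread
// that calls note_current_thread() and reports how many distinct ids it saw.
class ThreadTally {
public:
    void note_current_thread() {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(std::this_thread::get_id());  // duplicates are ignored by std::set
    }

    std::size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::thread::id> ids_;
};

// Small demonstration with std::thread standing in for TBB worker threads.
std::size_t tally_demo() {
    ThreadTally tally;

    // The calling thread is counted once, however often it reports in.
    tally.note_current_thread();
    tally.note_current_thread();
    assert(tally.count() == 1);

    // A second thread, alive at the same time as the caller, has a different id.
    std::thread helper([&tally] { tally.note_current_thread(); });
    helper.join();

    return tally.count();
}
```

The mutex adds contention on every comparison, so this is meant for diagnosis only, not for timing runs.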

Can anyone comment on what's going on here?

Thanks!

James_T_Intel
Moderator

Thank you for the reproducer code. I can reproduce the behavior you are seeing. Let me check with our development team.


oliver_critchley
Beginner

Hi - has the development team got any thoughts about whether this is a bug, or a mistake on my part?
