Intel® oneAPI Threading Building Blocks

Unexpected thread distribution when one of several arenas has a long-running task

oliver_critchley
Beginner

Hi,

I'm seeing something unexpected in this scenario:

  • the market is limited to 8 threads (max_allowed_parallelism = 8, i.e. the master thread plus up to 7 TBB worker threads)
  • I have 3 arenas, each with 3 slots: 2 for TBB worker threads plus 1 reserved for a master thread
  • one arena is busy with a single long-running task

In this scenario, the number of available slots in each arena seems to be reduced by 1. Specifically, if I use task_arena::execute() to schedule work into an idle arena, only 2 threads take part (I guess the master thread and one TBB thread).

Below is a short test to reproduce the problem. My environment is:

  • 8-core PC
  • Windows 10
  • Visual Studio 2017
  • TBB 2020 Update 3

#include <tbb/parallel_sort.h>
#include <tbb/task_arena.h>
#include <tbb/global_control.h>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main()
{
    // Limit total parallelism to 8 threads: the main thread plus up to 7 TBB worker threads.
    tbb::global_control globalControl(tbb::global_control::max_allowed_parallelism, 8);

    // Create three arenas, each with 3 slots: 2 for TBB worker threads and 1 reserved for a master thread
    std::vector<tbb::task_arena> arenas = { {3, 1}, {3, 1}, {3, 1} };

    // Create some data to be sorted
    std::vector<int> source( 100'000 );
    std::iota( source.begin(), source.end(), 0 );

    std::atomic_bool _workStarted{ false };
    std::atomic_bool _testEnded{ false };

    // Start a long-running task in arena 1; it spins until the test ends
    arenas[1].enqueue( [&]() {
        _workStarted = true;
        while( !_testEnded ) {}
    } );

    // Wait for the long-running task to start
    while( !_workStarted )
    {
        std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
    }

    // Execute a parallel sort in each arena in turn, counting the number of distinct threads which participate
    for( uint32_t i = 0; i < arenas.size(); ++i )
    {
        std::atomic_size_t numUsedThreads{ 0 };

        arenas[i].execute( [&]() {
            tbb::parallel_sort( source.begin(), source.end(), [&]( int lhs, int rhs )
            {
                // Per-thread record of the last arena index in which this thread ran the comparator
                static thread_local uint32_t iLastSeenArena = 99;

                // Is the current thread appearing in the current arena for the first time?
                if( i != iLastSeenArena )
                {
                    iLastSeenArena = i;
                    ++numUsedThreads;
                }

                return lhs < rhs;
            } );
        } );
        std::printf( "Arena %u used %zu threads\n", i, numUsedThreads.load() );
    }

    _testEnded = true;
}

On my system the output of the test is:

Arena 0 used 2 threads
Arena 1 used 2 threads
Arena 2 used 2 threads

I would expect:

Arena 0 used 3 threads
Arena 1 used 2 threads
Arena 2 used 3 threads

Can anyone comment on what's going on here?

Thanks!

James_T_Intel
Moderator

Thank you for the reproducer code. I can reproduce the behavior you are seeing, let me check with our development team.


oliver_critchley
Beginner

Hi - has the development team got any thoughts about whether this is a bug, or a mistake on my part?

James_T_Intel
Moderator

I apologize for the delayed update. Our developers have been investigating this issue and identified a fix. However, this fix has significant performance impacts, and will not be implemented until the performance issues have been addressed.


Please watch the Release Notes at https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html for a future update that includes this fix.


I am closing this case for Intel support. All future replies in this thread will be considered community only and not monitored by Intel support.


oliver_critchley
Beginner

Hey - thanks for the reply; I'd assumed the topic had been forgotten! I'll look out for the fix in the release notes.
