Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Unexpected thread distribution when one of several arenas has a long-running task

oliver_critchley
Beginner

Hi,

I'm seeing something unexpected in this scenario:

  • 8 threads are available in the market
  • I have 3 arenas each with 2 + 1 slots, i.e. 3 slots with one reserved for a master thread
  • one arena is busy with a single long-running task

In this scenario, the number of available slots in every arena seems to be reduced by 1. Specifically, if I use task_arena::execute() to schedule work into an idle arena, only 2 threads take part (I guess the master thread and one TBB worker thread).
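
The expectation behind this scenario can be written down as simple arithmetic. The sketch below is only a model of that expectation, not a description of how TBB actually assigns workers to arenas; the function name `expected_participants` and its parameters are hypothetical and do not come from the TBB API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: models the poster's expectation, not TBB's scheduler.
// worker_slots        : arena slots not reserved for masters (2 in the post)
// busy_workers        : workers in this arena tied up by long-running tasks
// free_market_workers : workers assumed to be still available to join
int expected_participants(int worker_slots, int busy_workers, int free_market_workers) {
    assert(worker_slots >= 0 && busy_workers >= 0);
    // Worker slots that are not occupied by a long-running task.
    const int free_slots = std::max(0, worker_slots - busy_workers);
    // Workers that can actually join: limited by free slots and by free workers.
    const int joining_workers = std::min(free_slots, std::max(0, free_market_workers));
    // The thread calling execute() always takes part via the reserved master slot.
    return 1 + joining_workers;
}
```

With 2 worker slots per arena, one worker busy in arena 1, and an assumed 6 workers still free, this model gives 3, 2 and 3 participants for arenas 0, 1 and 2. An observed result of 2 in every arena does not fit the model, which is what prompts the question.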

Below is a short test to reproduce the problem. My environment is:

  • 8-core PC
  • Windows 10
  • Visual Studio 2017
  • TBB 2020 update 3

#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>
#include <tbb/global_control.h>
#include <tbb/parallel_sort.h>
#include <tbb/task_arena.h>

int main()
{
    // Initialise the market with 8 TBB threads.
    tbb::global_control globalControl(tbb::global_control::max_allowed_parallelism, 8);

    // Create three arenas, each with 2 slots for TBB threads and one slot for a master thread
    std::vector<tbb::task_arena> arenas = { {3, 1}, {3, 1}, {3, 1} };

    // Create some data to be sorted
    std::vector<int> source( 100'000 );
    std::iota( source.begin(), source.end(), 0 );

    std::atomic_bool _workStarted{ false };
    std::atomic_bool _testEnded{ false };

    // Start a long-running task in arena 1
    arenas[1].enqueue( [&]() {
        _workStarted = true;
        while( !_testEnded ) {}
    } );

    // Wait for the long-running task to start
    while (!_workStarted)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    // Execute a parallel sort in each arena in turn, counting the number of distinct threads which participate
    for( uint32_t i = 0; i < arenas.size(); ++i )
    {
        std::atomic_size_t numUsedThreads{ 0 };

        arenas[i].execute( [&]() {        
            tbb::parallel_sort( source.begin(), source.end(), [&](int lhs, int rhs)
            {
                static thread_local uint32_t iLastSeenArena = 99;

                // Is the current thread appearing in the current arena for the first time?
                if( i != iLastSeenArena )
                {
                    iLastSeenArena = i;
                    ++numUsedThreads;
                }

                return lhs < rhs;
            } );
        } );
        printf( "Arena %u used %zu threads\n", i, numUsedThreads.load() );
    }

    _testEnded = true;
}
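
The thread count in the test relies on a thread_local marker inside the comparator. As a cross-check, the same measurement could be made by collecting thread ids in a set. The sketch below uses only the standard library and plain std::thread objects as a stand-in for a parallel region; `ThreadTally` and `tally_plain_threads` are hypothetical names, and the mutex makes this suitable for diagnostics only, not for a hot comparator in production code.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Records the id of every thread that calls note_current_thread().
class ThreadTally {
public:
    void note_current_thread() {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(std::this_thread::get_id());  // duplicates are ignored by the set
    }

    std::size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::thread::id> ids_;
};

// Stand-in for a parallel region: each thread reports itself many times,
// yet is counted once. Threads are joined only after all have been created,
// so their ids cannot be reused while the tally is being filled.
std::size_t tally_plain_threads(unsigned num_threads) {
    ThreadTally tally;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&tally] {
            for (int call = 0; call < 1000; ++call) {
                tally.note_current_thread();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return tally.count();
}
```

In the test above, one `ThreadTally` per arena could be created in the loop and `note_current_thread()` called from the comparator, with `count()` printed after `execute()` returns. Assuming both methods see the same threads, the two counts should agree.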

 

On my system the output of the test is:

Arena 0 used 2 threads
Arena 1 used 2 threads
Arena 2 used 2 threads

I would expect:

Arena 0 used 3 threads
Arena 1 used 2 threads
Arena 2 used 3 threads

Can anyone comment on what's going on here?

Thanks!

4 Replies
James_T_Intel
Moderator

Thank you for the reproducer code. I can reproduce the behavior you are seeing; let me check with our development team.


oliver_critchley
Beginner

Hi - has the development team got any thoughts about whether this is a bug, or a mistake on my part?

James_T_Intel
Moderator

I apologize for the delayed update. Our developers have been investigating this issue and identified a fix. However, this fix has significant performance impacts, and will not be implemented until the performance issues have been addressed.


Please watch the Release Notes at https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html for future updates to include this fix.


I am closing this case for Intel support. All future replies in this thread will be considered community only and not monitored by Intel support.


oliver_critchley
Beginner

Hey - thanks for the reply; I'd assumed the topic had been forgotten! I'll look out for the fix in the release notes.
