TBB 3.0 not using all available cores?

mgmt1969 · 05-25-2010

Hello,

I'm finding it strange that TBB 3.0 is only using three worker threads on my quad-core machine.

task_scheduler_init::default_num_threads() returns 4, which I take to mean one main thread plus three worker threads. My main thread, however, only sets things up and then goes into a wait state while the work is done on the remaining threads. There is no way I can change this, as my application is a C# GUI with its own event and delegate system, and I cannot apply the GUI Thread design pattern (at least not easily, I think).

I tried calling task_scheduler_init(5) and indeed four worker threads are created but the fourth one is never used, being permanently stuck in thread_monitor::commit_wait.

If memory serves, I did not have this problem with TBB 2.2 and the same application: all four of my cores were nicely maxed out.

Can anyone shed some light on what might be happening here?

Thank you,
manuel

PS: I forgot to mention that I am also using a couple of std::threads for tasks that mostly sit waiting, except for occasional flurries of activity. I wonder if the task scheduler is getting confused by this combination of TBB worker threads and std::threads.

ARCH_R_Intel · 05-25-2010

What OS are you on?

Using task_scheduler_init(5) should have worked to keep all four cores busy. I'll see if I can reproduce the problem. TBB 3.0 should have the same behavior as TBB 2.2 on this point, though the underlying implementation changed radically, so it's possible a change was introduced accidentally.

mgmt1969 · 05-25-2010

I'm on Windows Vista 64bit, although I'm compiling a 32bit application.

ARCH_R_Intel · 05-25-2010

Did you build the "irml" library, e.g. "irml.dll" or "irml_debug.dll"? There's a known bug: if either of those files is in your path, you will never get more than P-1 threads (P being the number of hardware threads). Those files are "Resource Managers" that dutifully prevent oversubscription, even when you ask for it (that's the known bug). If those files are not in your path, asking for P+1 threads should work.

It worked for me on XP and Vista. Below is the program that I used to check. It will hang if the TBB scheduler does not deliver at least n worker threads.

[cpp]#include "tbb/tbb.h"
#include <cstdio>

using namespace tbb;

atomic<int> barrier;

struct WaitFunctor {
    void operator()( blocked_range<int> r ) const {
        --barrier;
        // Wait until all threads reach the barrier.
        // Using a barrier like this in TBB is very bad style, because code should not
        // depend upon the TBB scheduler delivering a specific number of threads.
        while( barrier!=0 ) continue;
    }
};

int main() {
    int n = task_scheduler_init::default_num_threads();
    std::printf("n=%d\n",n);
    task_scheduler_init( n+1 );
    barrier = n;
    parallel_for( blocked_range<int>(0,n), WaitFunctor(), simple_partitioner() );
    std::printf("done\n");
    return 0;
}
[/cpp]

mgmt1969 · 05-25-2010

No, I'm not compiling irml.
Thank you for your effort. It must be something that I am doing wrong.
I'll keep looking into it and I'll come back to the forum if I find something that is worth reporting.

Cheers,
manuel

mgmt1969 · 05-25-2010

Hello again,

Could you please try the following code:

[cpp]#include <cstdio>
#include <mutex>
#include <condition_variable>
#include <thread>

#include "tbb/tbb.h"

using namespace std;
using namespace tbb;

int my_barrier;
std::mutex my_mutex;              // qualified: tbb::mutex is also visible here
std::condition_variable my_cond;

struct CountTask : public task
{
    virtual task* execute()
    {
        {
            std::lock_guard<std::mutex> lock(my_mutex);
            printf("Got into thread %d\n", my_barrier--);
            my_cond.notify_one();
        }
        // Causes all available threads to fill up by spinning
        while (true);
        return 0;
    }
};

int main() {
    int n = task_scheduler_init::default_num_threads();
    printf("n=%d\n",n);
    task_scheduler_init( n+1 );
    my_barrier = n;
    for (int i = 0; i < n; ++i)
        task::enqueue(*new (task::allocate_root()) CountTask);
    {
        std::unique_lock<std::mutex> lock(my_mutex);
        while (my_barrier > 0)
            my_cond.wait(lock);
    }
    printf("done\n");
    return 0;
}
[/cpp]

I tried this on an Intel i7 quad-core with 8 logical cores due to Hyper-Threading. default_num_threads() correctly returns 8. I then ask the scheduler for 9 threads. The code above should work, with the CountTask tasks running on the eight worker threads and the main() routine running on the main thread. The output, however, is:

n=8
Got into thread 8
Got into thread 7
Got into thread 6
Got into thread 5
Got into thread 4
Got into thread 3
Got into thread 2

So TBB allocates only 8 threads in total (I also confirmed this with a debugger), and the program hangs because the eighth CountTask has no thread left to run on.

Cheers,
manuel

Alexey-Kukanov · 05-26-2010

This line:
task_scheduler_init( n+1 );
causes creation of a temporary instance of class task_scheduler_init, followed by its immediate destruction.

What you need is:
task_scheduler_init tbbinit( n+1 );

ARCH_R_Intel · 05-26-2010

Thanks for the example. I can replicate your result and will look into what went wrong.

ARCH_R_Intel · 05-26-2010

I posted my previous reply before seeing Alexey's remark. His observation is correct.

I occasionally make the mistake myself with RAII objects.

mgmt1969 · 05-26-2010

You are right about the tbbinit( n+1 ) statement, as Arch also confirmed.

I think I finally nailed down the problem that I'm having. The following code example should definitely show this:

[cpp]#include <cstdio>
#include <thread>
#include <mutex>
#include <condition_variable>

#include "tbb/tbb.h"

using namespace std;
using namespace tbb;

int my_barrier;
std::mutex my_mutex;              // qualified: tbb::mutex is also visible here
std::condition_variable my_cond;

struct CountTask : public task
{
    virtual task* execute()
    {
        {
            std::lock_guard<std::mutex> lock(my_mutex);
            printf("Got into thread %d\n", my_barrier--);
            my_cond.notify_one();
        }
        // Causes all available threads to fill up by spinning
        while (true);
        return 0;
    }
};

static void MyFunc(int n)
{
    // Everything works if I uncomment the next line
    // task_scheduler_init tbbinit( n+1 );
    my_barrier = n;
    for (int i = 0; i < n; ++i)
        task::enqueue(*new (task::allocate_root()) CountTask);
    {
        std::unique_lock<std::mutex> lock(my_mutex);
        while (my_barrier > 0)
            my_cond.wait(lock);
    }
}

int main() {
    int n = task_scheduler_init::default_num_threads();
    printf("n=%d\n",n);
    task_scheduler_init tbbinit( n+1 );
    std::thread thread(MyFunc, n);
    thread.join();
    printf("done\n");
    return 0;
}
[/cpp]

I understand now that every std::thread has its own task scheduler. I was requesting n+1 threads on the main thread but not on the background std::thread. This was not clear to me before (the Reference Manual does not mention this in Chapter 13 on Threads).

Cheers,
manuel

ARCH_R_Intel · 05-26-2010

That explains why you did not see the problem with TBB 2.2. With TBB 2.2, the first thread to initialize the task scheduler determined the number of worker threads for all user-created threads. With TBB 3.0, each user thread can specify the value separately.

I've made a note to myself to clarify this point in the Reference.