TBB + single core + GPU = ?

calvin1602 · ‎12-27-2008

Hi,

I've got two questions, both more or less related to the use of a GPU.

Firstly : this is an excerpt from the TBB reference :

The task scheduler is intended for parallelizing computationally intensive work.
Because task objects are not scheduled preemptively, they should not make calls that
might block for long periods, because meanwhile that thread is precluded from
servicing other tasks.

Okay, but what do you call a "long period" ? Is submitting a blocking draw call to the GPU "long" ?

Second question ; second excerpt from the reference :

task_scheduler_init( int number_of_threads=automatic, stack_size_type thread_stack_size=0 )

Okay, so if I understand correctly, there will be as many threads as cores. But this is _not_ efficient for me : as I just said, I'm using the GPU. Currently, I've got a single core, and when my application runs, the GPU is running at 100% whereas my CPU is only at, say, 60% ( I haven't multithreaded anything yet ).

If I decide to use TBB, will it notice, at runtime, that there is a bottleneck somewhere, and will it create other threads ?

A solution for me would be to manually detect the number of cores N, and initialize the scheduler with, say, N+1 or N+2. What do you think of that approach ?
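To make the idea concrete, here is a minimal sketch of that approach, not a recommendation. It assumes a C++11 compiler for std::thread::hardware_concurrency(); the helper name scheduler_thread_count and the extra_threads parameter are hypothetical and not part of TBB.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: number of threads to request from the scheduler.
// `detected_cores` would normally come from
// std::thread::hardware_concurrency(), which may return 0 when the count
// cannot be determined; that case is treated as a single core.
// `extra_threads` is a guessed allowance for threads blocked on the GPU.
inline int scheduler_thread_count(unsigned detected_cores, int extra_threads) {
    assert(extra_threads >= 0);
    const int cores = detected_cores == 0 ? 1 : static_cast<int>(detected_cores);
    return cores + extra_threads;
}

// Convenience overload that queries the hardware itself.
inline int scheduler_thread_count(int extra_threads) {
    return scheduler_thread_count(std::thread::hardware_concurrency(), extra_threads);
}
```

The result would then go to the constructor quoted above, for example tbb::task_scheduler_init init( scheduler_thread_count(1) ); for N+1.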

Thanks

uj · ‎12-27-2008

Calvin1602: "A solution for me would be to manually detect the number of cores N, and initialize the scheduler with, say, N+1 or N+2. What do you think of that approach ?"

I don't think that strategy corresponds to the intended use of TBB, because the idea is that you design your program in terms of logical tasks and then TBB manages those to run on the actual available cores. The logical tasks should reflect the parallelism available in the problem domain of your application, not the computer hardware.

Also, I think TBB favours algorithmic parallelism over structural parallelism. Still, it's possible to subdivide an application into several large parts running in parallel. I've done that. In my application I have one main task responsible for the GUI and the business logic. In addition, I have several separate subtasks responsible for potentially blocking work. One subtask is a number-crunching engine which may be using external hardware. I also have subtasks that are each responsible for drawing in a Direct3D window, so they communicate with a GPU. The main task and the subtasks run in parallel, all managed by the TBB scheduler. Then, within this structurally motivated division of the application into parallel tasks, I use TBB to do what it was designed for, namely to parallelize algorithms.

I've looked at the Advanced Task Programming example (p. 230) in the TBB book as a template for setting up the subtasks. I communicate with these subtasks using a TBB concurrent_queue, but I've made sure the subtasks really sleep while idle, so they're waiting on the OS-dependent mutex, not a user-space spinning one. It works very smoothly, but I have a 4-core processor, so I haven't really tested it on a single core system yet.
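The "sleep while idle" part can be sketched roughly as follows. This is only an illustration under assumptions, not the actual code described above: it uses a standard C++11 mutex and condition variable in place of tbb::concurrent_queue, and BlockingQueue and run_subtask are hypothetical names. A waiting std::condition_variable typically parks the thread in the OS rather than spinning in user space.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue: a consumer sleeps while the queue is empty.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a usage error
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no more items will arrive and wakes all sleeping consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical subtask on a dedicated thread: doubles each request it
// receives. The doubling merely stands in for real work such as drawing.
inline std::vector<int> run_subtask(const std::vector<int>& requests) {
    BlockingQueue<int> queue;
    std::vector<int> results;
    std::thread worker([&] {
        int request;
        while (queue.pop(request)) results.push_back(request * 2);
    });
    for (int r : requests) queue.push(r);
    queue.close();   // lets the worker leave its loop
    worker.join();   // results is only read after the worker has finished
    return results;
}
```

The explicit close() is a design choice of this sketch: it gives the sleeping worker a clean way to wake up and exit, so the thread can be joined at shutdown.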

I notice that on the What's New in TBB 2.1 page an ISO C++ thread class is mentioned. I assume it can be used to accomplish what I've done above, maybe even in a simpler way. So why not have a look at that to manage blocking tasks?

calvin1602 · ‎12-29-2008

uj: "I don't think that strategy corresponds to the intended use of TBB"

Yeah, I think so too ... that's precisely my problem.

uj: "I haven't really tested it on a single core system yet."

Well, the number of cores is not really the point. If I call Sleep() for a few milliseconds in one thread, the current core will be idle, regardless of whether you have one core or eight.

My question is : Will TBB notice that I make those Sleep() calls ? Or is there a way to tell it ? And, if there is not, is my ... "workaround" okay ?

I don't really fancy the idea of using threads manually in that case. If I choose TBB, it is because it can handle this kind of situation ( more or less neatly, but handle it nonetheless ).

I can rephrase my question : What is TBB's algorithm for choosing the number of threads ?

Thanks,

Arnaud Masserann

calvin1602 · ‎12-29-2008

Okay, so it seems that I have my answer.

from the TBB 2.1 source code, task_scheduler_init.h :

//! Returns the number of threads tbb scheduler would create if initialized by default.
/** Result returned by this method does not depend on whether the scheduler
has already been initialized.

Because tbb 2.0 does not support blocking tasks yet, you may use this method
to boost the number of threads in the tbb's internal pool, if your tasks are
doing I/O operations. The optimal number of additional threads depends on how
much time your tasks spend in the blocked state. */
static int default_num_threads ();

So it seems that they already thought about that :) I'll check how they do it in Smoke.
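The comment leaves open how many additional threads to add. One common rule of thumb, which is an assumption here and not something taken from the TBB sources, is that if each task is blocked for a fraction b of its wall-clock time, roughly N / (1 - b) threads are needed to keep N cores busy. A minimal sketch, with boosted_thread_count as a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Rule-of-thumb estimate (an assumption, not taken from TBB): if each task
// is blocked for `blocked_fraction` of its wall-clock time, about
// default_threads / (1 - blocked_fraction) threads keep the cores busy.
// The result is rounded up to a whole number of threads.
inline int boosted_thread_count(int default_threads, double blocked_fraction) {
    assert(default_threads >= 1);
    assert(blocked_fraction >= 0.0 && blocked_fraction < 1.0);
    return static_cast<int>(std::ceil(default_threads / (1.0 - blocked_fraction)));
}
```

With the method quoted above, this could be used as tbb::task_scheduler_init init( boosted_thread_count( tbb::task_scheduler_init::default_num_threads(), 0.4 ) ); where 0.4 is a guess at the blocked fraction. For a single core blocked about 40% of the time, the estimate is 2 threads, i.e. N+1. Measuring the real blocked fraction would be needed before trusting the number.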

RafSchietekat · ‎12-29-2008

TBB currently has no idea how busy its worker threads actually are (and I also wonder sometimes how well its approach holds up on a system where other programs are competing for the cores' attention at the same time). At first sight, dispatching an additional worker thread seems only useful if you know that it will be blocked most of the time, otherwise you'll have to deal with oversubscription issues, but others should know better what impact this actually has. Maybe you would find the thread "TBB Task Scheduler: Integrating Data Tasks for Data Parallelism (need help)" interesting.