Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Cores are "used up"

uj
Beginner
1,357 Views
I've used the Advanced Task Programming example from the TBB book to create tasks that run in parallel along the main application task. I have two such tasks (one manages a Direct3D window and the other performs heavy-duty calculations). The main application task (in principle a win32 GUI) controls the parallel tasks by sending commands via a producer-consumer queue (one tbb::concurrent_queue per parallel task). When a parallel task isn't doing any work it sleeps (it waits on a tbb::mutex).

The problem I'm having is that each parallel task seems to "consume" one processor core. I have 4 cores and only 2 are available to do actual work in the main application task. I noticed this by first scanning a std::vector serially and then doing the exact same scan in parallel using a tbb::parallel_for. The scan in parallel is just twice as fast as the serial one. It should be 4 times as fast, shouldn't it? That's why I suspect the two parallel tasks are "holding on" to one core each, leaving only 2 to do work in the main application task.

To test the above assumption I switched off one of the parallel tasks (the Direct3D window). Now the parallel scan is 3 times faster than the serial one. I then introduced a dummy parallel task so there's a total of 3 parallel tasks. Now the serial and the parallel scanning tasks are equally fast. With two dummy parallel tasks, that is a total of 4 parallel tasks, the application just hangs. There's no assertion or anything (from MS Visual Studio in debug mode). I can note that it's the last parallel task started that fails (because the Direct3D window never shows up, as it should if the parallel task associated with it were working).

This behavior makes me believe each parallel task is holding on to a core. According to the TBB book this isn't what should happen, is it? I've got the impression that all 4 cores would be available in every task.
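
For reference, the timing comparison is essentially the following (just a sketch: the element type, the process() work and the body class are placeholders, not my actual code):

[cpp]#include <vector>

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Placeholder per-element work; stands in for whatever the real scan does.
inline void process(double& x) { x *= 1.000001; }

struct ScanBody { // body object for tbb::parallel_for
	std::vector<double>* v;
	void operator()(const tbb::blocked_range<size_t>& r) const {
		for (size_t i = r.begin(); i != r.end(); ++i)
			process((*v)[i]);
	}
};

void scan_serial(std::vector<double>& v) {
	for (size_t i = 0; i != v.size(); ++i)
		process(v[i]);
}

void scan_parallel(std::vector<double>& v) {
	ScanBody body = { &v };
	tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()), body);
}
[/cpp]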

This is my version of the example from the TBB book:

[cpp]#ifndef MTHREAD2_MINTONCE
#define MTHREAD2_MINTONCE

#include "tbb/task.h"

namespace mthreads {

	class MThread2 { // to be privately inherited
	protected:
		~MThread2() {}
		MThread2() : process(new(tbb::task::allocate_root()) tbb::empty_task) {}

		virtual void run() =0; // to be overridden. runs in separate thread

		void start() {
			process->set_ref_count(2);
			tbb::task* s(new(process->allocate_child()) Task(this));
			process->spawn(*s);
		}

		void stop(){
			process->wait_for_all();
			process->destroy(*process);
		}

	private:
		MThread2(const MThread2&);
		MThread2& operator=(const MThread2&);

		class Task : public tbb::task {
		public:
			~Task(){}
			explicit Task(MThread2* e) : enclosure(e) {}
		private:
			Task(); 
			Task(const Task&);
			Task& operator=(const Task&);

			tbb::task* execute() { 
				enclosure->run();
				return 0;
			}

			MThread2* const enclosure;
		};

		tbb::empty_task* const process;

	};

}
#endif
[/cpp]
29 Replies
RafSchietekat
Valued Contributor III
967 Views
From a quick read of the description (I skipped the code), you want to simply substitute tasks for threads, but this is destined to fail. The scheduler is unaware that those tasks are waiting: it assumes that they are making full use of their assigned hardware thread (by way of their assigned worker thread), and should therefore be left alone. Perhaps if you review the tutorial material with this in mind you might come up with a more suitable design.
uj
Beginner
967 Views
Quoting - Raf Schietekat
From a quick read of the description (I skipped the code), you want to simply substitute tasks for threads, but this is destined to fail. The scheduler is unaware that those tasks are waiting: it assumes that they are making full use of their assigned hardware thread (by way of their assigned worker thread), and should therefore be left alone. Perhaps if you review the tutorial material with this in mind you might come up with a more suitable design.


I haven't invented this design. It's described in the TBB book (in the Advanced Task Programming example I mentioned). It states this:

"One of the bauitiful things about this approach is that each half of the programis free to invoke as much parallelism as it desires. The task-based approach of Threading Building Blocks does the load balancing and manages the assignment of tasks to threads without causing oversubscription"

I'm starting the secondary parallel tasks exactly as described in this example, and according to the statement above TBB should then be aware of all tasks and be able to balance the cores between them. So on one hand I have this example in the TBB book explaining how to "start a large task in parallel with the main program", and on the other hand I have you saying that it's destined to fail.

I don't know how to proceed, really. It would be good to have it established whether the example is wrong or promises too much, or if I've misunderstood it somehow, stretched it too far, or made a programming error.

If I indeed have to drop this approach do you have a suggestion in which direction to go?
RafSchietekat
Valued Contributor III
967 Views
I don't have the book with me (I'll try to have a look before tomorrow), but I don't remember any example involving long-lived tasks that were waiting for something to happen much of the time (and I think I would...). In TBB, you continuously create tasks that are neither too small (because they always carry some overhead, however small compared to user threads) nor too large (because otherwise you might be missing an opportunity for parallelisation). Splitting work is great: maybe some idle worker may pick up a piece ("steal" it). Maybe it just so happens that you cannot split up part of the work, and then a big task can be tolerated because it is carrying its part of the load, but it should always be doing something, except for the occasional predictably short pause, because that's part of your contract with the scheduler.

You could proceed by arranging for a user thread to do the waiting, e.g., through tbb_thread, and when work is detected the thread can invoke TBB algorithms (or tasks directly) to process the work. But I'll hold off more specific suggestions until I've compared this with the book. One thing to look out for is that concurrency should be optional: you wouldn't want things to stop working if no additional worker threads are available.
uj
Beginner
967 Views
Quoting - Raf Schietekat
I don't have the book with me (I'll try to have a look before tomorrow), but I don't remember any example involving long-lived tasks that were waiting for something to happen much of the time (and I think I would...). In TBB, you continuously create tasks that are neither too small (because they always carry some overhead, however small compared to user threads) nor too large (because otherwise you might be missing an opportunity for parallelisation). Splitting work is great: maybe some idle worker may pick up a piece ("steal" it). Maybe it just so happens that you cannot split up part of the work, and then a big task can be tolerated because it is carrying its part of the load, but it should always be doing something, except for the occasional predictably short pause, because that's part of your contract with the scheduler.

You could proceed by arranging for a user thread to do the waiting, e.g., through tbb_thread, and when work is detected the thread can invoke TBB algorithms (or tasks directly) to process the work. But I'll hold off more specific suggestions until I've compared this with the book. One thing to look out for is that concurrency should be optional: you wouldn't want things to stop working if no additional worker threads are available.


I'd really appreciate it if you would look into this.

I realize that TBB is best for fine-grained multitasking and that having long-lived, thread-like tasks kind of goes against the intended usage of TBB. It's only because I found this example in the TBB book that I tried it. I got the impression that this approach was preferred and that it would make it easier for TBB to make sure the cores got utilized optimally over the whole of the application (in my case between the main application GUI thread and the two "worker" threads).
uj
Beginner
967 Views
Because it may be part of the problem, I'm also posting here my producer-consumer queue (PCQueue). The main task uses one such queue to communicate with each of the parallel tasks.

Initially the producer acquires a lock on the queue. When the producer pushes an item it checks whether there's a sleeping consumer waiting. If so, it releases the lock. This wakes up the consumer, which pops the item. Then the producer re-acquires the lock.

[cpp]#ifndef PCQUEUE_MINTONCE
#define PCQUEUE_MINTONCE

#include "tbb/mutex.h"
#include "tbb/queuing_mutex.h"
#include "tbb/tbb_thread.h"
#include "tbb/atomic.h"
#include "tbb/concurrent_queue.h"

namespace mthreads {

	template <typename T>
	class PCQueue { // producer-consumer queue (consumer thread sleeps while waiting)
	public:
		~PCQueue() {} 
		PCQueue() {produceLock.acquire(mutex);} // producer initially acquires lock
		void push(const T& item); // produce item
		void push(); // just wakeup consumer
		bool pop(T& item); // consume item
		bool empty() {return queue.empty();} // queue empty?

	private:
		PCQueue(const PCQueue&);
		PCQueue& operator=(const PCQueue&);

		typedef tbb::mutex Mutex;
		Mutex mutex; // threads sleep until lock granted
		Mutex::scoped_lock produceLock, consumeLock;
		tbb::concurrent_queue<T> queue;
		tbb::atomic<bool> sleeping; // initializes to false
	};
	
	template <typename T>
	void PCQueue<T>::push(const T& item) { // producer
		queue.push(item);
		if (sleeping) { // consumer waiting?
			produceLock.release(); // release lock to let waiting consumer in
			while (sleeping) tbb::this_tbb_thread::yield(); // wait for consumer to wake up
			produceLock.acquire(mutex); // relock
		}
	}

	template <typename T>
	bool PCQueue<T>::pop(T& item) { // consumer
		sleeping = true;
		if (!queue.try_pop(item)) {
			consumeLock.acquire(mutex);   // sleep until producer releases lock
			sleeping = false;
			consumeLock.release();
			if (!queue.try_pop(item)) return false;
		} 
		sleeping = false;
		return true; // returns true if item was popped
	}

	template <typename T>
	void PCQueue<T>::push() { // produce nothing, just wake the consumer
		if (sleeping) { // consumer waiting?
			produceLock.release(); // release lock to let waiting consumer in
			while (sleeping) tbb::this_tbb_thread::yield(); // wait for consumer to wake up
			produceLock.acquire(mutex); // relock
		}
	}

}
#endif
[/cpp]

RafSchietekat
Valued Contributor III
967 Views

Hmm... have you tried using concurrent_queue without that wrapper?

Alexey-Kukanov
Employee
967 Views
Here are a few thoughts/comments I had while reading the conversation.

The advanced technique described in the book was required to express the functional parallelism ("do a few different jobs concurrently") pattern with TBB 1.x and 2.0. Later, TBB 2.1 added a thread class that can also be used for this purpose, and TBB 2.2 added two other interfaces, the parallel_invoke function and the task_group class. Which of the three ways is best depends on the purpose, but the technique described in the book is clearly obsolete.
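
For illustration, the two newer interfaces look roughly like this (a minimal sketch; job_a and job_b are placeholder functions, not code from the book or from your application):

[cpp]#include "tbb/parallel_invoke.h"
#include "tbb/task_group.h"

// Placeholder jobs standing in for the "few different jobs" mentioned above.
void job_a() { /* ... */ }
void job_b() { /* ... */ }

void with_parallel_invoke() {
	// Runs both jobs, possibly in parallel, and returns when both are done.
	tbb::parallel_invoke(&job_a, &job_b);
}

void with_task_group() {
	tbb::task_group tg;
	tg.run(&job_a); // may execute on a TBB worker thread
	tg.run(&job_b);
	tg.wait();      // blocks until both jobs have finished
}
[/cpp]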

> This behavior makes me believe each parallel task is holding on to a core. According to the TBB book this isn't what should happen, is it? I've got the impression that all 4 cores would be available in every task.

No, this is not what should be expected. Once a task is started, it holds on to a thread, and by itself it can not use other threads. The only way to share the work with other threads is to spawn other tasks from inside the running one.

Your implementation of the MThread class is fine, and if enclosure->run() used TBB inside it to express its own parallelism, this work would be shared with other threads. But it looks like the functions you try to run in parallel are in fact thread-like, i.e. long-running and not sharing their work. In this case, a separate thread (tbb_thread) is the best choice, while parallel_invoke, task_group, or the technique from the book will all cause one TBB worker thread to be employed to execute this function and lost for other work.
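
A minimal sketch of that thread-based alternative (the command type, the shutdown convention and calculation_loop are placeholders, not your actual code):

[cpp]#include "tbb/tbb_thread.h"
#include "tbb/concurrent_queue.h"

// Placeholder command type; the real application would send D3D or calculation commands.
tbb::concurrent_queue<int> commands;

// Long-running, thread-like function: it runs on its own tbb_thread rather than
// on a TBB worker, so it never pins a worker thread. Inside it, TBB algorithms
// (parallel_for etc.) can still be used to spread the heavy work over all cores.
void calculation_loop() {
	for (;;) {
		int cmd;
		while (!commands.try_pop(cmd))
			tbb::this_tbb_thread::yield(); // or block on an OS event
		if (cmd == 0)
			return;                        // placeholder shutdown command
		// ... run TBB parallel algorithms to process the command ...
	}
}

void example_usage() {
	tbb::tbb_thread worker(&calculation_loop);
	commands.push(42); // hand some work to the thread
	commands.push(0);  // ask it to shut down
	worker.join();
}
[/cpp]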

> "One of the bauitiful things about this approach is that each half of the program is free to invoke as much parallelism as it desires. The task-based approach of Threading Building Blocks does the load balancing and manages the assignment of tasks to threads without causing oversubscription"

I understand how this statement led you to the expectations you had. "As much parallelism as it desires" sounds as if it even covers no nested parallelism at all. Well, this is somewhat of an overstatement. On the other hand, the statement holds true in some sense, as it does not cause oversubscription in your case, "just" undersubscription :)

> I ... create tasks that run in parallel along the main application task. I have two such tasks (one manages a Direct3D window and the other performs heavy duty calculations). The main application task (in principle a win32 GUI) controls the parallel tasks by sending commands via a producer-consumer queue (one tbb::concurrent_queue per parallel task). When a parallel task isn't doing any work it sleeps (it waits on a tbb::mutex).

Let me share my understanding; and correct me if it's wrong.
So you have three long-running functions in the app - user interaction, managing Direct3D window, and doing heavy calculations. The first two, probably, don't consume CPU most of the time. The third can consume lots of CPU power, and possibly has quiet periods as well. Basically, first of all you want to use all CPUs for the heavy calculations.
One possibility to do that is to start the long-running functions in their own threads, and use TBB tasks (or parallel algorithms) only inside the heavily loaded one. In this case, you may have temporary oversubscription during the periods when GUI or D3D processing overlaps with calculations. This degree of oversubscription might not be that bad in practice. If you think or know that (and why) such a setup will not work well, some adjustments can possibly be made to address specific issues.
uj
Beginner
967 Views
Quoting - Raf Schietekat

Hmm... have you tried using concurrent_queue without that wrapper?



I've tried that now. I've replaced the push method with,

queue.push(item);

and the pop method with,

while (!queue.try_pop(item)) {
    tbb::this_tbb_thread::yield();
}

Unfortunately the problem remains as before (each parallel task "removes" performance corresponding to one core from the main task).

So the PCQueue wrapper doesn't seem to cause the problem. Maybe I should say that the reason I use it is that I want the consumer threads to be sleeping while waiting instead of spinning.
RafSchietekat
Valued Contributor III
967 Views
"So the PCQueue wrapper doesn't seem to cause the problem. Maybe I should say that the reason I use it is that I want the consumer threads to be sleeping while waiting instead of spinning."
The wrapper would not be responsible for that issue, but it doesn't provide any benefit, adds some fixed overhead even with few cores, and the use of a mutex may also undo some or all of the scalability that has presumably been crafted into concurrent_queue. I think that you can get everything you need from concurrent_queue alone by just using push() and pop().
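
Something along these lines (just a sketch; the Command type and the shutdown convention are placeholders, and note that in current TBB the blocking pop() lives on concurrent_bounded_queue, while plain concurrent_queue only offers try_pop()):

[cpp]#include "tbb/concurrent_queue.h"

// Placeholder element type for whatever the queue carries.
struct Command { int code; };

tbb::concurrent_bounded_queue<Command> channel;

void producer_side(const Command& c) {
	channel.push(c);        // wakes a sleeping consumer, if any
}

void consumer_side() {
	for (;;) {
		Command c;
		channel.pop(c);     // sleeps (no spinning) until an item arrives
		if (c.code == 0)    // placeholder shutdown convention
			return;
		// ... handle the command ...
	}
}
[/cpp]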
uj
Beginner
967 Views
Quoting - Raf Schietekat
"So the PCQueue wrapper doesn't seem to cause the problem. Maybe I should say that the reason I use it is that I want the consumer threads to be sleeping while waiting instead of spinning."
The wrapper would not be responsible for that issue, but it doesn't provide any benefit, adds some fixed overhead even with few cores, and the use of a mutex may also undo some or all of the scalability that has presumably been crafted into concurrent_queue. I think that you can get everything you need from concurrent_queue alone by just using push() and pop().

Note that the PCQueue wrapper has two operating modes. It sleeps during periods of inactivity, and then it acts like a pure concurrent_queue during bursts of activity.

If this behaviour can be achieved using a concurrent_queue alone I would do that, of course. Maybe you could suggest something?
uj
Beginner
967 Views
> Quoting - Alexey Kukanov
>
> Here are a few thoughts/comments I had while reading the conversation.

Thank you.

Well, if the example in the TBB book is considered obsolete I'll drop it, of course. On the other hand it works very well, apart from the problem I've reported in this thread, so I wouldn't mind keeping the solution I have.

Is this really a bug, or is it something that can never be fixed due to the way TBB works? I'm just curious. Anyway, I'll seek an alternative solution, looking into the new TBB techniques you mention.

I think I'll stick to the overall structure of my program: a main thread consisting of the Win32 GUI and then two heavy-duty worker threads, one responsible for a Direct3D window and one for the number crunching. I'm working on a molecular dynamics application. The number cruncher constantly calculates new positions for the molecules shown in the Direct3D window, and the user can rotate and translate the molecules at the same time, so everything is heavy duty really.

I know this division of an application along functional lines isn't the TBB-recommended approach, but I like the subdivision into modules with thin communication channels between them. Otherwise everything gets so intertwined.
uj
Beginner
967 Views
I've submitted this problem as a bug. It has id 157.

I feel that even if the Advanced Task Programming example in the TBB book can be considered obsolete in the sense that there are newer alternative techniques that can replace it, it still uses standard TBB code and it doesn't work as intended. This makes it a bug in my view. And it may very well be that the problem I've reported is caused by some deeper anomaly in TBB. At least it should be accounted for.

Even if the problem isn't considered a bug, I hope this will lead to an official workaround. I liked the idea of using long-lived TBB tasks instead of ordinary threads, at least if that would help TBB to optimize the overall core utilization over the whole of the program.
RafSchietekat
Valued Contributor III
967 Views
I'm afraid you have misinterpreted the example in the book, which has no long-lived tasks. It is obsolete only in the sense that there are now different ways to do the same thing. Tasks do something specific and then finish. They are not cheap threads, rather they are dispatched to threads. And that's a powerful idea, if used correctly (see above for some suggestions), so nothing needs changing based on this discussion.

For the concurrent_queue, have a look over at "Questions on TBB Concurrent Queue Implementation", about its use of condition variables to avoid busy waiting. TBB's scheduler still thinks the task is busy, though (a different meaning of busy), and will not dispatch other tasks to the same worker thread, so long waits should generally only occur in user threads, not in tasks.
Alexey-Kukanov
Employee
967 Views
Quoting - uj
I've submitted this problem as a bug. It has id 157.

I feel that even if the Advanced Task Programming example in the TBB book can be considered obsolete in the sense that there are newer alternative techniques that can replace it, it still uses standard TBB code and it doesn't work as intended. This makes it a bug in my view. And it may very well be that the problem I've reported is caused by some deeper anomaly in TBB. At least it should be accounted for.

Even if the problem isn't considered a bug, I hope this will lead to an official workaround. I liked the idea of using long-lived TBB tasks instead of ordinary threads, at least if that would help TBB to optimize the overall core utilization over the whole of the program.

This behavior is not a bug, but rather a design decision. I understand your code does not work as you expected it to, but it does work as intended. TBB tasks do not preempt each other, so a started task occupies a thread till its completion. Core utilization and scalability are achieved by having lots of lightweight tasks, rather than a few long-running thread-like ones.

So far, an official workaround is to use tbb_thread for such long-running non-cooperative jobs. Using tasks for that will not help the TBB scheduler to optimize core utilization, simply because these tasks do not interact with the scheduler during execution. If some of these long-running jobs also consume a lot of CPU, you may consider decreasing the number of TBB worker threads; e.g. in your case, if the main GUI thread is mostly idle but the D3D thread is mostly busy, you may decrease the number of TBB workers by one, to accommodate the busy thread.
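
For example, reducing the worker count by one would look roughly like this (a sketch using task_scheduler_init; the right number depends on the actual load):

[cpp]#include "tbb/task_scheduler_init.h"

int main() {
	// One fewer worker than the default, leaving a core free for the
	// busy non-TBB thread (e.g. the D3D thread in this application).
	tbb::task_scheduler_init init(
		tbb::task_scheduler_init::default_num_threads() - 1);

	// ... create the GUI/D3D threads and run TBB algorithms as usual ...
	return 0;
}
[/cpp]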

We continue looking for a better solution to this kind of design challenge.
uj
Beginner
967 Views
Quoting - Raf Schietekat
I'm afraid you have misinterpreted the example in the book, which has no long-lived tasks. It is obsolete only in the sense that there are now different ways to do the same thing. Tasks do something specific and then finish. They are not cheap threads, rather they are dispatched to threads. And that's a powerful idea, if used correctly (see above for some suggestions), so nothing needs changing based on this discussion.

For the concurrent_queue, have a look over at "Questions on TBB Concurrent Queue Implementation", about its use of condition variables to avoid busy waiting. TBB's scheduler still thinks the task is busy, though (a different meaning of busy), and will not dispatch other tasks to the same worker thread, so long waits should generally only occur in user threads, not in tasks.

I don't think I have misinterpreted the example. From what I can see there's no limitation on how long a parallel task can run. It could run forever without problems, couldn't it? The problem seems to be the way I communicate between the main task and the parallel task. For some reason setting up a standard pipe using a concurrent_queue and a mutex doesn't fit the bill.

I still think spawning off worker tasks the way the example suggests is a good idea. It should help TBB balance the load better than if ordinary threads were used, shouldn't it? In that case I have two options, I guess. Either I keep the "eternal" worker tasks but find a better way to communicate with them. Or I spawn off a series of worker tasks that each performs a limited amount of work and then finishes. That's probably more in line with the example.

Or I give up the "tasks" approach altogether and use ordinary threads.
robert-reed
Valued Contributor II
967 Views
Quoting - uj
I don't think I have misinterpreted the example. From what I can see there's no limitation on how long a parallel task can run. It could run forever without problems, couldn't it? The problem seems to be the way I communicate between the main task and the parallel task. For some reason setting up a standard pipe using a concurrent_queue and a mutex doesn't fit the bill.

I still think spawning off worker tasks the way the example suggests is a good idea. It should help TBB balance the load better than if ordinary threads were used, shouldn't it? In that case I have two options, I guess. Either I keep the "eternal" worker tasks but find a better way to communicate with them. Or I spawn off a series of worker tasks that each performs a limited amount of work and then finishes. That's probably more in line with the example.

Or I give up the "tasks" approach altogether and use ordinary threads.

The basic philosophy of tasks in TBB is that they are quantifiable units of work that can be managed and scheduled with lower overhead by making use of a pool of reusable threads. If one task is going to just take a thread and never return it to the pool, it doesn't really play in the pool, and Alexey's suggestion of a tbb_thread makes sense, as well as paying attention to oversubscription of the available HW threads. On the other hand, for your molecular modeling component, you'll want to have that adapt to however many HW threads are available in the likely increasing future.

What is the nature of the molecular modeling work? Are you folding molecules and dealing with collision detection, or dealing with forces or that sort of thing? Are you doing a time-step dynamic animation? I presume the graphics subsystem can take 3D models and do all the world-space viewing work. Does the GPU do more than that? How can you structure the modeling work as a hierarchy of tasks you can schedule with the thread pool?

I've been working on some sample code that uses code very similar to what you quote from James's book, except in my case it's buried inside the implementation of parallel_invoke, the two-task version, which I use in a recursive parallelism example to manage the n² interactions between bodies in a gravitational simulation. This is more the intention for the dual spawn-and-wait-for-all pattern contained in the original code. My sample doesn't have either the interactive front end or the graphics pipeline to deal with--those sound like work queues better suited to other parallel patterns, event and pipeline specifically--but it does deal with repeated operations over a simulation time step.

RafSchietekat
Valued Contributor III
967 Views
"I don't think I have misinterpreted the example. From what I can see there's no limitation on how long a parallel task can run. It could run forever without problems couldn't it?"
TBB can't protect itself against such abuse. There is no legal enforcement of your contract with the scheduler. But why would you try to trick it if its sole purpose in life is to serve you?

In the example, some limited amount of work is presented to the scheduler for execution at a convenient time (the title is "Start a Large Task...", not "Start an Unlimited Task..."). With a single hardware thread, that time is no earlier than the call to WaitForSideShow(), by default! I'll repeat what I wrote earlier: tasks are not cheap threads. In particular, they should not be used for required concurrency (even if they thrive on optional parallelism). I guess they take a bit of getting used to. Once you know all the rules, you might decide to bend them, at your own responsibility, but no earlier.

"The problem seems to be the way I communicate between that main task and the parallel task. For some reason setting up a standard pipe using a concurrent_queue and a mutex doesn't fit the bill."
Have you actually observed any significant amount of wasted cycles using just push() and pop() on a concurrent_queue all by itself?

"I still think spawning off worker tasks the way the example suggests is a good idea. It should help TBB balance the load better than if ordinary threads were used, shouldn't it? In that case I have two options I guess. Either I keep the "eternal" worker tasks but find a better way to communicate with them."
Bad idea: there is no such thing as a worker task, because tasks are not cheap threads (have I said this before?). TBB couldn't balance a mouse mat if you break your contract, and you won't be able to communicate your way out of that.

"Or I spawn off a series of worker tasks that each performs a limited amount of work and then finishes. That's probably more in line with the example."
Now you're talking!

"Or I give up the "tasks" approach altogether and use ordinary threads."
Each has its role to play; see the suggestions above.
uj
Beginner
967 Views
Quoting - Raf Schietekat

>> "I don't think I have misinterpreted the example. From what I can see there's no limitation on how
>> long a parallel task can run. It could run forever without problems, couldn't it?"

> TBB can't protect itself against such abuse. There is no legal enforcement of your contract with the
> scheduler. But why would you try to trick it if its sole purpose in life is to serve you?

Well, I guess we'll have to let Mr. Reinders himself settle this dispute. :)

But until he does I think you're wrong. The running length of the parallel tasks started according to the technique described in the example is irrelevant. Let me convince you. When would you say the running length of such a task becomes a problem? After a nanosecond maybe, or a microsecond, or a second, an hour, or even a year? Well, you cannot say, for the simple reason that it never becomes a problem. The running length just isn't an issue.

In my view the purpose of the example is to show how you can use TBB tasks in a thread-like manner instead of resorting to "real" threads. The advantage being that this would make it easier for TBB to balance the cores throughout. The disadvantage is that such TBB task-threads have limitations compared with "real" threads. The example doesn't mention this and I had to learn it the hard way.

Anyway, I've made a few changes to my program and it now works at the expected performance level (a parallel scan of a vector is 4 times faster than a serial scan on my quad-core machine). I'm still using the TBB task-threads described in the example as worker threads. It's just that they no longer wait on a mutex when nothing happens. Instead they're created anew each time there's something for them to do. They read items from a concurrent_queue as long as there are any. When there's nothing more to do for the moment they lay down and die (only to rise again from the ashes like the mythical Phoenix when more work arrives :). I only had to change a few lines of code, actually.
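
Roughly, the new pattern is the following (a sketch only, using the task_group interface mentioned earlier rather than my MThread2 class, and with a placeholder Item/handle() pair instead of my real command dispatching):

[cpp]#include "tbb/task_group.h"
#include "tbb/concurrent_queue.h"
#include "tbb/atomic.h"

// Placeholder work item and handler.
struct Item { int payload; };
void handle(const Item&) { /* ... */ }

tbb::concurrent_queue<Item> work;
tbb::atomic<bool> active;   // zero-initialized at namespace scope
tbb::task_group workers;    // call workers.wait() before shutting down

// Short-lived "Phoenix" task: drains the queue, then finishes.
struct Drain {
	void operator()() const {
		Item it;
		do {
			while (work.try_pop(it))
				handle(it);
			active = false;
			// Re-check: an item pushed just before the store above may have
			// seen active==true and therefore not spawned a new drainer.
		} while (!work.empty()
		         && active.compare_and_swap(true, false) == false);
	}
};

void produce(const Item& it) {
	work.push(it);
	if (active.compare_and_swap(true, false) == false)
		workers.run(Drain());   // no drainer was running: resurrect one
}
[/cpp]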

Despite the fact that we disagree on some issues I would like to thank you for the insight you've provided. I would also like to thank everybody else who has contributed to this thread. Thank you all!!!
RafSchietekat
Valued Contributor III
967 Views
"When would you say the running length of such a task becomes a problem?"
See #3, first paragraph.

"Anyway I've made a few changes to my program and it now works at the expected performance level (a parallel scan of a vector is 4 times faster than a serial scan on my quad core machine). I'm still using the TBB task-threads described in the example as worker threads. It's just that they no longer wait on a mutex when nothing happens. Instead they're created anew each time there's something for them to do. They read items from a concurrent_queue as long as there are any. When there's nothing more to do for the moment they lay down and die (only to rise again from the ashes like the mythical Phoenix when more work arrives :). I only had to change a few lines of code actually."
That's exactly it: tasks do something specific, and then finish. Of course it is beneficial to also think of optimisations like recycling them after they finish, and/or, as in this case, letting them do more before they finish. Just realise that requiring concurrency may happen to work here, but it can very easily break down. In general, it is better to write programs that do not need additional worker threads, for easier debugging and for more general applicability (I'm not sure, but maybe requiring concurrency from different places can add up to exhaust any finite number of worker threads, even if you decide to require a multicore system?), and to use user threads instead when concurrent operation is needed (provided through time slicing if not through parallelism). Maybe TBB could add support for executing code on a thread of its own (from a separate pool), to take away the temptation to use tasks for this purpose...

(Added 2009-09-06) Actually, I have no clear view yet on what a separate thread pool would do, and in this case it would make no difference because the thread would be long-lived, so please ignore that last sentence for now.
uj
Beginner
928 Views
Quoting - Robert Reed

> The basic philosophy of tasks in TBB is that they are quantifiable units of work that can be managed and scheduled with lower overhead by making use of a pool of reusable threads. If one task is going to just take a thread and never return it to the pool, it doesn't really play in the pool, and Alexey's suggestion of a tbb_thread makes sense, as well as paying attention to oversubscription of the available HW threads.

Well, what you say is not supported by the Advanced Task Programming example. It states:

"Instead of having all threads execute portions of a problem, it is possible to start a task in parallel with the main application."

What's more, it then goes on to state that this usage is advantageous:

"One of the buitiful things about this approach is that each half of the program is free to invoke as much parallellism as it desires. The task-based approach of TBB does the load balancing and manages the assignment of tasks without causing oversubscription"

Until anyone comes up with firm evidence to the contrary, I'm going to believe the example is showing good TBB usage.

I believe that the running length of a TBB task is basically irrelevant. A task running a picosecond or a millisecond isn't any better than a task running a minute or a day. On the contrary, long-lived tasks have certain advantages over short-lived tasks. One is that the overhead of creating and destroying the task becomes a smaller portion of its lifetime. A long-lived task simply spends more time doing useful work. Another is that the TBB scheduler is more likely to reach a steady state of optimal performance when tasks are long-lived rather than short-lived and in turbulent flux.

What's of paramount importance, though, is that the TBB tasks aren't used in a way that gets in the way of the TBB scheduler. That was my problem, not that my worker tasks were long-lived.

So TBB is perfectly capable of handling tasks of any length. In fact, it's likely to do a better job if long-lived TBB tasks are used in place of "normal" OS threads, because then it has more control. This is my firm belief after having read the Advanced Task Programming example. I'm happy to be proved wrong, but then it should be with technical evidence, not by repeating a mantra (that TBB requires short-lived tasks to work well).