MADadrobiso: TBB uses an approximation of Cilk-style scheduling (work LIFO; steal FIFO). It has good locality and load-balancing properties while making certain space guarantees. It enables clean nesting of software components, analogous to the way serial code composes by nested subroutine calls.
FIFO has terrible locality and often has no space guarantees, which often leads to complex throttling mechanisms. Nonetheless FIFO scheduling definitely has its uses, notably for soft real-time. Finding a rational marriage of Cilk-style and FIFO scheduling remains an open problem. If done rigorously, it's worthy of a Ph.D. thesis.
The "two roots" idea suggests that perhaps said marriage could be accomplished by some kind of epoch mechanism, where each task belongs to an epoch and older epochs are given precedence. There's a long way between a notion and a practical system.
There are two kinds of user algorithms.
First are 'Cilk-style' computational algorithms ('HPC'). They neither require nor benefit from 'fair' processing; even worse, under fair scheduling they degrade and require more memory.
Second are general-purpose algorithms (games, client/server software). Sometimes they simply require fair (at least to some degree) processing; otherwise they can effectively deadlock.
So, I think, the first decision one has to make is how to cope with these kinds of user algorithms. For example, TBB can postulate that it doesn't support the second kind at all. Or TBB can have two modes of operation, i.e. the user has to set up the proper type of scheduler at the beginning of the program. Or TBB can support both kinds simultaneously, for example on a per-task basis, i.e. these tasks are HPC and those are general-purpose. Or it can just provide a single compromise scheduler.
I'm curious what .NET TPL team will decide on this problem:
http://blogs.msdn.com/pfxteam/archive/2008/08/01/8800195.aspx
I hope they will publish some results. Btw, I see some interesting words in comments, like PRIO, SIZE and AGE.
It is impossible for a purely LIFO scheduler to guarantee forward progress unassisted. Why not provide an integrated scheduler to more easily keep the worker threads busy while providing this guarantee?
Each FIFO batch would run in LIFO fashion to exploit locality.
"It is impossible for a purely LIFO scheduler to guarantee forward progress unassisted." Why? The whole stack in the work-stealing deque will eventually be drained. This inevitably means that all tasks will be processed. What am I missing here?
"Each FIFO batch would run in LIFO fashion to exploit locality." As I understand it, locality is when you process a task AND its CHILDREN one after another, not when you process just some batch of tasks one after another.
"This inevitably means that all tasks will be processed." Yes, but also that the LIFO scheduler isn't the whole story. How about a worthy superstructure? It takes some effort to provide efficient forward progress, not to mention handling priorities and timing constraints and whatnot. I would like a Building Block that handles that for me.
"As I understand, locality is when you process task AND its CHILDS one after another, not when you process just some batch of tasks one after another." Yes, "in LIFO fashion": while(true){ task::spawn_root_and_wait(new_batch); }
I still can't catch your point. Can you briefly outline high-level pseudo-code for the scheduler, please?
A worker thread always first exhausts its LIFO tasks.
A very simplistic ultra-fair but not terribly efficient addition would be a global queue. Just before stealing from another worker thread, a thread tries to grab a task from the global queue instead. This avoids interfering with locality, thus possibly improving throughput compared to stealing first.
Maybe just before that, a thread could try to steal from any thread that is falling behind based on the last time it tried to steal or grab a new task.
Communication costs could be amortised by also using (fixed-size) task buckets. This may mean that a task that is added later could be processed earlier than some tasks in preceding buckets, but there is still global progress (no task left behind).
Who and when adds tasks to global queue?
Btw, now this strongly recalls .NET Task Parallel Library scheduler:
- A new thread pool thread needs to allocate its work stealing queue.
- When queuing a new work item, we must check to see if were on a pool thread. If so, we will queue the work item into the work stealing queue instead of the global queue.
- When a pool thread looks for work, it needs to:
- First consult its local work stealing queue.
- If that fails, it then looks at the global queue.
- Lastly, if that fails, it needs to steal from other work stealing queues.
"Who and when adds tasks to global queue?" A task or a tbb_thread, basically when they detect work that is not a dependency of what they are currently doing (with some care, limited amounts of work can still be stacked locally, and a realistic solution would be more than just a global queue).
"Btw, now this strongly recalls .NET Task Parallel Library scheduler" I'm sure they also use two's complement numbers and other inescapable concepts. But I prefer being able to say I came up with them all by myself (also seems better for making a contribution), so I'll ignore this particular reference.
Now I'm starting to get your point. I think this can make sense. For big "user-level tasks" (not TBB tasks) fairness is definitely good.
Though one must be very careful here: how are you going to detect whether a task depends on what the thread is currently doing? Currently I don't see any way...
Oh, I see... then to be honest you must not look into the current TBB scheduler implementation either :)
"Now I'm starting to get your point. I think this can make sense. For big "user-level tasks" (not TBB tasks) fairness is definitely good." Well, with appropriate scheduling available there's no reason to make that distinction anymore.
"Though one must be very careful here: how are you going to detect whether a task depends on what the thread is currently doing? Currently I don't see any way..." I meant the other way around: if you don't "need" to wait for the new task to complete, then maybe you shouldn't.
"Oh, I see... then to be honest you must not look into current TBB scheduler implementation too :)" Hmm, if I want to contribute to something, I had better know how it works now.
It seems the conversation is starting to cycle :)
Total fairness (FIFO) is not compatible with locality. Absence of locality is not compatible with performance. Hence total fairness is not compatible with performance.
What am I missing here?
Assume one is implementing a fork/join algorithm but with an empty join part, i.e. there are only forks. Your algorithm will decide that all tasks are independent. This will destroy performance and the space guarantees (i.e. the user will get an out-of-memory exception).
"What I am missing here?" Perhaps that tasks can be the common currency for things that need locality for performance and things that need forward progress?
"Assume one is implementing a fork/join algorithm but with an empty join part, i.e. there are only forks. Your algorithm will decide that all tasks are independent. This will destroy performance and the space guarantees (i.e. the user will get an out-of-memory exception)." It's not an algorithm, it's a guideline for a human programmer, who should recognise this situation as a need to wait.
Please rephrase the above somehow; I'm unable to catch the meaning :(
Waiting does introduce overheads too (reference counting for the 'parent' task, a second execution of the parent, etc). So it looks like trading bad for worse...
"Please, rephrase above somehow. I am unable to catch the meaning :(" Whether you need to do something that needs locality for performance, or something that needs forward progress, they're all tasks. The difference is in how you allocate/spawn them so that the scheduler knows how to handle them.
"Waiting does introduce overheads too (reference counting for the 'parent' task, a second execution of the parent, etc). So it looks like trading bad for worse..." Hmm, the opportunity for a simple optimisation hardly seems worse than a memory explosion.
Are you talking about something similar to what I've described in #12:
http://software.intel.com/en-us/forums/showpost.php?p=66177
(item 2)
?
Currently, with the LIFO scheduler, there is no memory explosion. And no obligation to wait for task completions.
"Are you talking about something similar to what I've described in #12" No round robin.
"Currently with LIFO scheduler there is no memory explosion. And no need to obligatory wait for task completions." Oh yes... Well, luckily I did say "maybe" before to leave an out for myself. :-) So the guideline should be "whichever style avoids the memory explosion should be chosen", which can then be elaborated further.
Well, if the user will manually and explicitly distinguish which tasks require FIFO and which are OK with LIFO, then I think it's a viable idea.
Currently I am sceptical only about automatic distinguishing based on heuristics.
"So I don't think tbb needs a FIFO manager, it would be an addition no one probably use." Well, LIFO isn't strictly LIFO either, just a general characterisation, but I would not rule out the occasional use for strict FIFO. (Note that I only mentioned the idea of a global queue to avoid the nitty-gritty of implementation details, not to suggest that it might somehow provide strict FIFO.)
Hmm, I seem to be a bit careless lately: I linked this sentence with the previous paragraph. Now I would have to simply disagree: forward progress is a very simple and general concept to start with.