Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

LIFO vs FIFO

Dmitry_Vyukov
Valued Contributor I
1,898 Views
MADadrobiso:

TBB uses an approximation of Cilk-style scheduling (work LIFO; steal FIFO).It has good locality and load balancing properties while making certain space guarantees. It enables clean nesting of software components, analogous to the way serial code composes by nested subroutine calls.

FIFO has terrible locality and often has no space guarantees, which often leads to complex throttling mechanisms. Nonetheless FIFO scheduling definitely has its uses, notably for soft real-time. Finding a rationale marriage ofCilk-style and FIFO scheduling remains an open problem. If done rigorously it's a worthyfor aPh.D. thesis.

The "two roots" idea suggests that perhaps said marriage could be accomplished by some kind of epoch mechanism, where each task belongs to an epoch and older epochs are given precedence. There's a long way between a notion and a practical system.



There are two kinds of user algorithms.
First is a 'Cilk-style' computational algorithms ('HPC'). They doesn't require nor won't benefit from 'fair' processing. Even worser, they will degrade and will require more memory.
Second is a kind of general-purpose algorithms (games, client/server software). Sometimes they just require fair (at least to some degree) processing. Otherwise they can basically deadlock.
So, I think, the first decision one have to made is how to cope with those kinds of user algorithms. For example TBB can postulate that it doesn't support second kind at all. Or TBB can have 2 modes of operation, i.e. user will have to setup proper type of scheduler at the beginning of program. Or TBB can support 2 kind simultaneously, for example on per task basis, i.e. those tasks are HPC, and those are general-purpose. Or just provide some single compromise scheduler.

I'm curious what .NET TPL team will decide on this problem:
http://blogs.msdn.com/pfxteam/archive/2008/08/01/8800195.aspx
I hope they will publish some results. Btw, I see some interesting words in comments, like PRIO, SIZE and AGE.

0 Kudos
52 Replies
Dmitry_Vyukov
Valued Contributor I
1,176 Views
MADadrobiso:

The "two roots" idea suggests that perhaps said marriage could be accomplished by some kind of epoch mechanism, where each task belongs to an epoch and older epochs are given precedence. There's a long way between a notion and a practical system.




Hmm.... Scheduling based on epochs looks interesting...
When I was designing task-processing framework, I was thinking about something like: N tasks precessed in LIFO order, then 1 task is dequeued in FIFO order, then again N tasks precessed in LIFO order.
But anyway this can blow up space requirements...

The interesting thing is try to determine whether user algorithm makes forward progress or not. False negatives are Ok, false positives are unacceptable. If user algorithm makes forward progress then no need to change epoch, or use FIFO order. If no forward progress, then epoch must be changes, or some tasks must be dequeued in FIFO order. This can ensure maximum performance and space guarantees for HPC tasks, and at the same time guarantees of forward-progress for general-purpose tasks.

0 Kudos
ARCH_R_Intel
Employee
1,176 Views
I wonder how much different an epoch-based scheme is from simply having both a LIFO and a FIFO, and giving the LIFO preference. I.e., the outer-most task graph would run FIFO. The LIFO would handle all the inner levels. That yields a system that would sort of be like a bunch of sequential communicating processes, where each process is running a traditional call stack.
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat
That would be the general idea, but there are obviously ways to trade absolute fairness for better performance. Using a distributed queue could be one. The merge could be done with a coarse timestamp granularity (did somebody say "epochs"?), and an idle thread could take a bigger bite than just the front task. A thread could keep tasks local until the end of the execute() or beyond (based on a maximum delay)

Hmm... it looks like plain FIFO. I.e. no locality, no space-guarantees...

I think it's unacceptable trade-off for computational tasks which don't require any fairness at all.

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

"Hmm... it looks like plain FIFO. I.e. no locality, no space guarantees..." That would be where "keep tasks local until the end of the execute() or beyond (based on a maximum delay)" comes in (see added emphasis), and I guess it is appropriate to first execute locally queued tasks (after a quick check that other threads are not getting too far behind). Perhaps it may still make sense to have affinity in the global queue, but between FIFO and stealing the locality would have already decreased.

I wonder how radical a revision of the scheduler should be: somewhere between absolute fairness (everything in a global queue, which would be inefficient) and mere guaranteed progress (no global queue, where one thread can get behind while other threads are having a good time doing nothing in particular, all with maximum efficiency). But what elements should be taken into consideration? Maybe tasks can have different priorities (GUI focus, GUI non-focus, background computation), maybe they have soft real-time constraints, maybe they have ordering constraints, ...

"I wonder how much different an epoch-based scheme is from simply having both a LIFO and a FIFO": tasks and LIFO work would be unbounded, so what would be the details of an epoch here vs. with a pre-emptive scheduler?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
I wonder how much different an epoch-based scheme is from simply having both a LIFO and a FIFO, and giving the LIFO preference. I.e., the outer-most task graph would run FIFO. The LIFO would handle all the inner levels. That yields a system that would sort of be like a bunch of sequential communicating processes, where each process is running a traditional call stack.

I think that this scheme still can deadlock... if I am not missing something.

Assume that 'critical task' (which will resolve deadlock) is spawned at "inner levels". If inner levels use LIFO than this task can never be executed. So to be fair scheduler must guarantee eventual execution of ALL tasks, spawned on any level, at any time.

Scheduler based on epochs provides such guarantee.

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

"If inner levels use LIFO than this task can never be executed." What, no stealing anymore? Why not? Or is this a scenario where TBB would deadlock too?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

"If inner levels use LIFO than this task can never be executed." What, no stealing anymore? Why not? Or is this a scenario where TBB would deadlock too?

It can deadlock even with work stealing.

The most trivial situation is when we have only 1 thread (processor).

If we have more than 1 thread (processor), then assume all threads are livelocked by infinitely resurrecting tasks.

TBB will deadlock here.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

I wonder how radical a revision of the scheduler should be: somewhere between absolute fairness (everything in a global queue, which would be inefficient) and mere guaranteed progress (no global queue, where one thread can get behind while other threads are having a good time doing nothing in particular, all with maximum efficiency). But what elements should be taken into consideration? Maybe tasks can have different priorities (GUI focus, GUI non-focus, background computation), maybe they have soft real-time constraints, maybe they have ordering constraints, ...

I think the main requirement for "non-computational" tasks is that every task must be eventually executed.

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

First of all sorry for the weird stuff in some of my writings since the forum was changed (which also left me unable to go in and correct anything). With Firefox (1.5 and 2.0 alike, between Linux, OpenSolaris and Windows) it's a war zone you would not believe, yet the weird stuff that got inserted through copy&paste was invisible on it except for its side effect of causing wrapping at character boundaries. Apparently Intel has never heard of this strange little browser that nobody uses (imagine smiley for irony here), but it also has to be said that Firefox gets in a strange trance when visiting the forum, forcing me to kill it (no site should be able to do that; in its defence, I cannot remember having experienced this with Firefox before).

Also, coming back to: "If inner levels use LIFO than this task can never be executed.", my reply "What, no stealing anymore?" was rather thoughtless. Here's a new attempt:

Considering that TBB is for optional parallelism, where programs must make no assumptions about the number of available threads, would any existing program written for it deadlock with the new scheme, and why? Would any programs written for the new scheme deadlock, and why? I guess I don't understand exactly what you mean with "critical task (which will resolve deadlock)" etc. Or is optional parallelism itself a problem?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

Considering that TBB is for optional parallelism, where programs must make no assumptions about the number of available threads

While parallelism is optional, order of execution of tasks still has significance.

Program can work without parallelism with one order of execution, and the same program can deadlock without parallelism with another order of execution.

Quoting - raf_schietekat

would any existing program written for it deadlock with the new scheme, and why?

NO!

Some programs will deadlock with current TBB scheduler. But it's NOT the bug in TBB. Such programs are just not supported (supporting such programs DO will sacrifice performance for other programs).

So the main question is whether it's possible to support and those programs too with strictly insignificant performance degradation for other programs?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

Would any programs written for the new scheme deadlock?


YES.

Quoting - raf_schietekat

and why? I guess I don't understand exactly what you mean with "critical task (which will resolve deadlock)" etc. Or is optional parallelism itself a problem?

"critical task" is one which, for example, executes task_group_context::cancel_group_execution() (or manually interrupts execution of group of tasks) when it is executed. If it is NOT executed then other tasks will be executing infinitely (assume they recursively respawn themselves).

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - Dmitriy V'jukov

Some programs will deadlock with current TBB scheduler. But it's NOT the bug in TBB. Such programs are just not supported (supporting such programs DO will sacrifice performance for other programs).

So the main question is whether it's possible to support and those programs too with strictly insignificant performance degradation for other programs?

The more I think about the problem the more I trend toward that it's impossible to solve (at least in single scheduler).

For every scheme I can come up with, I can come up also with counter example on which scheme has unbounded space requirements and/or worse locality.

So solutions I see:

1. Don't support fairness at all (good variant - no need to change anything :) )

2. Just incorporate 2 schedulers: 1 LIFO + 1 FIFO. Scheduler determined in task description, or specified as parameter to spawn(). Worker thread can just process N tasks from LIFO scheduler, and then N tasks from FIFO scheduler.

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

Is it too restricting to eliminate programs that make assumptions about order of execution beyond the dependencies expressed in the task tree? cancel_group_execution() as a means to schedule an end to the execution of a program should only be "spawned" in a tbb_thread.

A round-robin between LIFO and FIFO seems like a recipe for space requirement explosions: a worker thread should always first exhaust its own LIFO task tree before doing anything else (like stealing or executing FIFO tasks).

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

Is it too restricting to eliminate programs that make assumptions about order of execution beyond the dependencies expressed in the task tree?

I can't answer this question. This depends on TBB's aims. But .NET TPL team take it very seriously:

http://blogs.msdn.com/pfxteam/archive/2008/08/01/8800195.aspx

(also read user comments)

Quoting - raf_schietekat

A round-robin between LIFO and FIFO seems like a recipe for space requirement explosions: a worker thread should always first exhaust its own LIFO task tree before doing anything else (like stealing or executing FIFO tasks).

I meant that tasks in FIFO scheduler can't spawn tasks in LIFO scheduler, i.e. it's completely separate types and graphs of tasks. So space requirements are 2*N, so to say. I.e. N tasks in LIFO scheduler and N tasks in FIFO scheduler.

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

"I can't answer this question." Well, it was mostly a rhetorical question related to the sentence that follows it. I would add more explicitly that no program should assume exact LIFO for correctness (only for performance and cache locality). Likewise, when I say that the LIFO tree should be exhausted before anything else is done, that would only be an advice to the scheduler, but again a program should not rely on it: the fewer assumptions, the better. But other than that, the TBB project should go to work on adding forward progress, priorities, time limits (best effort), etc.

Separating LIFO and FIFO seems rather limiting, so I would allow them to freely interoperate. The only thing that needs to be prevented is a scheduler-induced space requirement explosion (the user does not need to be protected from himself any more than for, e.g.,normal recursive calls), and that seems to be assured by exhausting the local "LIFO" tree before executing any "FIFO" task.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

Separating LIFO and FIFO seems rather limiting, so I would allow them to freely interoperate. The only thing that needs to be prevented is a scheduler-induced space requirement explosion (the user does not need to be protected from himself any more than for, e.g.,normal recursive calls), and that seems to be assured by exhausting the local "LIFO" tree before executing any "FIFO" task.

The problem is exactly when local LIFO tree is infinite... so it will be quite difficult to exhaust it :)

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

"The problem is exactly when local LIFO tree is infinite..." I would consider that a bug like any old infinite loop, and not TBB's rightful concern. Any programs that are, e.g., running a dispatch loop or somesuch from inside a task for lack of specific support from TBB would continue to work; any adaptation of such programs needs specific attention to its pseudo scheduler anyway to either make any new parts tap into it or overhaul the whole thing. Did I miss anything?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,176 Views
Quoting - raf_schietekat

"The problem is exactly when local LIFO tree is infinite..." I would consider that a bug like any old infinite loop, and not TBB's rightful concern. Any programs that are, e.g., running a dispatch loop or somesuch from inside a task for lack of specific support from TBB would continue to work; any adaptation of such programs needs specific attention to its pseudo scheduler anyway to either make any new parts tap into it or overhaul the whole thing. Did I miss anything?

Then I don't understand what problem you are trying to solve. Can you clarify and maybe describe some examples?

Why you want to reject LIFO in favor of FIFO?

0 Kudos
RafSchietekat
Valued Contributor III
1,176 Views

Hey, the forum has eaten my reply (I think)! An attempt at reconstructing its main elements:

I am merely proposing a combination of LIFO and FIFO (both are needed, I'm rejecting neither), like Arch's post #2 above, but without any guarantees other than respecting user-specified task dependencies (support for diamonds would be nice), i.e., a program must not fail if the task graph is randomly rearranged (although it can be assumed for concerns of locality that this happens rarely if ever). Likewise, only forward progress would be guaranteed for queued tasks, not strict FIFO, which would be too expensive (an implementation could construct a queue of task buckets, which may be stolen by other worker threads). Queued tasks may already have LIFO dependencies before they are transferred to the current LIFO tree/dag, executing tasks can queue tasks as well as spawn dependencies, worker threads repeatedly grab a batch of queued tasks that are then executed to completion in LIFO order.

For this to work, tasks must execute to completion (including closure of dependencies) in a finite amount of time, just like serial programs, but any existing programs that don't know about the new TBB-provided FIFO functionality and cheat out of necessity by, e.g., serving a concurrent_queue from inside a long-running execute(), will continue to work, so the change would not be disruptive. Stopping a runaway execution can only be done synchronously (perhaps by polling), not by a scheduled task (neither spawned nor queued), or else by an external intervention, e.g., a tbb_thread, although this should probably be frowned upon, like using exceptions as part of the expected flow of control.

Is this clearer, even without examples?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,046 Views
Quoting - raf_schietekat

Hey, the forum has eaten my reply (I think)! An attempt at reconstructing its main elements:

I am merely proposing a combination of LIFO and FIFO (both are needed, I'm rejecting neither), like Arch's post #2 above, but without any guarantees other than respecting user-specified task dependencies (support for diamonds would be nice), i.e., a program must not fail if the task graph is randomly rearranged (although it can be assumed for concerns of locality that this happens rarely if ever). Likewise, only forward progress would be guaranteed for queued tasks, not strict FIFO, which would be too expensive (an implementation could construct a queue of task buckets, which may be stolen by other worker threads). Queued tasks may already have LIFO dependencies before they are transferred to the current LIFO tree/dag, executing tasks can queue tasks as well as spawn dependencies, worker threads repeatedly grab a batch of queued tasks that are then executed to completion in LIFO order.

For this to work, tasks must execute to completion (including closure of dependencies) in a finite amount of time, just like serial programs, but any existing programs that don't know about the new TBB-provided FIFO functionality and cheat out of necessity by, e.g., serving a concurrent_queue from inside a long-running execute(), will continue to work, so the change would not be disruptive. Stopping a runaway execution can only be done synchronously (perhaps by polling), not by a scheduled task (neither spawned nor queued), or else by an external intervention, e.g., a tbb_thread, although this should probably be frowned upon, like using exceptions as part of the expected flow of control.

Is this clearer, even without examples?

Not entirely...

If we are assuming that there are no 'infinitely resurrecting' tasks (i.e. infinite recursion), then forward progress and guarantees that every task will be eventually processed are ensured by current LIFO scheduler... until I am completely missing something.

So if we will add a portion of FIFO into scheduler, it can only decrease locality of processing...

My initial intention wrt FIFO was to provide guarantees of eventual processing for ALL situations. Although now I see that it's impossible w/o substantial degradation of properties for 'normal well-behaved' tasks...

0 Kudos
Reply