Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

task scheduler issues with parallel_reduce

irisshinra308
Beginner
481 Views
Hello~

I have modified the TBB task scheduling policy to enable stealing tasks from other threads' mailboxes and executing them at any time.
However, I saw a brief description saying that the task execution order must be carefully maintained when dealing with parallel_reduce tasks.
Could anybody give me more details about this issue?

Thanks for replying

Dennis
0 Kudos
6 Replies
Anton_Pegushin
New Contributor II
481 Views
Quoting - irisshinra308
Hello~

I have modified the TBB task scheduling policy to enable stealing tasks from other threads' mailboxes and executing them at any time.
However, I saw a brief description saying that the task execution order must be carefully maintained when dealing with parallel_reduce tasks.
Could anybody give me more details about this issue?

Thanks for replying

Dennis
Hello, could you please share a few more details? What do you mean by "changing the task-stealing scheduling policy"? And where did you see the note about task execution order needing to be maintained for parallel_reduce? I assumed it was in the code (as a comment somewhere), but did not find anything resembling "order preservation".
0 Kudos
RafSchietekat
Valued Contributor III
481 Views
"And where did you see the note on task execution order being maintained for parallel_reduced?"
I guess tbb21_20090313oss/src/tbb/task.cpp:2906-2911?
0 Kudos
irisshinra308
Beginner
481 Views
Hello, could you please share a few more details? What do you mean by "changing the task-stealing scheduling policy"? And where did you see the note about task execution order needing to be maintained for parallel_reduce? I assumed it was in the code (as a comment somewhere), but did not find anything resembling "order preservation".

Sorry for confusing you with an imprecise description.

I saw this issue in the comment on "inline task* GenericScheduler::get_task(depth d)" in task.cpp.
I have modified this function to let it steal tasks from any mailbox under certain circumstances.
That is, one scheduler can execute tasks from another thread's mailbox even when there are still tasks available in its own deque.

It seems that the tasks derived from parallel_reduce must be executed in a particular order to maintain correctness.
However, I do not clearly understand the dependencies among these tasks or what the comment in task.cpp means.

Could you please give me some details about this issue?

Thank you very much

Dennis
0 Kudos
RafSchietekat
Valued Contributor III
481 Views
"I have modified this function to let it steal tasks from any mailbox under some circumstances."
Secret circumstances?
0 Kudos
irisshinra308
Beginner
481 Views
Quoting - Raf Schietekat
"I have modified this function to let it steal tasks from any mailbox under some circumstances."
Secret circumstances?

In fact, I would monitor hardware information to help the scheduler make its scheduling decisions.
Stealing tasks from other threads' mailboxes is just a thought that flashed through my mind, and I am still trying to figure out under what circumstances it would improve performance.
It is still a naive idea, and I am working out the details right now.

Dennis
0 Kudos
Anton_Pegushin
New Contributor II
481 Views
Quoting - irisshinra308
It seems that the tasks derived from parallel_reduce must be executed in a particular order to maintain correctness.
However, I do not clearly understand the dependencies among these tasks or what the comment in task.cpp means.

Hi, yes, you're right: preserving the order of task execution and stealing is a matter of correctness. The reduction operation is defined by the user, and the only restriction the library imposes is that the operation must be associative. It does not have to be commutative. Let's say it's not. If parallel_reduce divides the iteration space into 4 parts, a, b, c and d, it can run the reduction as (a * b) * (c * d) or a * (b * c) * d (where "*" denotes the reduction operation), but it cannot execute it in the order (a * c) * (b * d). In the latter case the result will be incorrect, or in other words it may differ from the result of a single-threaded execution of the algorithm. This is why the order in which tasks are executed is important.
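
To make this concrete, here is a toy example (plain C++, no TBB involved) using string concatenation as an associative but non-commutative operation, with a, b, c and d standing in for the four partial results:

    #include <iostream>
    #include <string>

    int main() {
        // One partial result per sub-range; concatenation ("+") is
        // associative but not commutative.
        std::string a = "a", b = "b", c = "c", d = "d";

        std::string ordered      = (a + b) + (c + d);   // "abcd" -- matches the serial result
        std::string reassociated = a + ((b + c) + d);   // "abcd" -- re-association is fine
        std::string reordered    = (a + c) + (b + d);   // "acbd" -- commuting operands is not

        std::cout << ordered << " " << reassociated << " " << reordered << "\n";
        return 0;
    }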

Another issue is a parallel_reduce internal optimization called lazy splitting. It means that if the right child was not stolen, its sub-iteration space will be processed by the Body that processed the sub-iteration space of the left child, saving the allocation of the right child task and the call to join() between the left and right children. This is correct from the "maintaining order" point of view, because we are moving left to right, collecting partial reduction results.

So what the comment in "get_task" refers to is (below is an oversimplification, but it makes the idea easy to understand):
if thread1 is working on "a", but "b" gets stolen by thread2 and thread1 needs to start executing "c", thread1 has to realize that this new task "c" is not an immediate neighbour of "a", so it must call the splitting constructor, leave "a"'s partial results behind (so that (a * c) does not happen) and create a separate Body for "c", so that the join()'s can be called later in the correct order.
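
To see where the splitting constructor and join() come into play, here is a rough sketch using the classic Body form of parallel_reduce (the ConcatBody type below is just an illustration, not something from the TBB sources): the splitting constructor starts an empty partial result for a stolen or non-adjacent sub-range, and join() appends the right neighbour's result to the left one, so the joins must happen in left-to-right order.

    #include <string>
    #include "tbb/task_scheduler_init.h"
    #include "tbb/parallel_reduce.h"
    #include "tbb/blocked_range.h"

    // Illustrative Body: string concatenation is associative but not
    // commutative, so the order of the join() calls matters.
    struct ConcatBody {
        const char* data;
        std::string result;

        ConcatBody(const char* d) : data(d) {}

        // Splitting constructor: starts an empty partial result for a
        // sub-range that is not adjacent to the one accumulated so far.
        ConcatBody(ConcatBody& other, tbb::split) : data(other.data) {}

        void operator()(const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                result += data[i];          // accumulate left to right
        }

        // join() appends the RIGHT neighbour's partial result to the left one.
        void join(ConcatBody& rhs) { result += rhs.result; }
    };

    int main() {
        tbb::task_scheduler_init init;      // explicit init needed in TBB 2.x
        const char* text = "abcdefgh";
        ConcatBody body(text);
        tbb::parallel_reduce(tbb::blocked_range<size_t>(0, 8), body);
        // body.result == "abcdefgh" only because partial results are
        // joined in the original left-to-right order.
        return 0;
    }
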
Makes sense?
0 Kudos
Reply