Intel® oneAPI Threading Building Blocks

The difference in code flow

Nav
New Contributor I
This question is a bit naive, but here it is anyway:

Everywhere you look, you'll see the OpenMP representation of code flow as

        /"""""""""""""\            <- slave threads
---------------------------------  <- master thread
        \_____________/

Is the TBB code flow any different?
With the use of tasks, I agree that it would go into a tree structure (tasks are now there in OpenMP 3 too). But otherwise, is there a difference?

Secondly, from the TBB FAQ:
"...going beyond data parallelism to be suitable for programs with nested parallelism, irregular parallelism and task parallelism"

I'd really appreciate it if you could elaborate on what this means. Aren't the exact same functionalities available in OpenMP too?
1 Solution
ARCH_R_Intel
Employee

An example of irregular parallelism would be the parallel preorder traversal example in examples/parallel_do/parallel_preorder/. I think it was essentially impossible to express cleanly in OpenMP until OpenMP 3.0 came along. Likewise for the pipelining example in examples/pipeline/square/.

These examples are doable in OpenMP 3.0, but OpenMP 3.0 carries some unfortunate historical baggage. In particular, its parallelism requires "thread teams" that have to be allocated before the amount of work is known. TBB follows a more modern work-stealing approach pioneered by Cilk.

6 Replies
Alexey-Kukanov
Employee
Conceptually, in this simple case the TBB code flow is the same: when you start a parallel algorithm, the work is distributed among the available worker threads, and when the algorithm finishes, the master thread proceeds and the worker threads go to sleep.

In less simple cases there is a difference. In OpenMP, nesting is either disabled (so the nested parallel region is executed on its master thread only) or creates a new thread team. In TBB, the work of nested parallel algorithms is distributed across the worker threads of the main (and single) team. Similarly, if two independent master threads each start a parallel region, OpenMP serves them with different teams, while TBB serves them with the same one.
The TBB FAQ was written somewhat before OpenMP 3.0 added tasking, so the message about better suitability for task parallelism can be considered outdated. Personally, I like the TBB tasking model more than the OpenMP one, but it is probably subjective :)
As for irregular parallelism, I am not sure what it means in this context (being considered different than task parallelism). But definitely there are TBB algorithms such as parallel_do and pipeline that make certain patterns more convenient to program than with OpenMP.
robert-reed
Valued Contributor II
Quoting - Nav
This question is a bit naive, but here it is anyway:

Everywhere you look, you'll see the OpenMP representation of code flow as

        /"""""""""""""\            <- slave threads
---------------------------------  <- master thread
        \_____________/

Is the TBB code flow any different?
With the use of tasks, I agree that it would go into a tree structure (tasks are now there in OpenMP 3 too). But otherwise, is there a difference?

Secondly, from the TBB FAQ:
"...going beyond data parallelism to be suitable for programs with nested parallelism, irregular parallelism and task parallelism"

I'd really appreciate it if you could elaborate on what this means. Aren't the exact same functionalities available in OpenMP too?

That's a pretty generic diagram, which would fit models from OpenMP to TBB to "fork-join", and might also be described as opportunistic parallelism: the "slave" threads reside in a pool until they're called upon to do some work, which they perform until it's done, and then they retire to the pool to await the next bit of work. The differences between OpenMP and TBB arise in how the threads are scheduled to share the work within the fork-join model. The diagram above shows only the activity of the slave threads, not the communication between them.

OpenMP 3.0 provides a task-oriented feature that breaks it out from the mostly loop-carried parallelism that has been its major workhorse in previous versions (and there are new features for merging nested loops together, very cool). OpenMP does support nesting (selectable) but does not have the same set of scheduling options available with TBB. They are different options and depending on the actual conditions in the code, may prove to be more efficient than the TBB equivalents (I know of one case where that is true) but generally the TBB scheduling options, with range splitting, task stealing, and more of an emphasis on recursive parallelism to add work to the queues as work is needed, certainly offer more flexibility. The whole task-cancellation mechanism that you're struggling to make use of is something you cannot do in OpenMP.
Nav
New Contributor I

Quoting: Robert Reed
They are different options and depending on the actual conditions in the code, may prove to be more efficient than the TBB equivalents (I know of one case where that is true)

Would it be possible to post that one case where OpenMP is faster than TBB? I've tried, but even for serial code, TBB was faster than OpenMP.

robert-reed
Valued Contributor II
Quoting Nav

Quoting: Robert Reed
They are different options and depending on the actual conditions in the code, may prove to be more efficient than the TBB equivalents (I know of one case where that is true)

Would it be possible to post that one case where OpenMP is faster than TBB? I've tried, but even for serial code, TBB was faster than OpenMP.

Sorry, it is a fairly large piece of code that belongs to one of our customers. I can't give you any more details than that.

Nav
New Contributor I
I understand...thanks.

If there is any knowledge that you could share (that is not customer related), it would be appreciated, since my question is more on the side of knowing how and where OpenMP can be faster than TBB.

If not, then let it be.
