Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Performance about nested parallel_while

wzpstbb
Beginner
235 Views
Hello,

I have a use case that runs a pipeline concurrently using parallel_while. This pipeline can spawn a second pipeline, which is also run concurrently using parallel_while. It looks like the performance is better if I don't use parallel_while to run the second pipeline. Does that mean nested parallel_while is not recommended?

Thanks,
Wallace
jimdempseyatthecove
Honored Contributor III
235 Views
Wallace,

In any parallel programming paradigm, if you use nested parallelism and all the available threads in your thread pool are already running tasks from the outermost layer of nesting, then partitioning the inner layers into sub-tasks only adds overhead.

If your outer while loop has few iterations (compared to the number of cores), then parallelize the inner loop. If the outer while loop has many iterations (compared to the number of cores), then parallelize the outer loop.

Also, if you have been reading a different thread here, and if what you mean by pipeline is parallel_pipeline, then make sure you are not nesting task_scheduler_init objects (unless each one uses a subset of the available hardware threads).

Jim Dempsey
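
The rule of thumb in the reply above can be sketched as a small decision helper. This is only an illustration in plain C++ with no TBB calls: the names `ParallelLevel` and `choose_level` are hypothetical, and the exact cutoff (outer iteration count at least equal to the number of hardware threads) is an assumption, since the reply only says "small" or "large" compared to the number of cores.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; none of this is TBB API.
enum class ParallelLevel { Outer, Inner };

// Sketch of the heuristic: when the outer loop alone has enough iterations
// to keep every hardware thread busy, parallelize only the outer loop and
// run the inner loop serially. Otherwise parallelize the inner loop.
// The ">=" cutoff is an assumption, not something stated in the thread.
inline ParallelLevel choose_level(std::size_t outer_iterations,
                                  unsigned hw_threads) {
    assert(hw_threads > 0);  // a thread count of zero makes no sense here
    return outer_iterations >= hw_threads ? ParallelLevel::Outer
                                          : ParallelLevel::Inner;
}
```

On an 8-core machine, for example, an outer loop with 100 work items would map to `ParallelLevel::Outer` (keep the inner work serial), while an outer loop with only 2 items would map to `ParallelLevel::Inner`. In practice the best cutoff also depends on how evenly the work is distributed, so it is worth measuring both variants.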

wzpstbb
Beginner
235 Views
Thanks Jim. This is very helpful information.

My machine has 8 cores, so usually the outer pipeline would occupy all the available threads. I will keep the inner pipeline single-threaded. It is also good to know the warning about nesting task_scheduler_init.

Wallace
RafSchietekat
Valued Contributor III
235 Views
"It is also good to know the warning about nesting task_scheduler_init."

Note that task_scheduler_init, which is a reference to shared data, is virtually cost-free when the thread is already engaged in TBB activity, so the warning is largely out of place, except that the use of task_scheduler_init in those locations may be symptomatic of a misconception that may need to be addressed. This has always been so.

Since TBB 2.2, use of task_scheduler_init before a user thread becomes engaged in TBB work is optional.

Since TBB 3.0, when you add a user thread (by any API), you may use task_scheduler_init to independently specify how many TBB worker threads N-1 may be used in addition to that user thread in its "arena", by specifying N as the total number of threads desired, and by doing so before the thread does any TBB-related work. Worker threads are shared between arenas, but participate in only one arena at a time to avoid entanglement that would sabotage required concurrency between arenas. Try to have a clear and valid reason before using this feature (even if it isn't linked with a perceived performance problem in another thread in this forum).