I am trying to parallelize a section of my application that consists of two consecutive for loops. These loops are independent of each other, so ideally I would like to follow a two-level parallelization approach: spawn each loop as a separate task at the higher level, and then parallelize each one with tbb::parallel_for.
Is it possible to compose different TBB parallel constructs in this way, i.e., to use a parallel_for inside a task or task_group? And if so, would it work well, i.e., as if the two loops were handled as a single parallel_for loop twice as large? My intuition says that once the two loops are scheduled simultaneously onto worker queues and their recursive splitting starts, execution from that point on would not be much different from the case of a single, unified, larger loop.
Recursive parallelism is exploited extensively by TBB, and you are encouraged to use it yourself, for example by nesting parallel_for inside parallel_invoke. It doesn't really matter whether the tasks are homogeneous, as under the hood inside a single parallel_for loop, or heterogeneous, as between the different levels of the nesting you propose. Just don't overdo it: nesting parallel_for inside parallel_for is likely to perform worse than a serial loop inside parallel_for, because of parallel overhead.
Thanks Raf! Nesting parallel_for inside parallel_invoke seems to do the job. And it does not incur any additional overhead (i.e., it performs exactly like a single parallel loop over the unified range of both loops). I agree that nesting parallel_for calls more aggressively may make things worse.