Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Parallel reduce under parallel invoke

seaephpea
Beginner
Ideally I would like to call parallel_invoke to start two tasks simultaneously. One of them has a run time of several minutes but is single-threaded; the other has a run time of several hours and includes a parallel_reduce.

Suppose there are 8 hardware threads. If I understand correctly, the parallel_reduce will initially get 7 threads. After a few minutes the single-threaded branch of the parallel_invoke will finish, but (I think I'm right in saying) the parallel_reduce will not be able to take over the now idle 8th thread.

Is there an alternative pattern I should be using to ensure full utilisation of all 8 threads all the time, or should I just accept that the two tasks should be performed serially? The single-threaded task would be very difficult to parallelize.

Thanks in advance,

Tom
RafSchietekat
Valued Contributor III
"Suppose there are 8 hardware threads. If I understand correctly, the parallel_reduce will initially get 7 threads. After a few minutes the single-threaded branch of the parallel_invoke will finish, but (I think I'm right in saying) the parallel_reduce will not be able to take over the now idle 8th thread."
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.
seaephpea
Beginner
Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads.
RafSchietekat
Valued Contributor III
"Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads."
I did not think of auto_partitioner, but surely it would not sabotage the distribution of work by preventing newly idle threads from picking up some of the slack (hmm, does that expression apply here?), especially at the ratio that we're talking about. Sure enough, "Reference (Open Source).pdf" Revision 1.13 (which may not be the latest) indicates that the work will be divided into a number of ranges "proportional to the number of threads specified by the task_scheduler_init". Also, compiling information about which threads are currently idle would not be scalable, so it should not even be attempted for that reason alone.