Intel® oneAPI Threading Building Blocks

Parallel reduce under parallel invoke

seaephpea
Beginner
Ideally I would like to call parallel_invoke to start two tasks simultaneously. One of them runs for several minutes but is single-threaded; the other runs for several hours and includes a parallel_reduce.

Suppose there are 8 hardware threads. If I understand correctly, the parallel_reduce will initially get 7 threads. After a few minutes the single-threaded branch of the parallel_invoke will finish, but (I think I'm right in saying) the parallel_reduce will not be able to take over the now idle 8th thread.

Is there an alternative pattern I should be using to keep all 8 hardware threads busy the whole time, or should I just accept that the two tasks have to run serially? The single-threaded task would be very difficult to parallelize.

Thanks in advance,

Tom
3 Replies
RafSchietekat
Black Belt
"Suppose there are 8 hardware threads. If I understand correctly initially the parallel reduce will get 7 threads. After a few minutes the single-threaded branch of the parallel invoke will finish, but, (I think I'm right in saying) the parallel reduce will not be able to take over the now empty 8th thread."
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.
seaephpea
Beginner
Quoting - Raf Schietekat
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.

Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads.
RafSchietekat
Black Belt
"Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads."
I did not think of auto_partitioner, but surely it would not sabotage the distribution of work by preventing newly idle threads from picking up some of the slack (hmm, does that expression apply here?), especially at the ratio that we're talking about. Sure enough, "Reference (Open Source).pdf" Revision 1.13 (may not be the latest) indicates that work will be divided into a number of ranges "proportional to the number of threads specified by the task_scheduler_init". Also, compiling information about which threads are currently idle would not be scalable, and should not even be attempted for that reason alone.
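
To illustrate why having more ranges than threads matters, here is a rough standard-library sketch. It is not how TBB's scheduler works internally (TBB splits ranges recursively and steals tasks); it is a simplified model in which workers claim chunks from a shared counter. The names chunked_sum, ChunkedSum, chunks_per_worker and serial_job are all hypothetical. The last worker runs the serial job first and only then joins in, which mimics the thread that is freed when the single-threaded branch finishes.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct ChunkedSum {
    long total;                        // sum of all elements
    std::vector<std::size_t> claimed;  // chunks processed by each worker
};

// Simplified model, NOT TBB's algorithm: the input is cut into
// workers * chunks_per_worker chunks that are claimed from a shared counter.
inline ChunkedSum chunked_sum(const std::vector<int>& data,
                              std::size_t workers,
                              std::size_t chunks_per_worker,
                              const std::function<void()>& serial_job) {
    assert(workers > 0 && chunks_per_worker > 0);
    const std::size_t chunk_count = workers * chunks_per_worker;
    const std::size_t chunk_size = (data.size() + chunk_count - 1) / chunk_count;

    std::atomic<std::size_t> next{0};
    std::vector<long> partial(workers, 0);
    std::vector<std::size_t> claimed(workers, 0);

    auto work = [&](std::size_t id) {
        // The last worker is busy with the serial job first and joins late.
        if (id + 1 == workers && serial_job) serial_job();
        for (;;) {
            const std::size_t c = next.fetch_add(1);
            if (c >= chunk_count) break;  // no chunks left to claim
            const std::size_t begin = std::min(data.size(), c * chunk_size);
            const std::size_t end = std::min(data.size(), begin + chunk_size);
            for (std::size_t i = begin; i < end; ++i) partial[id] += data[i];
            ++claimed[id];
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t id = 0; id < workers; ++id) pool.emplace_back(work, id);
    for (std::thread& t : pool) t.join();

    long total = 0;
    for (long p : partial) total += p;
    return ChunkedSum{total, claimed};
}
```

With chunks_per_worker set to 1 the late worker would most likely find nothing left to do; with a larger value it can still pick up whatever chunks remain when it arrives. How many it actually gets depends on timing, so the per-worker counts vary from run to run.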