seaephpea
Beginner

Parallel reduce under parallel invoke

Ideally I would like to call parallel_invoke to start two tasks simultaneously. One of them has a run time of several minutes but is single-threaded; the other has a run time of several hours and includes a parallel_reduce.

Suppose there are 8 hardware threads. If I understand correctly, the parallel_reduce will initially get 7 threads. After a few minutes the single-threaded branch of the parallel_invoke will finish, but (I think I'm right in saying) the parallel_reduce will not be able to take over the now idle 8th thread.

Is there an alternative pattern I should be using to ensure full utilisation of all 8 threads all the time, or should I just accept that the two tasks must be performed serially? The single-threaded task would be very difficult to parallelise.

Thanks in advance,

Tom
RafSchietekat
Black Belt

"Suppose there are 8 hardware threads. If I understand correctly, the parallel_reduce will initially get 7 threads. After a few minutes the single-threaded branch of the parallel_invoke will finish, but (I think I'm right in saying) the parallel_reduce will not be able to take over the now idle 8th thread."
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.
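To make the answer concrete, here is a toy model in standard C++ only; it is not TBB's scheduler and does not use TBB at all. It assumes the reduction's range is cut into many small chunks (the grain of 1000 elements and the 5 ms "serial task" are illustrative values) and that any free thread claims the next unclaimed chunk from a shared atomic counter. The thread that first runs the serial stand-in simply joins in afterwards and takes whatever chunks are still left, which is the effect Raf describes, even though TBB achieves it by stealing rather than through a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Toy model, NOT TBB's scheduler: the reduction's range is cut into many
// small chunks, and any free thread claims the next chunk from a shared counter.
struct ChunkedSum {
    const std::vector<std::int64_t>& data;
    std::size_t grain;                       // elements per chunk (illustrative value)
    std::atomic<std::size_t> next_chunk{0};  // index of the next unclaimed chunk
    std::atomic<std::int64_t> total{0};

    ChunkedSum(const std::vector<std::int64_t>& d, std::size_t g) : data(d), grain(g) {
        assert(grain > 0);
    }

    // Claim and sum chunks until none are left; returns how many this caller did.
    std::size_t work() {
        const std::size_t num_chunks = (data.size() + grain - 1) / grain;
        std::size_t done = 0;
        for (;;) {
            const std::size_t c = next_chunk.fetch_add(1);
            if (c >= num_chunks) return done;
            const std::size_t begin = c * grain;
            const std::size_t end = std::min(begin + grain, data.size());
            total.fetch_add(std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                            data.begin() + static_cast<std::ptrdiff_t>(end),
                                            std::int64_t{0}));
            ++done;
        }
    }
};

// Sums 1..1'000'000 with `workers` threads. workers - 1 threads start on the
// reduction at once; the last one first runs a stand-in for the serial task.
std::int64_t sum_with_late_helper(std::size_t workers) {
    std::vector<std::int64_t> data(1'000'000);
    std::iota(data.begin(), data.end(), std::int64_t{1});
    ChunkedSum sum(data, 1000);

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i + 1 < workers; ++i)
        pool.emplace_back([&sum] { sum.work(); });
    pool.emplace_back([&sum] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));  // hypothetical "serial task"
        sum.work();  // then claims whatever chunks remain
    });
    for (std::thread& t : pool) t.join();
    return sum.total.load();
}
```

Calling `sum_with_late_helper(8)` should return 500000500000 (that is, n(n+1)/2 for n = 1,000,000) regardless of how many chunks the late thread happens to pick up.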
seaephpea
Beginner


Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads.
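Why chunk granularity matters here can be shown with a small deterministic simulation. This is a simple list-scheduling model assumed for illustration, not TBB's partitioner: `total_work` units are cut into `num_chunks` equal chunks, 7 workers start at time 0, an 8th becomes available only at `late_start` (standing in for the end of the serial task), and whichever worker is free first takes the next chunk. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Toy list-scheduling model (not TBB's algorithm). Workers 0..workers-2 are
// free at time 0; the last worker only becomes free at `late_start`.
double makespan(double total_work, std::size_t num_chunks,
                std::size_t workers, double late_start) {
    assert(num_chunks > 0 && workers > 0);
    const double chunk_cost = total_work / static_cast<double>(num_chunks);

    // Min-heap of the times at which each worker is next free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (std::size_t w = 0; w + 1 < workers; ++w) free_at.push(0.0);
    free_at.push(late_start);

    double finish = 0.0;
    for (std::size_t c = 0; c < num_chunks; ++c) {
        double t = free_at.top();  // the earliest-available worker takes the chunk
        free_at.pop();
        t += chunk_cost;
        if (t > finish) finish = t;
        free_at.push(t);
    }
    return finish;
}
```

With 7000 units of work and the 8th worker arriving at time 100, `makespan(7000.0, 7, 8, 100.0)` gives 1000: the seven big chunks are all handed out at time 0 and the late worker never gets anything. `makespan(7000.0, 700, 8, 100.0)` gives 890, close to the ideal (7000 + 100) / 8 = 887.5, because chunks are still available when the 8th worker shows up.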
RafSchietekat
Black Belt

"Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads."
I hadn't been thinking of auto_partitioner, but surely it would not sabotage the distribution of work by keeping newly idle threads from picking up some of the slack (hmm, does that expression apply here?), especially at the ratio we're talking about. Sure enough, "Reference (Open Source).pdf" Revision 1.13 (which may not be the latest) indicates that the work will be divided into a number of ranges "proportional to the number of threads specified by the task_scheduler_init". Also, compiling information about which threads are currently idle would not be scalable, and for that reason alone it should not even be attempted.
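The scalability point can be illustrated with a simplified work-stealing sketch in standard C++. It is an assumption-laden stand-in, not TBB's implementation: TBB's per-thread task deques are not a mutex around `std::deque`, and the class and function names here are hypothetical. Each worker owns a deque of index ranges, splits its range down to a grain size and leaves the right halves in its own deque, and pops its newest pieces from the back. An idle worker consults no registry of idle threads; it just picks a random victim and takes the oldest (largest) piece from the front of that victim's deque.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <random>
#include <thread>
#include <vector>

// Simplified work stealing (mutex-protected deques; TBB's real deques differ).
struct Range { std::size_t begin, end; };

class StealingSum {
public:
    StealingSum(const std::vector<std::int64_t>& data, std::size_t workers, std::size_t grain)
        : data_(data), queues_(workers), grain_(grain), remaining_(data.size()) {
        assert(workers > 0 && grain > 0);
        queues_[0].items.push_back({0, data.size()});  // all work starts at worker 0
    }

    std::int64_t run() {
        std::vector<std::thread> threads;
        for (std::size_t w = 0; w < queues_.size(); ++w)
            threads.emplace_back([this, w] { worker_loop(w); });
        for (std::thread& t : threads) t.join();
        return total_.load();
    }

private:
    struct Queue { std::mutex m; std::deque<Range> items; };

    // The owner pushes and pops at the back (newest, smallest pieces).
    void push_own(std::size_t w, Range r) {
        std::lock_guard<std::mutex> lock(queues_[w].m);
        queues_[w].items.push_back(r);
    }
    std::optional<Range> pop_own(std::size_t w) {
        std::lock_guard<std::mutex> lock(queues_[w].m);
        if (queues_[w].items.empty()) return std::nullopt;
        Range r = queues_[w].items.back();
        queues_[w].items.pop_back();
        return r;
    }
    // A thief takes from the front (oldest, largest pieces).
    std::optional<Range> steal(std::size_t victim) {
        std::lock_guard<std::mutex> lock(queues_[victim].m);
        if (queues_[victim].items.empty()) return std::nullopt;
        Range r = queues_[victim].items.front();
        queues_[victim].items.pop_front();
        return r;
    }

    void worker_loop(std::size_t w) {
        std::mt19937 rng(static_cast<unsigned>(w) + 1);
        std::uniform_int_distribution<std::size_t> pick(0, queues_.size() - 1);
        while (remaining_.load() > 0) {
            std::optional<Range> r = pop_own(w);
            if (!r) r = steal(pick(rng));  // random victim; no list of idle threads
            if (!r) { std::this_thread::yield(); continue; }
            // Split down to grain size, leaving right halves where thieves can find them.
            while (r->end - r->begin > grain_) {
                const std::size_t mid = r->begin + (r->end - r->begin) / 2;
                push_own(w, Range{mid, r->end});
                r->end = mid;
            }
            std::int64_t local = 0;
            for (std::size_t i = r->begin; i < r->end; ++i) local += data_[i];
            total_.fetch_add(local);
            remaining_.fetch_sub(r->end - r->begin);
        }
    }

    const std::vector<std::int64_t>& data_;
    std::vector<Queue> queues_;
    std::size_t grain_;
    std::atomic<std::int64_t> total_{0};
    std::atomic<std::size_t> remaining_;  // elements not yet summed
};

// Convenience wrapper: sums 1..n with the given number of workers and grain size.
std::int64_t stealing_sum_1_to_n(std::size_t n, std::size_t workers, std::size_t grain) {
    std::vector<std::int64_t> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<std::int64_t>(i) + 1;
    return StealingSum(data, workers, grain).run();
}
```

For example, `stealing_sum_1_to_n(100000, 4, 1000)` should return 5000050000. Because thieves take from the front, a newly idle thread tends to grab a large, not-yet-split range, which keeps the number of steals (and lock traffic) low.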