Ideally I would like to call parallel_invoke to start two tasks simultaneously. One of them runs for several minutes but is single-threaded; the other runs for several hours and includes a parallel_reduce.
Suppose there are 8 hardware threads. If I understand correctly, initially the parallel reduce will get 7 threads. After a few minutes the single-threaded branch of the parallel invoke will finish, but (I think I'm right in saying) the parallel reduce will not be able to take over the now empty 8th thread.
Is there any alternative pattern I should be using to ensure full utilisation of all 8 all the time, or should I just accept that the two tasks should be performed serially? The single-threaded task would be very difficult to parallelize.
Thanks in advance,
Tom
3 Replies
"Suppose there are 8 hardware threads. If I understand correctly initially the parallel reduce will get 7 threads. After a few minutes the single-threaded branch of the parallel invoke will finish, but, (I think I'm right in saying) the parallel reduce will not be able to take over the now empty 8th thread."
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.
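To make the idea concrete, here is a minimal sketch using only the C++ standard library. It is not TBB and it does not implement TBB's work stealing: a shared atomic counter stands in for the scheduler, and `serial_task` and `chunked_sum` are hypothetical names invented for this illustration. Thread 0 plays the worker that is busy with the short single-threaded branch and only afterwards starts taking pieces of the reduction.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the short single-threaded task.
long serial_task() {
    long acc = 0;
    for (int i = 0; i < 1000; ++i) acc += i;
    return acc;  // 0 + 1 + ... + 999
}

// Sums `data` with `num_threads` threads that claim chunks from a shared
// counter (an assumption of this sketch, not how TBB distributes work).
// Thread 0 runs serial_task() first and only then starts claiming chunks.
// If `chunks_done` is non-null it receives the chunk count per thread.
long long chunked_sum(const std::vector<int>& data, unsigned num_threads,
                      std::size_t num_chunks,
                      std::vector<std::size_t>* chunks_done = nullptr) {
    assert(num_threads > 0 && num_chunks > 0);
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceiling division
    std::atomic<std::size_t> next{0};
    std::atomic<long long> total{0};
    std::vector<std::size_t> done(num_threads, 0);

    auto worker = [&](unsigned id) {
        if (id == 0) (void)serial_task();  // the late joiner
        for (;;) {
            const std::size_t c = next.fetch_add(1);  // claim the next chunk
            if (c >= num_chunks) break;               // nothing left to take
            const std::size_t begin = std::min(c * chunk, n);
            const std::size_t end = std::min(begin + chunk, n);
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            total += local;
            ++done[id];  // each thread writes only its own slot
        }
    };

    std::vector<std::thread> threads;
    for (unsigned id = 0; id < num_threads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();
    if (chunks_done) *chunks_done = done;
    return total.load();
}
```

Calling `chunked_sum(data, 8, 64, &done)` and inspecting `done[0]` shows how many chunks the late thread managed to pick up. That number varies from run to run, but the sum itself does not depend on which thread took which chunk.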
Quoting - Raf Schietekat
"Suppose there are 8 hardware threads. If I understand correctly initially the parallel reduce will get 7 threads. After a few minutes the single-threaded branch of the parallel invoke will finish, but, (I think I'm right in saying) the parallel reduce will not be able to take over the now empty 8th thread."
Don't worry, that thread should start to steal work from the parallel_reduce if any is available for the taking.
Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads.
"Thanks. I think I'd assumed that the auto_partitioner was only splitting the reduction into about as many big chunks as there were threads, rather than into lots of smaller chunks that could be given to newly available threads."
I had not thought of auto_partitioner, but surely it would not sabotage the distribution of work by preventing newly idle threads from picking up some of the slack (hmm, does that expression apply here?), especially given the ratio that we're talking about. Sure enough, "Reference (Open Source).pdf", Revision 1.13 (which may not be the latest), indicates that work will be divided into a number of ranges "proportional to the number of threads specified by the task_scheduler_init". Also, compiling information about which threads are currently idle would not be scalable, so for that reason alone it should not even be attempted.
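A rough back-of-the-envelope model shows why the number of ranges matters here. The sketch below is only an idealisation, not a description of what auto_partitioner does: it assumes equal-cost chunks, zero scheduling overhead, and that the earliest-free worker always takes the next chunk. `finish_time` and all the numbers are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Finish time of a loop costing `total_work` time units, cut into
// `num_chunks` equal chunks, when each worker takes the next chunk as soon
// as it is free. `ready_at[i]` is the time at which worker i becomes
// available (taken by value so the caller's vector is left untouched).
double finish_time(double total_work, int num_chunks, std::vector<double> ready_at) {
    assert(num_chunks > 0 && !ready_at.empty());
    const double chunk_cost = total_work / num_chunks;
    double finish = 0.0;
    for (int c = 0; c < num_chunks; ++c) {
        // The earliest-available worker takes the next chunk.
        auto w = std::min_element(ready_at.begin(), ready_at.end());
        *w += chunk_cost;
        finish = std::max(finish, *w);
    }
    return finish;
}
```

Take 63 units of work, seven workers free at time 0 and an eighth free at time 1, so `ready_at = {0, 0, 0, 0, 0, 0, 0, 1}`. With 7 big chunks, `finish_time(63, 7, ready_at)` gives 9 and the eighth worker never gets anything. With 63 small chunks, `finish_time(63, 63, ready_at)` gives 8, which is the best possible in this model because the late worker joins in as soon as it is free.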
