
GPU + TBB: Scheduler Patch

AJ13
Hi all,

The editor ate my last message. Lesson: don't try to attach files.

I'm working on combining CPU and GPU parallelism. I have a global data structure that stores special tasks which can run on either the GPU or the CPU. I have a function called pbb::execute_data_task_cpu() that 1) returns true if a task was found and executed, and 2) returns false if the global GPU/CPU task data structure was empty. Note that regular tbb::tasks are still there as always; the tasks I'm talking about are separate from them.
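
A minimal sketch of the contract described above, in standard C++ only. The pbb_sketch namespace, the data_task_pool class, and the drain() helper are hypothetical names for illustration; the real pbb implementation is not shown in this post and may differ.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

namespace pbb_sketch {  // hypothetical; not the author's pbb namespace

using data_task = std::function<void()>;

class data_task_pool {
public:
    // Add a task that either the CPU or a GPU driver thread may pick up.
    void push(data_task t) {
        assert(static_cast<bool>(t));  // an empty std::function cannot be run
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }

    // Returns true if a task was found and executed,
    // false if the pool was empty at the time of the call.
    bool execute_data_task_cpu() {
        data_task t;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (tasks_.empty()) return false;
            t = std::move(tasks_.front());
            tasks_.pop_front();
        }
        t();  // run outside the lock so other threads can keep taking tasks
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<data_task> tasks_;
};

// Drains the pool from the calling thread and returns how many tasks ran.
inline int drain(data_task_pool& pool) {
    int executed = 0;
    while (pool.execute_data_task_cpu()) ++executed;
    return executed;
}

}  // namespace pbb_sketch
```

Running the task outside the lock is a design choice of this sketch, so that a long data task does not stop other threads from taking work.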

The special GPU/CPU tasks are spawned by a regular tbb::task. That tbb::task doesn't know about the separate "special" task pool, so to get it to block I set its ref_count to 10... just some high number. This blocks the task. As CPU/GPU tasks complete, the parent tbb::task is informed. Once they have all completed, the very last GPU/CPU task to finish sets the ref_count of the parent tbb::task to 1. This unblocks it, and the scheduler resumes its magic.
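
The counting trick can be sketched without TBB, using a plain atomic as a stand-in for the parent task's ref_count. Everything here (parent_stub, data_task_group, notify_done, unblocks_only_on_last) is a hypothetical name for illustration; the sentinel value 10 is the one mentioned above.

```cpp
#include <atomic>
#include <cassert>

namespace pbb_sketch {  // hypothetical; illustration only

// Stand-in for the parent tbb::task; only the counter matters here.
struct parent_stub {
    std::atomic<int> ref_count{0};
};

// Tracks outstanding data tasks outside the scheduler's own bookkeeping.
struct data_task_group {
    parent_stub& parent;
    std::atomic<int> remaining;

    data_task_group(parent_stub& p, int n) : parent(p), remaining(n) {
        assert(n > 0);
        parent.ref_count.store(10);  // arbitrary sentinel above 1 keeps the parent blocked
    }

    // Called by each data task when it finishes; only the last one unblocks the parent.
    void notify_done() {
        if (remaining.fetch_sub(1) == 1)
            parent.ref_count.store(1);
    }
};

// Completes n data tasks one by one and checks that the parent's count
// reaches 1 only after the final completion.
inline bool unblocks_only_on_last(int n) {
    parent_stub parent;
    data_task_group group(parent, n);
    for (int i = 0; i < n - 1; ++i) {
        group.notify_done();
        if (parent.ref_count.load() == 1) return false;  // unblocked too early
    }
    group.notify_done();
    return parent.ref_count.load() == 1;
}

}  // namespace pbb_sketch
```

Because the real count lives in a separate counter, the sentinel does not have to match the number of data tasks; it only has to differ from 1.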

Here is a patch to the task.cpp file, where I have modified wait_for_all(). Ideally the CPU should only try to obtain tasks that the GPU normally would take if it has absolutely no other choice. Benchmarks show the approach works well so far.

And yah, the goto code is messy... but this was the easy way to alter it for now :-)
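
For comparison, the same fallback order can be written as a structured loop. This is only a sketch under assumptions: find_work, idle_result, and the three callbacks are hypothetical stand-ins for the ref_count check, steal_task(), and pbb::execute_data_task_cpu(), and the real wait_for_all() loop does much more than this.

```cpp
#include <cassert>
#include <functional>

namespace pbb_sketch {  // hypothetical; illustration only

enum class idle_result { parent_ready, got_task, fail };

// parent_ready:  stand-in for the check parent.prefix().ref_count==1
// try_steal:     stand-in for a steal attempt; true if a regular task was stolen
// run_data_task: stand-in for pbb::execute_data_task_cpu()
inline idle_result find_work(const std::function<bool()>& parent_ready,
                             const std::function<bool()>& try_steal,
                             const std::function<bool()>& run_data_task,
                             int& data_tasks_run) {
    for (;;) {
        if (parent_ready()) return idle_result::parent_ready;
        if (try_steal()) return idle_result::got_task;   // regular work has priority
        if (!run_data_task()) return idle_result::fail;  // nothing anywhere: back off
        ++data_tasks_run;                                // ran one data task, try stealing again
    }
}

// Usage example: no regular work is available and three data tasks are pending.
inline void self_check() {
    int pending = 3;
    int ran = 0;
    idle_result r = find_work(
        [] { return false; },
        [] { return false; },
        [&pending] { if (pending == 0) return false; --pending; return true; },
        ran);
    assert(r == idle_result::fail);
    assert(ran == 3);
    (void)r;
}

}  // namespace pbb_sketch
```

The loop keeps the priority described above: a data task runs on the CPU only after a steal attempt has failed.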

I'd appreciate your feedback.

Thanks,

AJ

--- tbb21_20081109oss/src/tbb/task.cpp 2008-11-14 08:11:44.000000000 -0500
+++ ../tbb21_20081109oss/src/tbb/task.cpp 2009-02-27 02:54:15.000000000 -0500
@@ -31,6 +31,9 @@
world, and by putting them in a single translation unit, the
compiler's optimizer might be able to do a better job. */

+#include
+
+
#if USE_PTHREAD

// Some pthreads documentation says that must be first header.
@@ -2335,10 +2338,11 @@
} while( t ); // end of local task array processing loop

inbox.set_is_idle( true );
- __TBB_ASSERT( arena->prefix().number_of_workers>0||parent.prefix().ref_count==1, "deadlock detected" );
+// __TBB_ASSERT( arena->prefix().number_of_workers>0||parent.prefix().ref_count==1, "deadlock detected" );
// The state "failure_count==-1" is used only when itt_possible is true,
// and denotes that a sync_prepare has not yet been issued.
for( int failure_count = -static_cast<int>(SchedulerTraits::itt_possible);; ++failure_count) {
+try_stealing_again:
if( parent.prefix().ref_count==1 ) {
if( SchedulerTraits::itt_possible ) {
if( failure_count!=-1 ) {
@@ -2360,10 +2364,35 @@
if( victim>=arena_slot )
++victim; // Adjusts random distribution to exclude self
t = steal_task( *victim, d );
- if( !t ) goto fail;
+
+ if( !t )
+ {
+ // Try to run a data task on the CPU
+ if(pbb::execute_data_task_cpu())
+ {
+ goto try_stealing_again;
+ }
+ else
+ {
+ goto fail;
+ }
+ }
+
+
if( is_proxy(*t) ) {
t = strip_proxy((task_proxy*)t);
- if( !t ) goto fail;
+ if( !t )
+ {
+ // Try to run a data task on the CPU
+ if(pbb::execute_data_task_cpu())
+ {
+ goto try_stealing_again;
+ }
+ else
+ {
+ goto fail;
+ }
+ }
GATHER_STATISTIC( ++proxy_steal_count );
}
GATHER_STATISTIC( ++steal_count );
@@ -2396,6 +2425,13 @@
break;
}
}
+ else {
+ // Try to run a data task on the CPU
+ if(pbb::execute_data_task_cpu())
+ {
+ goto try_stealing_again;
+ }
+ }
fail:
if( SchedulerTraits::itt_possible && failure_count==-1 ) {
// The first attempt to steal work failed, so notify Intel Thread Profiler that

AJ13
Any feedback on this from the scheduler gurus?

Thanks!
Alexey-Kukanov
Employee
Clearly it's a hack to help in your research. If it works for you, then it's just fine :)

I do not understand, though, why you needed the trick with the reference counter. As far as I understand from your post, all the child "GPU" tasks are counted anyway, so why not follow the usual rule of setting ref_count to the number of children plus 1, and so on?
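
The usual rule can be sketched with a plain atomic counter standing in for the task's ref_count. The names parent_counter, prepare_wait, child_done, and wait_finished are hypothetical and are not TBB API; this only illustrates the arithmetic.

```cpp
#include <atomic>
#include <cassert>

namespace pbb_sketch {  // hypothetical; illustration only

struct parent_counter {
    std::atomic<int> ref_count{0};
};

// Conventional rule: one reference per child plus one for the waiting parent.
inline void prepare_wait(parent_counter& p, int children) {
    assert(children >= 0);
    p.ref_count.store(children + 1);
}

// Each finishing child releases its own reference.
inline void child_done(parent_counter& p) {
    int previous = p.ref_count.fetch_sub(1);
    assert(previous > 1 && "more completions than children");
    (void)previous;  // silences the unused warning when asserts are disabled
}

// The parent may continue once only its own reference is left.
inline bool wait_finished(const parent_counter& p) {
    return p.ref_count.load() == 1;
}

}  // namespace pbb_sketch
```

With this scheme every completing child simply decrements the count, so no separate counter and no "last task" special case are needed.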