Solved: No barriers in cilkplus?

David_M_17 · ‎06-10-2016

I was nesting parallelism with cilkplus. At the top level I invoked several cilk_spawn calls, and inside each spawned task I used cilk_for. I created a reducer, but there doesn't seem to be any way to reduce across the spawned tasks without going back up to the function that invoked the spawns and calling cilk_sync. This means I must invoke cilk_spawn with new entry points to continue.

Something like this:

cilk::reducer< cilk::op_add<int> > sum (0) ;
main
{
. . . .

cilk_spawn task1( . . . ) ;
cilk_spawn task1( . . . ) ;
cilk_spawn task1( . . . ) ;
cilk_sync ;
sum.get_value() ; // At this point I have the desired global sum
}

task1( . . . )
{
. . . .
   cilk_for( . . . )
   {
      . . .
      *sum += . . . ;
      . . . .
   }
   sum.get_value() ; // This gives me a correct value within this scope (the preceding cilk_for), not across all spawned tasks. OK, fair enough.
   // When I complete task1, go back up to main, and complete cilk_sync, sum is correctly reduced across all spawned tasks.
   // I would like a barrier that synchronizes sum across all spawned tasks without going back up to main (or preceding parents) to both
   // synchronize AND complete all the tasks. I would like a simple barrier and resume.
}

cilk_sync does synchronize - but it also completes or ends the thread of execution. I would like to sync and continue. I guess this is a case where TBB offers more flexibility than cilkplus?

Pablo_H_Intel · ‎06-10-2016

Cilk and TBB both use work stealing, which gives no guarantee of independent progress on separate tasks. Tasks are not threads and synchronization across tasks is generally discouraged; many kinds of synchronization (such as barriers) don't work at all. If the program cannot be correctly executed on a single thread, then it is not a valid Cilk Plus program.

If there were a barrier construct, what would it do? In your example, you are asking to reduce across tasks that may or may not have already been spawned. What does that mean? What would you expect to be visible in the reducer after the barrier? If the program is running on a single thread, then the barrier would just cause deadlock as it waits for tasks that have not yet run and which cannot ever run because you've tied up the only worker. Or maybe it would synchronize with those tasks that have already completed? But that would be non-deterministic. Or maybe all tasks that, in the serial program, would already have run? OK, but that would partially serialize the program and kill parallel speedup in the general case. I have honestly never seen a need for this in a correctly-constructed fork-join program.

David_M_17 wrote: "cilk_sync does synchronize - but it also completes or ends the thread of execution. I would like to sync and continue. I guess this is a case where TBB offers more flexibility than cilkplus?"

Could you explain how you would do this better in TBB? TBB doesn't have reducers, and has mostly the same work-stealing semantics as Cilk Plus, so I don't understand what "sync and continue" means for TBB in this context.
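
For readers following along, the fork-join structure argued for above can be sketched as two phases driven by the same parent: spawn, sync, read the finished sum, then spawn again. This is only an illustration under stated assumptions. The names phase1, phase2, and run_two_phases are hypothetical, and the Cilk keywords are defined away (serial elision) so the sketch compiles with any C++ compiler and runs on a single thread, which is also the property a valid Cilk Plus program is expected to have.

```cpp
#include <vector>

// Serial elision (assumption for portability): the Cilk keywords are defined
// away so this compiles with any C++ compiler and runs on one thread.
// With a Cilk Plus compiler, drop these two macros and include <cilk/cilk.h>.
#define cilk_spawn
#define cilk_sync

// Hypothetical phase-one work: each task produces a partial sum.
int phase1(int task_id) {
    int local = 0;
    for (int i = 1; i <= 100; ++i) local += i * (task_id + 1);
    return local;
}

// Hypothetical phase-two work: each task consumes the finished global sum.
int phase2(int task_id, int global_sum) { return global_sum + task_id; }

std::vector<int> run_two_phases(int num_tasks) {
    std::vector<int> partial(num_tasks), result(num_tasks);

    // Fork phase one; every task writes only its own slot.
    for (int t = 0; t < num_tasks; ++t)
        partial[t] = cilk_spawn phase1(t);
    cilk_sync;  // parent waits here; all partial sums are now complete

    int global_sum = 0;
    for (int p : partial) global_sum += p;

    // Fork phase two from the same parent, passing the completed sum down.
    for (int t = 0; t < num_tasks; ++t)
        result[t] = cilk_spawn phase2(t, global_sum);
    cilk_sync;
    return result;
}
```

Calling run_two_phases(3) gives every phase-two task the same fully reduced sum. The cost is the one described in the original question: the work after the sync has to live in a separate entry point (phase2 here) rather than continuing inside the first task.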

jimdempseyatthecove · ‎06-17-2016

Try this sketch:

atomic<int> mysum ;
main
{
. . . .
mysum = 0;
cilk_spawn task1( . . . ) ;
cilk_spawn task1( . . . ) ;
cilk_spawn task1( . . . ) ;
cilk_sync ;
// At this point I have the desired global mysum
}

task1( . . . )
{
. . . .
   cilk::reducer< cilk::op_add<int> > tasksum (0) ;
   cilk_for( . . . )
   {
      . . .
      *tasksum += . . . ;
      . . . .
   }
   // atomic operation: publish this task's total to the global sum
   mysum += tasksum.get_value();
}

Jim Dempsey
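
A portable approximation of the sketch above, for anyone without a Cilk Plus compiler. It assumes std::thread in place of cilk_spawn and a plain local variable in place of the per-task reducer; the range arguments to task1 and the helper run_tasks are hypothetical, added only so the example is complete.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> mysum{0};  // global total shared by all tasks

// Hypothetical task body: sum the integers in [begin, end) locally,
// then publish the result with a single atomic add.
void task1(int begin, int end) {
    int tasksum = 0;  // task-private; stands in for the per-task reducer
    for (int i = begin; i < end; ++i) tasksum += i;
    mysum += tasksum;  // one atomic read-modify-write per task
}

int run_tasks() {
    mysum = 0;
    std::vector<std::thread> tasks;
    tasks.emplace_back(task1, 0, 100);
    tasks.emplace_back(task1, 100, 200);
    tasks.emplace_back(task1, 200, 300);
    for (auto& t : tasks) t.join();  // plays the role of cilk_sync
    return mysum.load();             // complete only after every join
}
```

The atomic makes each update safe, but it is not a barrier: a task that reads mysum before the join sees only what the other tasks happen to have published so far, so the value is guaranteed complete only in the parent after the join.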