Question about __cilkrts_leave_frame()

john_kennedy
Beginner
412 Views

Hello,

I am hacking on Cilk to add IVars to the language. While looking through the source code, I wondered why cilk-abi.c in the runtime folder uses an atomic op to clear the detached flag of the returning stack frame on the path through __cilkrts_leave_frame(). Is there any contention for this flag?

Unless I am mistaken, this flag only gets set when the frame is being made runnable/unrunnable or when it is stolen. Since __cilkrts_undo_detach() is only called upon exiting the frame, and the frame is the current stack frame of the currently running worker, it cannot be stolen because it is not part of the THE deque. Is that correct? The code I am referring to is as follows:

<code>

#if defined __i386__ || defined __x86_64__
    __sync_fetch_and_and(&sf->flags, ~CILK_FRAME_DETACHED);
#else
    __cilkrts_fence(); /* membar #StoreLoad */
    sf->flags &= ~CILK_FRAME_DETACHED;
#endif

</code>

Additionally, I am curious as to what it means for a frame to win the "undo-detach race with flags" in the __cilkrts_bug print message right before __cilkrts_leave_frame() returns. 

<code> 

/* This path is taken when undo-detach wins the race with stealing.
   Otherwise this strand terminates and the caller will be resumed
   via setjmp at sync. */
if (__builtin_expect(sf->flags & CILK_FRAME_FLAGS_MASK, 0))
    __cilkrts_bug("W%u: frame won undo-detach race with flags %02x\n",
                  w->self, sf->flags);

</code>

From that comment, I think that my reasoning above must be wrong. Could someone help me understand how the currently active stack frame is or is not involved in a steal?

Thanks,

Chris.

3 Replies
Barry_T_Intel
Employee

You need to understand the THE protocol.  See the comments in scheduler.c with the header "THE protocol".  There's also a pointer to the PLDI paper which explains it in great detail.

The compiler generates a function called a "spawn helper" for every spawned function.  Among other duties, the spawn helper "detaches" the spawned function from the parent.  This means the continuation in the parent is pushed onto the tail of the deque and the CILK_FRAME_DETACHED flag is set in the spawn helper's flags.  There are two possible ways for the continuation to be executed:

  1. The continuation is stolen by another worker.  The thief manipulates the worker's exc pointer to signal that it is considering stealing the topmost available frame.  You'll see that the functions that increment and decrement the exc pointer always have a fence to notify other cores that the exc pointer has been modified.
  2. The spawned function returns normally.  __cilkrts_undo_detach() decrements the worker's tail pointer and uses fence or sync to notify other cores that it's been modified.  The flags can also be modified by thieves, who will set CILK_FRAME_UNSYNCED, which is why we use an atomic operation.  __cilkrts_undo_detach() returns 0 if the parent has been stolen, or if another worker is attempting to steal the parent's continuation right now.  In either case, the call to __cilkrts_c_THE_exception() will take out the worker lock and make a final determination if the parent has been stolen.
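
The two paths above can be sketched as a toy model. This is only an illustration under stated assumptions: it uses C11 atomics and integer indices in place of the runtime's pointers, every name (toy_deque, toy_detach, toy_try_steal, toy_undo_detach) is hypothetical, and it leaves out the worker lock and the slow path that makes the final determination.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model of one worker's deque indices. All names here are
   hypothetical; the real runtime uses pointers and a worker lock. */
typedef struct {
    atomic_long head; /* oldest entry; thieves take from this end */
    atomic_long tail; /* one past the newest entry; owner end */
    atomic_long exc;  /* thief's announced claim; equals head when idle */
} toy_deque;

void toy_init(toy_deque *d) {
    atomic_init(&d->head, 0);
    atomic_init(&d->tail, 0);
    atomic_init(&d->exc, 0);
}

/* Detach: publish the parent's continuation at the tail. */
void toy_detach(toy_deque *d) {
    atomic_fetch_add(&d->tail, 1);
}

/* Thief: announce intent through exc, then re-check for an entry.
   The sequentially consistent read-modify-write doubles as the fence. */
int toy_try_steal(toy_deque *d) {
    atomic_fetch_add(&d->exc, 1);
    if (atomic_load(&d->head) < atomic_load(&d->tail)) {
        atomic_fetch_add(&d->head, 1); /* steal committed */
        return 1;
    }
    atomic_fetch_sub(&d->exc, 1); /* nothing to take; withdraw the claim */
    return 0;
}

/* Owner: retract the tail, then compare against exc.
   Returns nonzero on the uncontested fast path and 0 when a thief
   has claimed (or may be claiming) the entry. */
int toy_undo_detach(toy_deque *d) {
    long old_tail = atomic_fetch_sub(&d->tail, 1);
    assert(old_tail > 0); /* only valid after a matching toy_detach */
    return (old_tail - 1) >= atomic_load(&d->exc);
}
```

With no thief involved, toy_detach() followed by toy_undo_detach() takes the fast path. If toy_try_steal() succeeds in between, toy_undo_detach() returns 0, which is the point where the real runtime would take the lock and decide.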

So the race is between the return from the spawned function and some other worker which might steal the parent's continuation.  There are three possibilities:

  1. If the parent hasn't been stolen, the spawn helper returns normally.  Since spawns are cheap and steals are supposed to be infrequent, this is expected to be the normal path.
  2. If the parent has been stolen, and there are other outstanding children of the parent function (other spawned functions before the sync, or the continuation hasn't finished) then the worker will store the results cleanly and go off to find other work to steal.
  3. If the parent has been stolen and there are no other outstanding children of the parent function, then this worker will jump to the code after the sync and continue executing.
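
The three possibilities can be written down as a small decision function. This is a sketch only: the enum, the function name, and its two boolean inputs are hypothetical, and the real runtime reaches this decision under the worker lock rather than from two flags passed in.

```c
/* Hypothetical names for the three outcomes described above. */
typedef enum {
    TOY_RETURN_TO_PARENT, /* parent not stolen: ordinary return */
    TOY_FIND_OTHER_WORK,  /* parent stolen, other children still outstanding */
    TOY_RESUME_AFTER_SYNC /* parent stolen, this was the last child */
} toy_outcome;

/* parent_stolen: nonzero if the parent's continuation was stolen.
   others_outstanding: nonzero if other spawned children, or the stolen
   continuation itself, have not finished yet. */
toy_outcome toy_leave_outcome(int parent_stolen, int others_outstanding) {
    if (!parent_stolen)
        return TOY_RETURN_TO_PARENT;
    return others_outstanding ? TOY_FIND_OTHER_WORK : TOY_RESUME_AFTER_SYNC;
}
```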

    - Barry

Jim_S_Intel
Employee

Just wanted to paste in the comment that is in the runtime source, for anyone else who might read this post online and be confused by the somewhat mysterious #if / #else. Strictly speaking, only a memory barrier is necessary, not a full atomic operation. But at some point, someone observed that the atomic operation was faster.

<code>

/* On x86 the __sync_fetch_and_ family includes a
   full memory barrier.  In theory the sequence in the
   second branch of the #if should be faster, but on most
   x86 it is not. */
#if defined __i386__ || defined __x86_64__
    __sync_fetch_and_and(&sf->flags, ~CILK_FRAME_DETACHED);
#else
    __cilkrts_fence(); /* membar #StoreLoad */
    sf->flags &= ~CILK_FRAME_DETACHED;
#endif

</code>

Cheers,

Jim
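
For anyone who wants to experiment with the two branches of that #if outside the runtime, here is a minimal standalone sketch. It assumes GCC or Clang, uses the __sync_synchronize() builtin as a stand-in for __cilkrts_fence(), and the flag values and function names are hypothetical, chosen only for illustration.

```c
/* Hypothetical flag values, for illustration only. */
#define TOY_FLAG_UNSYNCED 0x02u
#define TOY_FLAG_DETACHED 0x04u

/* Variant 1: atomic read-modify-write (includes a full barrier on x86). */
void toy_clear_atomic(unsigned *flags) {
    __sync_fetch_and_and(flags, ~TOY_FLAG_DETACHED);
}

/* Variant 2: explicit full fence, then a plain read-modify-write. */
void toy_clear_fenced(unsigned *flags) {
    __sync_synchronize(); /* stand-in for __cilkrts_fence() */
    *flags &= ~TOY_FLAG_DETACHED;
}
```

Both variants clear only the detached bit and leave the other flag bits alone. Which one is faster on a given machine is something to measure rather than assume.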
john_kennedy
Beginner

Ah, that makes perfect sense. Thanks. :) 
