cilk stack

yanyh · 08-22-2012

Hi, I am trying to understand how the stack in Cilk is managed; reading the ABI doc and the code did not help me much.

1. For a spawn, is the spawned function (task) executed on the current stack (of the spawning function) or on a new stack? If on a new stack, is the new stack allocated by the runtime?
2. If a thief worker steals the continuation of a spawn, is the continuation on a new stack?
So my understanding is that if the spawned function runs on the current stack, then the continuation has to be on a new stack; if the spawned function runs on a new stack, then the continuation could stay on the current stack (of the spawning function). For performance reasons and the work-first policy, it looks to me that the spawned function will be on the current stack, and only when there is a steal will the continuation be started on a new stack, which could be allocated by the thief. If this is correct, how is the stack duplicated to allow the continuation to run?

I hope my questions are not too far from the baseline of the Cilk runtime. Similar questions could also be asked for sync.

Thanks
Yonghong

Barry_T_Intel · 08-22-2012

1) The spawned function is always executed on the current stack.

2) A stolen continuation is executed on a new stack. The thief allocates a new stack before attempting to steal. Any overhead to allocate that stack is attributed to the steal.

The stack isn't duplicated. When a steal occurs, we "split" the frame. All spawning functions are compiled so that any locals are referenced using the frame pointer. When we resume the continuation, the frame pointer still points into the original stack, and the stack pointer points into the new stack. So any local variables used in the function are in the old stack. The Cilk runtime guarantees that only one worker is executing any portion of a function at a time, so this all works. However, if you pass the address of a local variable into a spawned function, you may have a race accessing the data in the continuation. But that's true of any threading system.

When you call another function, the standard call protocol will save the frame pointer, allocate space on the stack, and move the frame pointer onto this stack. So execution from this point forward continues as usual.

On a sync, the runtime guarantees that the code executes on the "left-most" stack. That is, the oldest stack used in this function. This guarantees that you will always exit a spawning function on the same stack you entered on.

- Barry
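
To make the split-frame idea concrete, here is a toy model in plain C. It is not code from the Cilk runtime: the names `toy_stack`, `toy_context`, `toy_enter`, `toy_local`, `toy_push`, and `toy_steal` are all made up for illustration, and real frame and stack pointers are registers, not struct fields. It only models the bookkeeping described in the reply: after a steal, locals are still reached through the frame pointer into the original stack, while anything pushed goes to the thief's new stack.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_WORDS = 64 };

/* A toy "stack": a block of words, used from the highest index downward. */
typedef struct {
    long words[STACK_WORDS];
} toy_stack;

/* A toy register pair. Each "pointer" is a stack plus an index into it. */
typedef struct {
    toy_stack *fp_stack;  /* stack that holds the function's locals */
    size_t     fp;        /* frame pointer: locals live at fp-1, fp-2, ... */
    toy_stack *sp_stack;  /* stack that receives newly pushed words */
    size_t     sp;        /* stack pointer: the next push goes to sp-1 */
} toy_context;

/* Function prolog: set fp to the current top and reserve nlocals words. */
static toy_context toy_enter(toy_stack *s, size_t top, size_t nlocals)
{
    toy_context c = { s, top, s, top - nlocals };
    return c;
}

/* Address of local number i (0-based), always relative to the frame pointer. */
static long *toy_local(toy_context *c, size_t i)
{
    return &c->fp_stack->words[c->fp - 1 - i];
}

/* Push one word, always relative to the stack pointer. */
static void toy_push(toy_context *c, long v)
{
    assert(c->sp > 0);
    c->sp_stack->words[--c->sp] = v;
}

/* Steal (toy version): the continuation keeps the victim's frame pointer
   but gets a stack pointer at the top of the thief's fresh stack. */
static toy_context toy_steal(const toy_context *victim, toy_stack *fresh)
{
    toy_context c = *victim;
    c.sp_stack = fresh;
    c.sp = STACK_WORDS;
    return c;
}
```

With `parent = toy_enter(&a, STACK_WORDS, 2)` and `cont = toy_steal(&parent, &b)`, a store through `toy_local(&cont, 1)` lands in stack `a`, while `toy_push(&cont, 42)` lands in stack `b`.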

Barry_T_Intel · 08-22-2012

P.S. I just checked the ABI document to add this information if it was missing, and it's discussed (briefly) in chapter 5: "General concepts and code generation":

"All spawning functions require separate stack and frame pointers. Incoming arguments and local variables must be accessed using the frame pointer. Only outgoing arguments should be on the stack. The stack pointer may change unpredictably after spawn. Specifically, when a function is stolen the continuation runs on a new stack. The correct stack pointer, the same as in the serial code, will be restored after sync. The runtime tracks stack pointer changes within a function for whatever stack they are on."

- Barry

yanyh · 08-22-2012

Hi Barry, thanks for the information; that is very helpful.

For the frame-split approach, any variables declared before a spawn will be accessed through the frame pointer, and those declared after a spawn will be accessed through the stack pointer into the new stack allocated for the continuation if it is stolen, assuming a simple Cilk function that has only one spawn call. So registers are not used for local variables (even if one is declared with the register storage class)? Correct me if my understanding is wrong. I saw the use of setjmp to store context; that will mainly be used to store the frame and stack pointers if other registers are not used.

Also, when you allocate a stack, how do you decide the size? Through compiler analysis, I believe?

By the way, I did read the sentences you quoted from the ABI doc, so they were there already. But at that time, I did not quite get it.

Thanks
Yonghong

Barry_T_Intel · 08-22-2012

If you dig into the details, there's a lot to get your mind around. :o)

There's nothing to keep the compiler from allocating all local variable space in the function's prolog. I believe that's generally what the Intel compiler does. I'm not sure about GCC. But I believe it's a requirement of the Win64 ABI.

The jump buffer is OS-dependent. setjmp() and the compiler need to agree what registers are preserved so the compiler knows what it can depend on after a call to setjmp().

The size of the allocated stack is a constant. On Linux, it's 1 MB by default, but it can be set by the user using __cilkrts_set_param() before the runtime is initialized. All of the stacks we allocate will be the same size.

- Barry
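
For readers who have not used a jump buffer before, here is a minimal standard C sketch of saving a context with setjmp() and resuming it with longjmp(). It is only an analogy for the "save the continuation, resume it later" idea discussed above. The function name `resume_saved_context` is made up for illustration, and plain longjmp() does not switch to a different stack, so whatever the Cilk runtime does on a steal goes beyond what is shown here.

```c
#include <assert.h>
#include <setjmp.h>

/* Saved execution context. Which registers it holds is platform-dependent. */
static jmp_buf saved_context;

/* Hypothetical demo function: saves a context, then resumes it once.
   Returns the number of times execution passed the save point. */
int resume_saved_context(void)
{
    /* volatile: a local modified between setjmp and longjmp must be
       volatile to have a defined value after the jump. */
    volatile int passes = 0;

    if (setjmp(saved_context) != 0) {
        /* Arrived via longjmp: execution resumed at the save point. */
        passes = passes + 1;
        assert(passes == 2);
        return passes;
    }

    /* Arrived via the direct call to setjmp. */
    passes = passes + 1;
    longjmp(saved_context, 1);   /* does not return */
    return -1;                   /* not reached */
}
```

The `volatile` qualifier is the visible side of the point about setjmp() and the compiler having to agree on what is preserved: without it, the value of `passes` after the jump would be indeterminate.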

yanyh · 08-22-2012

Hi Barry,

Thanks again for the information. Yes, my mind is getting obsessed now.

If all the local variables are accessed through the frame pointer of a spawning function (which points to the old stack), you need to preallocate all of them before the spawn, even those declared after a spawn call. If so, that raises the issue of handling array declarations in a spawning function, e.g. a variable-length array.

Or, in the "split" frame approach, only those variables declared before a spawn call will be in the frame that points to the old stack, and the variables after the spawn will be on the new stack. In this case, the base pointer used for variable references must be adjusted carefully for those before the spawn call and those after it; I am not sure how this is handled.

Any information on how this is handled?

Thanks
Yonghong

Barry_T_Intel · 08-23-2012

That's an interesting question, so I wrote a small program to test it, and discovered that in ICL a variable length array declared in the continuation is being allocated on the continuation's stack. Which is fine, until I accessed it after the sync. The runtime will have returned that stack to our pool, and some worker may start executing on it at any time. I assume that ICC is doing the same thing.

I have put this out to the Cilk developers for discussion. At least for now, you should avoid using variable length arrays on the stack after the first spawn. And I'd assume you should avoid using alloca() there, too.

Don't do this. It will hurt.

Statically sized arrays and scalars are fine. The compiler is gathering them up and allocating space on the stack for them in the function's prolog.

- Barry
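
One way to follow that advice is to take variable-sized scratch space from the heap instead of the stack, so its lifetime does not depend on which stack the continuation happens to run on. The sketch below is an assumption-laden illustration, not code from the thread: `sum_to` and `sum_with_scratch` are hypothetical names, and the two keywords are defined away so the file compiles as plain C (the serial elision). To try it with a Cilk-enabled compiler, the two `#define` lines would be replaced by `#include <cilk/cilk.h>`.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: keywords defined away so this builds with any C compiler. */
#define cilk_spawn
#define cilk_sync

/* Hypothetical worker: sum of 1..n. */
static long sum_to(int n)
{
    long s = 0;
    for (int i = 1; i <= n; ++i)
        s += i;
    return s;
}

/* Pattern to avoid (shown only as a comment):
 *     long left = cilk_spawn sum_to(n);
 *     long scratch[n];     // VLA declared after the first spawn
 *     cilk_sync;
 *     ... scratch[i] ...   // its stack may already be back in the pool
 */

/* Heap-based alternative. Requires n > 0. Returns -1 on bad input or
   allocation failure, otherwise sum_to(n) plus the sum of the scratch data. */
long sum_with_scratch(int n)
{
    long left;
    long total;
    long *scratch;

    if (n <= 0)
        return -1;
    scratch = malloc((size_t)n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    left = cilk_spawn sum_to(n);

    /* Continuation work: fill the heap buffer while the child may run. */
    for (int i = 0; i < n; ++i)
        scratch[i] = i + 1;

    cilk_sync;

    /* Safe after the sync: the buffer lives on the heap, not on a stack. */
    total = left;
    for (int i = 0; i < n; ++i)
        total += scratch[i];

    free(scratch);
    return total;
}
```

For example, `sum_with_scratch(4)` adds `sum_to(4)` (10) to the scratch contents 1 + 2 + 3 + 4 (10) and returns 20.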