I noticed in your documentation that continuation tasks are used to minimize stack use. Do you mean that the parent task stays on the stack, just like a recursive call, while a continuation task does not?
I think a continuation task also has to wait for its children to complete. Where are continuation tasks stored?
And in your Fibonacci example code, the parent task's ref_count is 3, but the continuation task's ref_count is 2. Why?
The parent task's execute() method remains on the stack while it calls spawn_and_wait_for_all(), which in turn calls the execute() method of a spawned child, which again calls wait_for_all(), and so on. So yes, it is just like ordinary recursion.
A continuation task is stored in memory, in the storage allocated for the task object, not on the call stack. As soon as all of its children complete, it is spawned for execution.
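To make the storage question concrete, here is a minimal sketch in plain C++ that does not use TBB at all. It computes Fibonacci numbers in continuation style with a hand-written work list. The names `Node`, `Work`, and `fib_continuation_style` are hypothetical and exist only for this illustration; the real scheduler is more involved. The point is only that the object waiting for two children lives on the heap, and no function stays on the stack waiting for them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a continuation task; it is heap-allocated.
struct Node {
    Node* parent;  // continuation to notify once this one has its sum
    long* out;     // where the sum x + y is written
    long x, y;     // results delivered by the two children
    int pending;   // children still running (plays the role of ref_count)
    Node(Node* p, long* o) : parent(p), out(o), x(0), y(0), pending(2) {}
};

// Hypothetical stand-in for a spawned child task.
struct Work {
    int n;            // which Fibonacci number to compute
    long* out;        // where to store the result
    Node* successor;  // continuation waiting for it (nullptr for the root)
};

long fib_continuation_style(int n) {
    assert(n >= 0);
    long result = 0;
    std::vector<Work> ready;  // explicit work list instead of recursion
    ready.push_back({n, &result, nullptr});
    while (!ready.empty()) {
        Work w = ready.back();
        ready.pop_back();
        if (w.n < 2) {
            *w.out = w.n;
            // A child finished: decrement each waiting continuation and
            // run it once its last child has completed.
            Node* c = w.successor;
            while (c != nullptr && --c->pending == 0) {
                *c->out = c->x + c->y;
                Node* next = c->parent;
                delete c;
                c = next;
            }
        } else {
            // Allocate the continuation on the heap, queue two children,
            // and return to the loop without waiting for them.
            Node* c = new Node(w.successor, w.out);
            ready.push_back({w.n - 1, &c->x, c});
            ready.push_back({w.n - 2, &c->y, c});
        }
    }
    return result;
}
```

Calling `fib_continuation_style(10)` goes through the same two-children pattern as the Fibonacci example, but the depth of the call stack stays constant however large `n` is, because the waiting state is kept in `Node` objects rather than in stack frames.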
The parent task's ref_count is 3 (or, generally speaking, one more than the number of children) so that the count does not drop to zero when the last child completes. This prevents the task from being executed again and destroyed while its execute() method is still on the stack and needs to resume. A continuation task has no such problem, so its ref_count is simply the number of children.
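The counting rule can be checked with a toy model. This is not TBB code: `ToyTask`, `child_finished`, and `times_released` are hypothetical names, and the model assumes only what is described above, namely that each finishing child decrements the waiting task's count and that a count of zero hands the task to the scheduler.

```cpp
#include <cassert>

// Toy model only: these names are hypothetical and are not part of TBB.
struct ToyTask {
    int ref_count;
    int times_released;  // how often the "scheduler" picked this task up
};

// Models one child finishing: the waiting task's count drops by one,
// and a count of zero hands the task to the scheduler.
void child_finished(ToyTask& waiting) {
    if (--waiting.ref_count == 0) {
        ++waiting.times_released;
    }
}

// Runs `children` completions against a task that started with
// `initial_ref_count` and reports how often it was released.
int times_released(int initial_ref_count, int children) {
    assert(children >= 0);
    ToyTask t = {initial_ref_count, 0};
    for (int i = 0; i < children; ++i) {
        child_finished(t);
    }
    return t.times_released;
}
```

With two children, `times_released(3, 2)` is 0: the blocking parent is never handed to the scheduler, so its own execute() can resume safely. `times_released(2, 2)` is 1: a continuation is released exactly once, which is what you want. A blocking parent that started at 2 would likewise be released while its execute() was still on the stack, which is the problem the extra count avoids.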