I would be inclined to try a solution based on polling, like this:
Define a global atomic counter, say as "tbb::atomic
You could also consider using task_group_context objects and the form of task::allocate_root that takes a task_group_context argument. Then you can cancel all tasks related to that task_group_context. Though if the tasks might run for a long time, you will need to do polling anyway via task::is_cancelled(), and freeing the task_group_context object at the right time might be tricky. The counter solution avoids the freeing hassle.
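To make the counter idea concrete, here is a rough sketch of one way polling could work. It is my guess at a reasonable scheme, not necessarily the exact one intended above. It uses std::atomic so that it compiles without TBB (tbb::atomic<int> would play the same role), and the names g_generation, run_job, and cancel_outstanding_jobs are made up for illustration. Each job remembers the counter value it started under and stops as soon as the global value has moved on.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical global "generation" counter. Incrementing it tells every job
// started under an older value to stop at its next poll.
std::atomic<int> g_generation{0};

// Hypothetical long-running job. It is given the generation it was started
// under and polls the global counter before each chunk of work.
// Returns the number of chunks completed before it stopped.
std::size_t run_job(int my_generation, std::size_t total_chunks) {
    std::size_t done = 0;
    for (; done < total_chunks; ++done) {
        if (g_generation.load(std::memory_order_relaxed) != my_generation)
            break;  // superseded: abandon the remaining chunks
        // ... one chunk of real work would go here ...
    }
    return done;
}

// Ask all jobs started under the current (or an earlier) generation to stop.
void cancel_outstanding_jobs() {
    g_generation.fetch_add(1);
}
```

A job would be started with the current value, e.g. run_job(g_generation.load(), n). Nothing has to be freed when a job is cancelled, which is the advantage over managing a task_group_context lifetime. The polling interval (here, once per chunk) is a trade-off between responsiveness and overhead.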
With regard to the parallelization strategy, it may work well, or it may run into problems because of the memory bandwidth cost of combining layers. An alternative would be to partition the image into sections and have a task per section. I don't know enough specifics about your situation to recommend which is best.
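For the section-based alternative, a minimal sketch might look like the following. Everything here is an assumption for illustration: the image is taken to be row-major, each layer is a flat vector of ints, "combining" is a plain per-pixel sum, and split_rows and combine_band are hypothetical helpers. In TBB, each band would be handed to its own task, or the whole loop could be expressed with tbb::parallel_for over a tbb::blocked_range of rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split rows [0, rows) into at most `sections` contiguous bands of
// near-equal height. Each pair is a half-open row range [begin, end).
std::vector<std::pair<std::size_t, std::size_t>>
split_rows(std::size_t rows, std::size_t sections) {
    std::vector<std::pair<std::size_t, std::size_t>> bands;
    if (rows == 0 || sections == 0) return bands;
    sections = std::min(sections, rows);
    const std::size_t base = rows / sections;
    const std::size_t extra = rows % sections;  // first `extra` bands get one more row
    std::size_t begin = 0;
    for (std::size_t i = 0; i < sections; ++i) {
        const std::size_t end = begin + base + (i < extra ? 1 : 0);
        bands.emplace_back(begin, end);
        begin = end;
    }
    return bands;
}

// Work for one section: sum all layers for the pixels in rows [r0, r1).
// Bands touch disjoint parts of `out`, so one task per band needs no locking.
void combine_band(const std::vector<std::vector<int>>& layers,
                  std::vector<int>& out,
                  std::size_t width, std::size_t r0, std::size_t r1) {
    for (std::size_t i = r0 * width; i < r1 * width; ++i) {
        int sum = 0;
        for (const auto& layer : layers) sum += layer[i];
        out[i] = sum;
    }
}
```

With this layout every output pixel is written once after reading all layers, so intermediate combined images never have to travel through memory. Whether that actually beats a per-layer strategy depends on the image size and the number of layers.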