My blog post at http://softwareblogs.intel.com/2007/11/08/have-a-fish-how-break-from-a-parallel-loop-in-tbb/ explains a solution that keeps most of the unmapped tasks from being created at all once a parallel loop is cancelled.
In the code Arch shared, my_stop is declared volatile and passed by reference to every parallel loop body that uses that cancelable_range. That means every call to r.end() issues a read on the cache line containing my_stop. This is not a problem as long as no thread writes to that cache line during the loop. Each HW thread gets its own read-only copy of the cache line and proceeds without thrashing until the loop terminates or until some thread writes my_stop, at which point all the other copies are invalidated and the new cache line is propagated.
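To make the access pattern concrete, here is a minimal sketch of a range whose end() reads the shared flag on every call. It is not Arch's actual code: the layout of this cancelable_range, the choice to report an empty range once my_stop is set, and the run_until helper are all assumptions for illustration, and the TBB splitting interface is left out so the snippet stands alone.

```cpp
#include <cstddef>

// Hypothetical stand-in for the cancelable_range discussed above.
// The real class would also need the TBB range interface (splitting
// constructor, empty(), is_divisible()), which is omitted here.
class cancelable_range {
    std::size_t my_begin;
    std::size_t my_end;
    volatile bool& my_stop;   // shared flag, held by reference
public:
    cancelable_range(std::size_t b, std::size_t e, volatile bool& stop)
        : my_begin(b), my_end(e), my_stop(stop) {}

    std::size_t begin() const { return my_begin; }

    // Every call performs a read of my_stop. Once the flag is set,
    // the range reports itself as empty so the loop falls through.
    std::size_t end() const { return my_stop ? my_begin : my_end; }
};

// Single-threaded illustration (hypothetical helper): walk [0, n),
// set the flag when i == stop_at, and return how many iterations ran.
std::size_t run_until(std::size_t n, std::size_t stop_at) {
    volatile bool stop = false;
    cancelable_range r(0, n, stop);
    std::size_t executed = 0;
    for (std::size_t i = r.begin(); i < r.end(); ++i) {
        ++executed;
        if (i == stop_at) stop = true;   // the one write to the flag
    }
    return executed;
}
```

The loop condition calls r.end() on every iteration, so the flag is read once per iteration and written at most once. That is the read-mostly pattern described above.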
If some thread were regularly writing to that cache line, all other threads' copies would be invalidated just as regularly, forcing extra bus activity to reread the line even though the value of my_stop has not changed. This syndrome of unnecessary evictions, caused by unrelated data happening to share a cache line, is known as false sharing.
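One common way to avoid that coincidence is to give the flag a cache line of its own. The sketch below assumes a 64-byte line, which is typical but not universal. The struct names, the counter member (standing in for whatever data gets written often), and the shares_line helper are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; check the target hardware.
constexpr std::size_t kCacheLine = 64;

// The flag and a frequently written counter sit next to each other,
// so a write to counter can invalidate the line holding my_stop.
struct SharedLine {
    volatile bool my_stop;
    long counter;
};

// Each member is aligned to its own cache line, so writes to counter
// leave the line holding my_stop untouched.
struct SeparateLines {
    alignas(kCacheLine) volatile bool my_stop;
    alignas(kCacheLine) long counter;
};

// Returns true when both members of s fall in the same
// kCacheLine-sized block of memory.
template <class T>
bool shares_line(const T& s) {
    auto a = reinterpret_cast<std::uintptr_t>(&s.my_stop) / kCacheLine;
    auto b = reinterpret_cast<std::uintptr_t>(&s.counter) / kCacheLine;
    return a == b;
}
```

The padded version costs two full lines instead of a few bytes, which is usually a fair trade for a flag that every thread polls.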