If a task_handle object is passed as a parameter to run, the caller is responsible for managing the lifetime of the task_handle object. See also the Remarks section on the MSDN page for task_handle. I would assume that for structured_task_group this behavior was chosen for performance, and for task_group it was chosen for consistency with structured_task_group. But this question is better asked on Microsoft's forums.
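To make the lifetime rule concrete, here is a minimal sketch that uses only the C++ standard library. mini_task_group and hypothetical_handle are hypothetical names, not the PPL or TBB API. The only point is that if run() keeps a pointer to the caller's handle instead of a copy, the caller must keep that handle alive until wait() returns.

```cpp
#include <thread>
#include <vector>

// Hypothetical minimal task group, NOT the PPL or TBB API.
// run() stores only the address of the caller's handle, so the handle
// must stay alive until wait() has returned.
class mini_task_group {
public:
    template <typename Handle>
    void run(Handle& h) {
        Handle* p = &h;                     // no copy of the handle is made
        // Starts the task on a new thread (a real scheduler would use a pool).
        threads_.emplace_back([p] { (*p)(); });
    }
    void wait() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }
    ~mini_task_group() { wait(); }

private:
    std::vector<std::thread> threads_;
};

// Hypothetical handle type used for illustration.
struct hypothetical_handle {
    int* out;
    void operator()() const { *out = 42; }
};

// Correct usage: h is declared in the same scope as g and outlives g.wait().
int run_demo() {
    int result = 0;
    hypothetical_handle h{&result};
    mini_task_group g;
    g.run(h);
    g.wait();   // the worker has finished using h before h goes out of scope
    return result;
}
```

Declaring the handle in a nested scope that closes before g.wait() would leave the worker with a dangling pointer, which is the situation the remark quoted above warns about. Not copying the handle is one plausible performance reason for this design.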
Right: only the PPL-compatible parallel_for shares a single body object by reference, while the original tbb::parallel_for creates copies of the body.
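The two strategies can be sketched with plain std::thread. for_copying, for_sharing and square_body are hypothetical names, and this is not how TBB is implemented: tbb::parallel_for makes its copies as the range is split, not exactly once per chunk as here. The sketch only counts copy constructions to show where they happen in each strategy.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many times a body object is copy-constructed.
static std::atomic<int> g_copies{0};

// Hypothetical body type: writes i*i into an output array.
struct square_body {
    int* out;
    explicit square_body(int* o) : out(o) {}
    square_body(const square_body& other) : out(other.out) { ++g_copies; }
    void operator()(std::size_t lo, std::size_t hi) const {
        for (std::size_t i = lo; i < hi; ++i) out[i] = static_cast<int>(i * i);
    }
};

// Copying strategy: each chunk works on its own private copy of the body.
template <typename Body>
void for_copying(std::size_t n, std::size_t chunks, const Body& body) {
    std::vector<std::thread> ts;
    for (std::size_t c = 0; c < chunks; ++c) {
        std::size_t lo = n * c / chunks, hi = n * (c + 1) / chunks;
        ts.emplace_back([&body, lo, hi] {
            Body local(body);   // one copy per chunk, local to the task
            local(lo, hi);
        });
    }
    for (auto& t : ts) t.join();
}

// Sharing strategy: every chunk calls the same body object through a
// const reference, so Body needs no copy constructor at all.
template <typename Body>
void for_sharing(std::size_t n, std::size_t chunks, const Body& body) {
    std::vector<std::thread> ts;
    for (std::size_t c = 0; c < chunks; ++c) {
        std::size_t lo = n * c / chunks, hi = n * (c + 1) / chunks;
        ts.emplace_back([&body, lo, hi] { body(lo, hi); });
    }
    for (auto& t : ts) t.join();
}
```

With 4 chunks, for_copying performs 4 copy constructions and for_sharing performs none; both fill the output array identically.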
A while back, I did an experiment. I thought that sharing a single body instance could be faster than copying it. To my surprise, for a very simple body class the variant that shared the body by reference was slower. I no longer remember whether the body object captured anything; possibly the compiler was able to eliminate the copying. It is also possible that when the body object is local to a task, compilers can apply more optimizations, while being more conservative when the body is shared. Raf, is that what you are asking about? Since I am not a compiler writer, I can't say for sure, but it sounds plausible to me.
Possible advantages of sharing that I see: there is no requirement for the body to be copy-constructible (though that is rarely a problem), and there is a potential performance win in case the body captures enough external state to make copying an expensive operation.
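The "expensive copy" case can be illustrated with a body that owns a large lookup table by value. heavy_body and light_body are hypothetical names. light_body shows a common workaround for copying-based parallel_for (the body holds a pointer to state that lives outside it, so each copy is cheap); it is not something proposed in this thread, just one way to get most of the benefit of sharing while the body is still copied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "heavy" body: owns its lookup table by value, so every copy
// of the body allocates and duplicates the whole table.
struct heavy_body {
    std::vector<int> table;
    int* out;
    void operator()(std::size_t lo, std::size_t hi) const {
        for (std::size_t i = lo; i < hi; ++i) out[i] = table[i % table.size()];
    }
};

// Hypothetical "light" body: refers to a table that lives outside the body,
// so a copy only duplicates two pointers regardless of the table's size.
struct light_body {
    const std::vector<int>* table;
    int* out;
    void operator()(std::size_t lo, std::size_t hi) const {
        for (std::size_t i = lo; i < hi; ++i) out[i] = (*table)[i % table->size()];
    }
};
```

The same reasoning applies to lambdas: one that captures a large container by value is expensive to copy, while one that captures by reference is cheap. A body with a std::mutex or std::unique_ptr member is not copy-constructible at all, which corresponds to the first point above.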