I find tbb::parallel_while very useful for my applications. However, some experiments showed poor scalability because the work per iteration was small compared with the parallel_while and task-stealing overheads. Scalability improved after I enlarged the grain by grouping several iteration items into one item.
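To make the grouping concrete, here is a minimal sketch of the kind of client-side workaround described above. It assumes the item stream exposes a `bool pop_if_present(Item&)` method, as the parallel_while stream concept requires. `BatchedStream` and `CountingStream` are hypothetical names made up for illustration and are not part of TBB; the sketch deliberately avoids calling TBB itself so it stays self-contained.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical adapter (not part of TBB): wraps any stream offering
// bool pop_if_present(Item&) and hands out batches of up to `grain` items,
// so that one parallel_while iteration would process a whole batch.
template <typename Stream, typename Item>
class BatchedStream {
public:
    BatchedStream(Stream& source, std::size_t grain)
        : source_(source), grain_(grain == 0 ? 1 : grain) {}

    // Fills `batch` with up to grain_ items from the underlying stream.
    // Returns false only when the underlying stream is exhausted.
    bool pop_if_present(std::vector<Item>& batch) {
        batch.clear();
        Item item;
        while (batch.size() < grain_ && source_.pop_if_present(item)) {
            batch.push_back(item);
        }
        return !batch.empty();
    }

private:
    Stream& source_;
    std::size_t grain_;
};

// Toy source stream for illustration only: yields 0, 1, ..., count-1.
class CountingStream {
public:
    explicit CountingStream(int count) : next_(0), count_(count) {}

    bool pop_if_present(int& item) {
        if (next_ >= count_) return false;
        item = next_++;
        return true;
    }

private:
    int next_;
    int count_;
};
```

With an adapter like this, the body passed to parallel_while would take a `std::vector<Item>` as its argument type and loop over the batch serially. The cost is an extra copy of each item into the batch plus a wrapper that every client has to write for itself, which is the overhead a built-in grain parameter might avoid.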
I think this could be done more efficiently inside the parallel_while implementation rather than on the client's side. It would be rather useful for users to have a grain parameter (as in parallel_for) to tune their applications, since in practice processing a single stream item can be too light.
What do you think of it?