I find tbb::parallel_while very useful for my applications. However, some experiments showed poor scalability because the work per iteration was small compared with the parallel_while and task-stealing overheads. Scalability improved after I enlarged the grain by grouping several iteration items into one item.
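To illustrate the kind of client-side grouping I mean, here is a minimal sketch in plain C++ with no TBB dependency. The names IntStream, BatchingStream, and process_batch are hypothetical and made up for this example. The pop_if_present signature is modeled on the stream interface that parallel_while expects, which is an assumption about how the adapter would plug in. It is not my actual code, just the general shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical item source: yields the integers 0 .. count-1, one per call.
// pop_if_present() fills `item` and returns true while items remain.
class IntStream {
public:
    explicit IntStream(int count) : next_(0), count_(count) {}

    bool pop_if_present(int& item) {
        if (next_ >= count_) return false;
        item = next_++;
        return true;
    }

private:
    int next_;
    int count_;
};

// Hypothetical adapter: wraps a source stream and hands out batches of up
// to `grain` items, so that one "item" seen by the parallel loop is really
// a small group of original items.
template <typename Stream, typename Item>
class BatchingStream {
public:
    BatchingStream(Stream& source, std::size_t grain)
        : source_(source), grain_(grain) {
        assert(grain > 0);  // a zero grain would never produce a batch
    }

    // Fills `batch` with up to `grain` items; returns false once the
    // underlying stream is exhausted and no items were fetched.
    bool pop_if_present(std::vector<Item>& batch) {
        batch.clear();
        Item item;
        while (batch.size() < grain_ && source_.pop_if_present(item)) {
            batch.push_back(item);
        }
        return !batch.empty();
    }

private:
    Stream& source_;
    std::size_t grain_;
};

// Stand-in for the loop body: processes a whole batch in one call, so the
// per-task overhead is paid once per batch instead of once per item.
long process_batch(const std::vector<int>& batch) {
    long sum = 0;
    for (int value : batch) sum += value;
    return sum;
}
```

The idea is that the parallel loop would be given the batching stream and a body taking a whole batch, so the grain is tuned by changing a single constructor argument. The last batch may be shorter than the grain.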
I think this grouping could be done more efficiently inside the parallel_while implementation than on the client's side. It would be rather useful for users to have a grain parameter (as in parallel_for) to tune their applications, since in practice processing a single stream item can be too light.
What do you think of it?
I think it is a good idea that deserves further investigation. It may improve scalability when processing a single item costs reasonably more than fetching and bundling it, but still not enough to make processing each item separately efficient. And I have a feeling that there are enough real-world applications that could benefit from it. Actually, I also thought about adding a grain size to parallel_do some time ago. But we have an internal rule of starting to implement a feature only after we have interested users, and there was other stuff to do (as often happens), so we never even discussed it, as far as I can remember. Now let's see what Arch says.