Hi,
Thanks
5 Replies
The body object of parallel_for is intended to be a closure: it captures the context in which the parallel loop executes so that the loop can use it. Think of the body as an instance of a C++0x lambda function, or as a function that contains the whole scope of a single loop iteration and obtains all the data it needs through its parameters.
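To make the closure analogy concrete, here is a minimal sketch in plain C++ with no TBB dependency. All names in it (`Range`, `ScaleBody`, `for_each_chunk`, `scale_both_ways`) are hypothetical, and `for_each_chunk` is only a serial stand-in for what a parallel loop template does with its body, not TBB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open index range [begin, end); a stand-in for a TBB range type.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Hand-written body: the captured context is two small members.
struct ScaleBody {
    double* data;    // context: where the array lives
    double  factor;  // context: what to multiply by
    void operator()(const Range& r) const {
        for (std::size_t i = r.begin; i != r.end; ++i) data[i] *= factor;
    }
};

// Serial stand-in for a parallel loop (an assumption, not TBB code):
// it copies the body once per chunk, much as a task-based loop may
// copy it once per subrange.
template <typename Body>
void for_each_chunk(Range whole, std::size_t grain, const Body& body) {
    assert(grain > 0);
    for (std::size_t b = whole.begin; b < whole.end; b += grain) {
        std::size_t e = (b + grain < whole.end) ? b + grain : whole.end;
        Body local(body);  // only the captured context is copied
        local(Range{b, e});
    }
}

// Runs the same loop twice: once with the functor, once with an equivalent lambda.
inline std::vector<double> scale_both_ways(std::vector<double> v, double factor) {
    for_each_chunk(Range{0, v.size()}, 2, ScaleBody{v.data(), factor});

    double* data = v.data();
    for_each_chunk(Range{0, v.size()}, 2, [data, factor](const Range& r) {
        for (std::size_t i = r.begin; i != r.end; ++i) data[i] *= factor;
    });
    return v;  // every element has now been scaled twice
}
```

The functor and the lambda carry the same two captured values, so either one is as cheap to copy as a pointer plus a double.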
If body objects are big enough that copying them is a concern, maybe it's time to rethink the design. Would you pass that much data or context as parameters to a function? If not, don't do it for a parallel_for body either.
You can also use parallel_reduce, which copies bodies lazily. This matters for parallel_reduce because its body also serves as an accumulator of partial "sums". If necessary, it can be used without doing any reduction: implement the join() method so that it does nothing.
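For reference, a parallel_reduce body used purely as a loop body might have the shape below: a splitting constructor, a non-const operator(), and an empty join(). This is a sketch under assumptions: `toy::split` and `toy::Range` stand in for `tbb::split` and `tbb::blocked_range<size_t>`, and `toy::reduce_serially` is a hypothetical serial driver that merely exercises the three member functions. It does not reproduce TBB's scheduling or its decision about when to split a body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy {

struct split {};  // stand-in for tbb::split

struct Range {    // stand-in for tbb::blocked_range<size_t>
    std::size_t begin;
    std::size_t end;
};

// Hypothetical driver: splits the body once, runs each half, then joins.
template <typename Body>
void reduce_serially(Range r, Body& body) {
    assert(r.begin <= r.end);
    std::size_t mid = r.begin + (r.end - r.begin) / 2;
    Body right(body, split());  // splitting constructor
    body(Range{r.begin, mid});
    right(Range{mid, r.end});
    body.join(right);           // a no-op for the body below
}

}  // namespace toy

// Body with the member functions a reduction body needs, but no accumulator.
struct IncrementBody {
    int* data;  // shared array; each subrange touches disjoint elements

    explicit IncrementBody(int* d) : data(d) {}

    // Splitting constructor: copies only the pointer.
    IncrementBody(IncrementBody& other, toy::split) : data(other.data) {}

    void operator()(const toy::Range& r) {
        for (std::size_t i = r.begin; i != r.end; ++i) ++data[i];
    }

    // Nothing was accumulated, so there is nothing to merge.
    void join(IncrementBody&) {}
};

inline std::vector<int> increment_all(std::vector<int> v) {
    IncrementBody body(v.data());
    toy::reduce_serially(toy::Range{0, v.size()}, body);
    return v;
}
```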
The choice between passing by value and passing by reference applies to a Body as well, except that, when weighing the trade-off, you should think of it as many function calls rather than one. Just pass a pointer to that big lump of data, assuming it is thread-safe, of course.
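The trade-off can be sketched without TBB. In the code below, `CountingArray` is a hypothetical data type that counts its own copies, `ByValueBody` and `ByPointerBody` are hypothetical bodies, and `copy_n_times` stands in for the copies a parallel loop template might make of its body. The data is only read here, which is what makes sharing one pointer safe in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "big lump of data" that counts how many times it is copied.
struct CountingArray {
    std::vector<double> values;
    static int copies;
    explicit CountingArray(std::size_t n) : values(n, 1.0) {}
    CountingArray(const CountingArray& other) : values(other.values) { ++copies; }
};
int CountingArray::copies = 0;

// Body that owns the data: every body copy duplicates the whole array.
struct ByValueBody {
    CountingArray data;
    double operator()(std::size_t i) const { return data.values[i]; }
};

// Body that points at the data: every body copy duplicates one pointer.
struct ByPointerBody {
    const CountingArray* data;
    double operator()(std::size_t i) const { return data->values[i]; }
};

// Stand-in for the body copies a parallel loop might make (an assumption).
template <typename Body>
void copy_n_times(const Body& body, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        Body copy(body);
        (void)copy;
    }
}

inline int copies_with_value_body(int n) {
    CountingArray big(1000);
    ByValueBody body{big};
    CountingArray::copies = 0;  // count only the copies made from here on
    copy_n_times(body, n);
    return CountingArray::copies;
}

inline int copies_with_pointer_body(int n) {
    CountingArray big(1000);
    ByPointerBody body{&big};
    CountingArray::copies = 0;
    copy_n_times(body, n);
    return CountingArray::copies;
}
```

With the by-value body, each body copy duplicates all 1000 elements; with the pointer body, the array is never copied at all.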
Many thanks to both of you for replying. I think I have not stressed enough that the overhead I am talking about would come from copying code and not data. Parameter passing is not an issue for me, since I assume that most data processed by the loop can be heap-allocated global objects.
A typical scenario would be an application whose code consists of large for-loops parallelized at the outermost level, while the data may be totally decoupled from the code and defined elsewhere. This is a common case in large-scale, array-based scientific codes. In that case, parallel_for would only pass the subrange bounds in each operator() invocation, but would of course need to copy the operator() code every time. [Apart from copying the same code again and again, this could have other side effects, such as thrashing the instruction cache.] Of course, an effective alternative would be to enclose the whole loop body in a separate global function and call it from inside operator(), but I wonder whether it would be a better approach for TBB to provide some way to decouple code from task state.
Unless I'm missing your point, template instantiation is static, while Body instantiation is dynamic: the code is not copied when the Body is copied, only its state is.
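A small sketch of that distinction, in plain C++ with hypothetical names (`LoopBody`, `run_with_copied_bodies`): the body's state is a single pointer, so that is all a copy duplicates, no matter how long operator() is. The `static_assert` reflects how mainstream compilers lay out a struct with one pointer member; the language standard does not strictly guarantee it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LoopBody {
    double* data;  // the only per-object state

    // However long this function is, it is compiled once for the type.
    // It is not part of any LoopBody object and is not copied with one.
    void operator()(std::size_t begin, std::size_t end) const {
        for (std::size_t i = begin; i != end; ++i) {
            data[i] = data[i] * data[i] + 1.0;
        }
    }
};

// Copying a LoopBody copies one pointer's worth of bytes
// (true on common ABIs; an assumption, not a standard guarantee).
static_assert(sizeof(LoopBody) == sizeof(double*),
              "body state is expected to be a single pointer");

// Makes two copies of the body and runs each on half of the array.
inline std::vector<double> run_with_copied_bodies(std::vector<double> v) {
    LoopBody original{v.data()};
    LoopBody left(original);   // copies the pointer only
    LoopBody right(original);  // copies the pointer only
    assert(left.data == right.data);

    std::size_t mid = v.size() / 2;
    left(0, mid);
    right(mid, v.size());
    return v;
}
```

Both copies execute the same compiled operator(), so the instruction-cache footprint does not grow with the number of body copies.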
OK, now it's more clear. Thanks a lot!
