Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Bodies that are Being Joined

josh_rychlinski
Beginner
293 Views
I have an array with many elements. Each element is operated on, and then transformed. The end result is a new array of a different size. I am trying to use parallel_reduce. When the join function is called, is it guaranteed that that which is being joined that it is a tree structure? that a left leaf is being joined to a right leaf to form a new leaf? I'm worried about a leaf with, say elements 0-10000 (in a 5000 blocked_range grain size) transformed into a new array being joined with another leaf that has, say elements 110,000-120,000 (again, transformed into that new array) ?
Would a parallel_scan perhaps be more useful?
0 Kudos
2 Replies
robert-reed
Valued Contributor II
293 Views
I think the answer to your question is contained within this statement regarding parallel_reduce: "The reduction operation should be associative, but does not have to be commutative." That is, (A ? B) ? C must equal A ? (B ? C) for reduction operation ? (whatever it is), but you don't have to worry whether A ? B is equal to B ?A. Commutation will never be a requirementbecause the order of elements and their compositions are never changed.
0 Kudos
RafSchietekat
Valued Contributor III
293 Views
From the reference manual: "The reduction operation should be associative, but does not have to be commutative." That implies that parallel_reduce suits your purpose concerning consecutiveness.

However, because you say the output is an array with a different size, parallel_scan might be more useful if you can have it compute output array indexes in a pre-scan (this computation actually needs to be done in both scans, and sometimes there won't even be a pre-scan), and compute output array contents in the final scan. If you know a safe and manageable upper limit of the destination array's size, an actual array should be suitable, otherwise you'll need a tbb::concurrent_vector, which is slower, and you would do well to use its iterator type's ++ and -- operators as much as possible in preference to random access (the iterator can be copied).

(Added) No, I didn't edit this, that's just your imagination (not a very safe thing in this particular context, BTW)!
0 Kudos
Reply