Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Lambda and scan

sinaelgl
Beginner
776 Views
Hello, I'm just beginning to use lambda expressions with TBB and I was wondering if it is possible to use a lambda expression with a parallel_scan to compute a prefix sum. Does anyone have any working examples or ideas and docs on using lambdas not just with parallel_for ?

Thank you.
0 Kudos
7 Replies
sinaelgl
Beginner
776 Views
Another question on the topic of parallel_scan: is it possible to invert the direction in which the range is scanned? A concrete example is doing a suffix sum instead of the prefix sum.
I don't want to reverse the array, prefix sum and reverse it again as it's pretty expensive
0 Kudos
SergeyKostrov
Valued Contributor II
776 Views
...as it's pretty expensive...

How big is your array?

Also,
- you could keep the original array intact;
- create a copy of it;
- invert it;
- calculate prefix sums;
- release the copy.

Would it work for you?

Best regards,
Sergey
0 Kudos
RafSchietekat
Valued Contributor III
776 Views
#0 "Hello, I'm just beginning to use lambda expressions with TBB and I was wondering if it is possible to use a lambda expression with a parallel_scan to compute a prefix sum. Does anyone have any working examples or ideas and docs on using lambdas not just with parallel_for ?"
A C++ lambda expression cannot specify all the features required by a tbb::parallel_scan Body, and its temporary-object result cannot be bound to a non-const reference, so no.

#1 "Another question on the topic of parallel_scan: is it possible to invert the direction in which the range is scanned? A concrete example is doing a suffix sum instead of the prefix sum."
The simplest solution seems to be to use range element i as array index array_size-1-i.
0 Kudos
RafSchietekat
Valued Contributor III
776 Views
I corrected resp. simplified the 2 parts of #3.
0 Kudos
sinaelgl
Beginner
776 Views
I did see the original version, but it's alright, I've begun creating a personal parallel_scan so that it fits my needs. I remember trying something very similar with what you're describing (regarding the suffix sum) but it didn't work. What I would need is something that modifies the behaviour of the scan by storing in the leftmost part of the segment instead of the rightmost (and analogous in the second phase). I saw there's something in the parallel_scan.h regarding "is_left_final" or something like that but I couldn't get my head round it. I'll just stick with using parallel_for for now, sadly...

I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples. For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder.
0 Kudos
RafSchietekat
Valued Contributor III
776 Views
"I did see the original version, but it's alright, I've begun creating a personal parallel_scan so that it fits my needs."
Would you care to hint at what it looks like? A C++ lambda expression is (far) more limited than, say, a Java anonymous inner class, so it cannot assemble all the features in one go (if I overlooked anything, I'd be eager to learn about it). However, a template function perhaps could assemble it internally from several lambda-expression arguments and then forward that to parallel_scan, but I didn't try that yet, myself. Is that what you did? And how does it stand up against a traditional independent Body in ease of use?

"I remember trying something very similar with what you're describing (regarding the suffix sum) but it didn't work. What I would need is something that modifies the behaviour of the scan by storing in the leftmost part of the segment instead of the rightmost (and analogous in the second phase). I saw there's something in the parallel_scan.h regarding "is_left_final" or something like that but I couldn't get my head round it. I'll just stick with using parallel_for for now, sadly..."
I don't see how simply remapping both input and output wouldn't be equivalent to applying parallel_scan() to an inverted input and then inverting the result, except for better performance of the former because it is so much faster to do a small computation than to copy things around? Perhaps you overlooked a detail like the inverted role of begin() and end() relative to both domains? Note that left_is_final is relevant internally to avoid redundant final scans, but it's determined by the algorithm's direction instead of the other way around, so it could as well be called initial_is_final or somesuch instead.

"I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples."
Perhaps indeed TBB could do with extra effort to create a cookbook of examples that would seem superfluous to a team of dedicated experts.

"For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder."
Do you mean "Problems using parallel_scan"? Except for the absence of final feedback from the original poster, I think this forum thread was very useful, and even triggered a revision of the documentation.
0 Kudos
SergeyKostrov
Valued Contributor II
776 Views
...the running maximum/minimum of an array...

I havetwo questions:

- Do you want to get Min and Max values of an arrayin one pass, right?
- How many threadsare going to be used?
0 Kudos
Reply