- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, I'm just beginning to use lambda expressions with TBB and I was wondering if it is possible to use a lambda expression with a parallel_scan to compute a prefix sum. Does anyone have any working examples or ideas and docs on using lambdas not just with parallel_for ?
Thank you.
Thank you.
Link Copied
7 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Another question on the topic of parallel_scan: is it possible to invert the direction in which the range is scanned? A concrete example is doing a suffix sum instead of the prefix sum.
I don't want to reverse the array, prefix sum and reverse it again as it's pretty expensive
I don't want to reverse the array, prefix sum and reverse it again as it's pretty expensive
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
...as it's pretty expensive...
How big is your array?
Also,
- you could keep the original array intact;
- create a copy of it;
- invert it;
- calculate prefix sums;
- release the copy.
Would it work for you?
Best regards,
Sergey
How big is your array?
Also,
- you could keep the original array intact;
- create a copy of it;
- invert it;
- calculate prefix sums;
- release the copy.
Would it work for you?
Best regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
#0 "Hello, I'm just beginning to use lambda expressions with TBB and I was wondering if it is possible to use a lambda expression with a parallel_scan to compute a prefix sum. Does anyone have any working examples or ideas and docs on using lambdas not just with parallel_for ?"
A C++ lambda expression cannot specify all the features required by a tbb::parallel_scan Body, and its temporary-object result cannot be bound to a non-const reference, so no.
#1 "Another question on the topic of parallel_scan: is it possible to invert the direction in which the range is scanned? A concrete example is doing a suffix sum instead of the prefix sum."
The simplest solution seems to be to use range element i as array index array_size-1-i.
A C++ lambda expression cannot specify all the features required by a tbb::parallel_scan Body, and its temporary-object result cannot be bound to a non-const reference, so no.
#1 "Another question on the topic of parallel_scan: is it possible to invert the direction in which the range is scanned? A concrete example is doing a suffix sum instead of the prefix sum."
The simplest solution seems to be to use range element i as array index array_size-1-i.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I corrected resp. simplified the 2 parts of #3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I did see the original version, but it's alright, I've begun creating a personal parallel_scan so that it fits my needs. I remember trying something very similar with what you're describing (regarding the suffix sum) but it didn't work. What I would need is something that modifies the behaviour of the scan by storing in the leftmost part of the segment instead of the rightmost (and analogous in the second phase). I saw there's something in the parallel_scan.h regarding "is_left_final" or something like that but I couldn't get my head round it. I'll just stick with using parallel_for for now, sadly...
I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples. For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder.
I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples. For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
"I did see the original version, but it's alright, I've begun creating a personal parallel_scan so that it fits my needs."
Would you care to hint at what it looks like? A C++ lambda expression is (far) more limited than, say, a Java anonymous inner class, so it cannot assemble all the features in one go (if I overlooked anything, I'd be eager to learn about it). However, a template function perhaps could assemble it internally from several lambda-expression arguments and then forward that to parallel_scan, but I didn't try that yet, myself. Is that what you did? And how does it stand up against a traditional independent Body in ease of use?
"I remember trying something very similar with what you're describing (regarding the suffix sum) but it didn't work. What I would need is something that modifies the behaviour of the scan by storing in the leftmost part of the segment instead of the rightmost (and analogous in the second phase). I saw there's something in the parallel_scan.h regarding "is_left_final" or something like that but I couldn't get my head round it. I'll just stick with using parallel_for for now, sadly..."
I don't see how simply remapping both input and output wouldn't be equivalent to applying parallel_scan() to an inverted input and then inverting the result, except for better performance of the former because it is so much faster to do a small computation than to copy things around? Perhaps you overlooked a detail like the inverted role of begin() and end() relative to both domains? Note that left_is_final is relevant internally to avoid redundant final scans, but it's determined by the algorithm's direction instead of the other way around, so it could as well be called initial_is_final or somesuch instead.
"I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples."
Perhaps indeed TBB could do with extra effort to create a cookbook of examples that would seem superfluous to a team of dedicated experts.
"For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder."
Do you mean "Problems using parallel_scan"? Except for the absence of final feedback from the original poster, I think this forum thread was very useful, and even triggered a revision of the documentation.
Would you care to hint at what it looks like? A C++ lambda expression is (far) more limited than, say, a Java anonymous inner class, so it cannot assemble all the features in one go (if I overlooked anything, I'd be eager to learn about it). However, a template function perhaps could assemble it internally from several lambda-expression arguments and then forward that to parallel_scan, but I didn't try that yet, myself. Is that what you did? And how does it stand up against a traditional independent Body in ease of use?
"I remember trying something very similar with what you're describing (regarding the suffix sum) but it didn't work. What I would need is something that modifies the behaviour of the scan by storing in the leftmost part of the segment instead of the rightmost (and analogous in the second phase). I saw there's something in the parallel_scan.h regarding "is_left_final" or something like that but I couldn't get my head round it. I'll just stick with using parallel_for for now, sadly..."
I don't see how simply remapping both input and output wouldn't be equivalent to applying parallel_scan() to an inverted input and then inverting the result, except for better performance of the former because it is so much faster to do a small computation than to copy things around? Perhaps you overlooked a detail like the inverted role of begin() and end() relative to both domains? Note that left_is_final is relevant internally to avoid redundant final scans, but it's determined by the algorithm's direction instead of the other way around, so it could as well be called initial_is_final or somesuch instead.
"I don't know why intel would create such templates and not provide extensive documentation and wider-ranging examples."
Perhaps indeed TBB could do with extra effort to create a cookbook of examples that would seem superfluous to a team of dedicated experts.
"For example I also tried to find a working example of the running maximum/minimum of an array. I only came across a frustratred blog post about how parallel_scan isn't doing what it's supposed to. I thought that language features are added to ease the process of coding, not make it harder."
Do you mean "Problems using parallel_scan"? Except for the absence of final feedback from the original poster, I think this forum thread was very useful, and even triggered a revision of the documentation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
...the running maximum/minimum of an array...
I havetwo questions:
- Do you want to get Min and Max values of an arrayin one pass, right?
- How many threadsare going to be used?
I havetwo questions:
- Do you want to get Min and Max values of an arrayin one pass, right?
- How many threadsare going to be used?
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page