Hi all,
Is there something like parallel_for that executes serially? That is, I'd pass it my Range and Body just as with parallel_for, but the execution would be serial.
Regards,
Graham
9 Replies
Try adjusting the grain size, which is the third argument of blocked_range. In that case the code would look like this:
#include <cstddef>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

class Functor
{
public:
    // Called once per subrange; with the grain size used below, that is the whole range.
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(), end = r.end(); i != end; ++i) {
            // your code
        }
    }
};
void doParallel(size_t count) {
    Functor functor;
    // A grain size of count + 1 exceeds the range size, so the range is never divisible.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, count, count + 1), functor);
}
If the iteration space has more than grainsize iterations, parallel_for splits it into subranges that are scheduled separately, so if the grain size exceeds the number of iterations, the iteration space should not be split.
The second way is to initialise the scheduler with a single thread. Find this in your code:
tbb::task_scheduler_init init;
and change it to
tbb::task_scheduler_init init(1); // recommended for debugging only
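To make the grain size argument concrete, here is a minimal sketch of the divisibility rule it relies on. ToyBlockedRange is a hypothetical stand-in written for this illustration, not TBB's class, and it assumes the rule described above: a range is divisible only when it holds more than grainsize iterations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for tbb::blocked_range<size_t>; illustration only.
class ToyBlockedRange {
public:
    ToyBlockedRange(std::size_t begin, std::size_t end, std::size_t grainsize)
        : begin_(begin), end_(end), grainsize_(grainsize) {}

    std::size_t size() const { return end_ - begin_; }

    // Assumed rule: divisible only if more than grainsize iterations remain.
    bool is_divisible() const { return grainsize_ < size(); }

private:
    std::size_t begin_, end_, grainsize_;
};

// True if a range of `count` iterations with the given grain size
// would be handed to the body in one piece.
inline bool runs_as_one_chunk(std::size_t count, std::size_t grainsize) {
    return !ToyBlockedRange(0, count, grainsize).is_divisible();
}

// Small self-check of the rule.
inline void check_grainsize_rule() {
    assert(runs_as_one_chunk(100, 101));   // grain size count + 1: never split
    assert(!runs_as_one_chunk(100, 10));   // smaller grain size: may be split
}
```

Using count + 1 rather than count also keeps the grain size positive when count is zero.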
Thank you.
I'd prefer to use existing Ranges, whether blocked_range or not, and also existing Bodies. I'd also prefer not to change the task_scheduler_init, as that is rather laborious.
I'm imagining a serial_for with the same interface as parallel_for. I'm surprised it's not in there already.
You can use std::for_each:
std::for_each(Iterator begin, Iterator end, Functor f);
Last time I looked, an is_divisible() operation was required for all range types, so you could also override that if you don't want to set a grainsize. Otherwise, a never_partitioner seems to make more sense than having serial_for, and maybe it's as simple as duplicating the code for simple_partitioner with should_execute_range() always returning true. I suppose this is for debugging?
(Added) It seems that a templated operation like parallel_for() has to have specialisations for each partitioner type, probably because it cannot have a default type (to be confirmed), so it should be easy to specialise it for never_partitioner using std::for_each, avoiding any difficulties dealing with internal::start_for.
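As a rough sketch of the never_partitioner idea: the tag type and the for_each_range function below are hypothetical names invented for this illustration (never_partitioner is a suggestion in this thread, not a TBB class), and IndexRange is a stand-in Range so the example compiles without TBB. It only shows how an overload selected by a partitioner tag can bypass splitting entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag type; a suggestion from this thread, not a TBB class.
struct never_partitioner {};

// Hypothetical stand-in for a Range: a half-open index interval.
struct IndexRange {
    std::size_t first, last;
};

// Overload chosen when the caller passes never_partitioner:
// no splitting and no tasks, the body sees the whole range once.
template <typename Range, typename Body>
void for_each_range(const Range& range, const Body& body, never_partitioner) {
    body(range);
}

// Example Body that adds up the indices of the range it is given.
struct SumBody {
    std::size_t* total;
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.first; i != r.last; ++i) *total += i;
    }
};

// Usage: sum of the indices 0 .. n-1, computed on the calling thread.
inline std::size_t sum_below(std::size_t n) {
    std::size_t total = 0;
    for_each_range(IndexRange{0, n}, SumBody{&total}, never_partitioner());
    return total;
}

inline void check_never_partitioner() {
    assert(sum_below(5) == 10);  // 0 + 1 + 2 + 3 + 4
}
```

Overloading on a tag type keeps the call site identical apart from the last argument, which is the property a macro-free switch between parallel and serial execution would need.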
Hmm, "serial parallel" sounds a bit contradictory, doesn't it?
What exactly do you want to do?
Why don't you simply iterate through the elements of the range using, e.g., a for loop?
You might also want to check out parallel_scan(). It all depends on what you're trying to do.
The std::for_each is moderately tempting, but it's far from an exact match. First of all, even a 2D range has many begin/end pairs (as opposed to one), and secondly, the function via operator()(const blocked_range& r) { ... } will be applied to a range, as opposed to the problem's element type; say a double.
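To illustrate the second mismatch, here is a small sketch contrasting the two calling conventions. IndexRange, ScaleBody and ScaleElement are hypothetical names made up for this example; IndexRange merely stands in for a one-dimensional Range so that no TBB headers are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional Range over indices.
struct IndexRange {
    std::size_t first, last;
};

// A parallel_for-style Body: it is called with a whole range.
struct ScaleBody {
    std::vector<double>* data;
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.first; i != r.last; ++i) (*data)[i] *= 2.0;
    }
};

// A std::for_each-style functor: it is called once per element.
struct ScaleElement {
    void operator()(double& x) const { x *= 2.0; }
};

// Doubles every element by handing the Body one range covering the vector.
inline std::vector<double> scale_with_body(std::vector<double> v) {
    ScaleBody body{&v};
    body(IndexRange{0, v.size()});
    return v;
}

// Doubles every element by letting std::for_each visit each element.
inline std::vector<double> scale_with_for_each(std::vector<double> v) {
    std::for_each(v.begin(), v.end(), ScaleElement());
    return v;
}

inline void check_scaling() {
    std::vector<double> input{1.0, 2.5};
    assert(scale_with_body(input) == scale_with_for_each(input));
}
```

The results agree, but an existing ScaleBody cannot be passed to std::for_each unchanged, because std::for_each would call it with a double rather than a range.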
Using std::for_each would of course require an iterator to be defined for each Range, which seems fairly inconvenient and sometimes decidedly suboptimal compared with preserving a tight coupling between Body and Range, but that would mean that serial_for is about something else.
If the goal is to measure exactly (or, for a specific target, to avoid) all parallelisation overhead, you can either do "#define parallel_for serial_for" with something like "template<typename Range, typename Body, typename Partitioner> void serial_for( const Range& range, const Body& body, const Partitioner& partitioner ) { body(range); }" (and another overload without Partitioner), or do something more pedantic and elaborate that doesn't use macros, where you could use one or several Partitioner typedefs, some of which could refer to a never_partitioner. Is that it?
Or is the goal to measure exactly the overhead of simple_partitioner with a single worker thread, without excluding division overhead? Then it should be easy enough to do something like "template<typename Range, typename Body, typename Partitioner> void serial_for( Range range, const Body& body, const Partitioner& partitioner ) { if(range.is_divisible()) { Range right(range, tbb::split()); serial_for(range, body, partitioner); serial_for(right, body, partitioner); } else { body(range); } }" (the range is taken by value because the splitting constructor modifies it) and another overload without Partitioner, or the pedantic equivalent with a special Partitioner type. Am I getting warm yet?
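Written out, the two variants might look roughly like the sketch below. It is not TBB code: split and IndexRange are hypothetical stand-ins that mimic the Range concept so the example compiles on its own, the Partitioner argument is left out for brevity, and serial_for_split is a name made up here to tell the second variant apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins so the sketch compiles without TBB.
struct split {};

class IndexRange {
public:
    IndexRange(std::size_t begin, std::size_t end, std::size_t grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}

    // Splitting constructor: `other` keeps the left half, *this takes the right.
    IndexRange(IndexRange& other, split)
        : begin_(other.begin_ + (other.end_ - other.begin_) / 2),
          end_(other.end_), grainsize_(other.grainsize_) {
        other.end_ = begin_;
    }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
    bool is_divisible() const { return grainsize_ < end_ - begin_; }

private:
    std::size_t begin_, end_, grainsize_;
};

// Variant 1: no splitting at all; the body sees the whole range once.
template <typename Range, typename Body>
void serial_for(const Range& range, const Body& body) {
    body(range);
}

// Variant 2: split until no longer divisible, running both halves in order
// on the calling thread. Taken by value because splitting mutates the range.
template <typename Range, typename Body>
void serial_for_split(Range range, const Body& body) {
    if (range.is_divisible()) {
        Range right(range, split());
        serial_for_split(range, body);
        serial_for_split(right, body);
    } else {
        body(range);
    }
}

// Body that counts the chunks it is given and sums the indices in them.
struct CountingBody {
    std::size_t* chunks;
    std::size_t* sum;
    void operator()(const IndexRange& r) const {
        ++*chunks;
        for (std::size_t i = r.begin(); i != r.end(); ++i) *sum += i;
    }
};

// Usage: same work either way, but a different number of body invocations.
inline void serial_for_demo() {
    std::size_t chunks = 0, sum = 0;
    serial_for(IndexRange(0, 8, 2), CountingBody{&chunks, &sum});
    assert(chunks == 1 && sum == 28);

    chunks = 0; sum = 0;
    serial_for_split(IndexRange(0, 8, 2), CountingBody{&chunks, &sum});
    assert(chunks == 4 && sum == 28);
}
```

The first variant removes every source of overhead; the second keeps the cost of dividing the range, so comparing the two isolates what splitting alone costs.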
You're boiling! (So to speak.) I think you've got it with the code in your second paragraph above.
[cpp]#define parallel_for serial_for
template<typename Range, typename Body, typename Partitioner>
void serial_for(const Range& range, const Body& body, const Partitioner& partitioner)
{ body(range); }[/cpp]
If possible, template specialisation on the Partitioner might be nice rather than a macro, but that's not so important.
Thank you.
