Solved: Serial parallel_for

Paul_Keir · ‎03-29-2010

Hi all,

Is there something like parallel_for, but which will execute in serial? i.e. I'd pass it my Range and Body much like parallel_for, but the execution would be serial.

Regards,
Graham

Denis_Bolshakov · ‎03-29-2010

Try adjusting the grain size, which is the third argument of blocked_range. In that case the code would look like this:

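#include <cstddef>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

class Functor
{
public:
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(), end = r.end(); i != end; ++i) {
            // your code
        }
    }
};

void doParallel(size_t count) {
    Functor functor;
    // grainsize = count + 1 exceeds the range size, so the range is never divisible
    tbb::parallel_for(tbb::blocked_range<size_t>(0, count, count + 1), functor);
}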

If the iteration space has more than grainsize iterations, parallel_for splits it into separate subranges that are scheduled separately,
so if the grainsize is larger than the number of iterations, the iteration space should not be split.
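
To illustrate the rule above, here is a minimal sketch that builds without TBB. IndexRange and would_split are hypothetical names of mine; IndexRange only mimics the divisibility test of tbb::blocked_range (divisible while the size exceeds the grainsize).

```cpp
#include <cstddef>

// Hypothetical stand-in for tbb::blocked_range<std::size_t>, so the sketch builds without TBB.
class IndexRange {
public:
    IndexRange(std::size_t begin, std::size_t end, std::size_t grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
    std::size_t size() const { return end_ - begin_; }

    // Same rule as blocked_range: divisible only while the size exceeds the grainsize.
    bool is_divisible() const { return grainsize_ < size(); }

private:
    std::size_t begin_, end_, grainsize_;
};

// Would a range of `count` iterations with this grainsize be offered for splitting?
inline bool would_split(std::size_t count, std::size_t grainsize) {
    return IndexRange(0, count, grainsize).is_divisible();
}
```

With a grainsize of count + 1, would_split(count, count + 1) is false for any count, which is why the whole range reaches the body as a single chunk.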

The second way is to initialise the scheduler with a single thread. Find in your code
tbb::task_scheduler_init init;
and change it to
tbb::task_scheduler_init init(1); // recommended only for debugging

Paul_Keir · ‎03-29-2010

Thank you.

I'd prefer to use existing Ranges, whether blocked_range or not, and also existing Bodies. I'd also prefer not to change the task_scheduler_init, as this is rather laborious.

I'm imagining a serial_for, with the same interface as parallel_for. I'm surprised it's not in there already.

Denis_Bolshakov · ‎03-29-2010

You can use std::for_each:
std::for_each(Iterator begin, Iterator end, Functor f);

RafSchietekat · ‎03-29-2010

Last time I looked, an is_divisible() operation was required for all range types, so you could also override that if you don't want to set a grainsize, but otherwise a never_partitioner seems to make more sense than having serial_for, and maybe it's as simple as duplicating the code for simple_partitioner with should_execute_range() always returning true. I suppose this is for debugging?

(Added) It seems that a templated operation like parallel_for() has to have specialisations for each partitioner type, probably because it cannot have a default type (to be confirmed), so it should be easy to specialise it for never_partitioner using std::for_each, avoiding any difficulties dealing with internal::start_for.

Bartlomiej · ‎03-29-2010

Hmm, "serial parallel" sounds a bit contradictory, doesn't it?
What exactly do you want to do?
Why don't you simply iterate through the elements of the range using e.g. a for loop?

e4lam · ‎03-29-2010

You might also want to check out parallel_scan(). It all depends on what you're trying to do.

Paul_Keir · ‎03-29-2010

The std::for_each is moderately tempting, but it's far from an exact match. First of all, even a 2D range has many begin/end pairs (as opposed to one), and secondly, the function via operator()(const blocked_range& r) { ... } will be applied to a range, as opposed to the problem's element type; say a double.
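
To make the mismatch concrete, here is a small sketch contrasting the two functor shapes. All names (IndexRange, ScaleBody, ScaleElement and the two helper functions) are hypothetical, and IndexRange is only a stand-in for a 1D TBB range so that the code builds without TBB.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D TBB range.
struct IndexRange {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// TBB-style Body: receives a whole range and owns the loop.
struct ScaleBody {
    std::vector<double>* data;
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i) (*data)[i] *= 2.0;
    }
};

// std::for_each-style functor: receives one element at a time.
struct ScaleElement {
    void operator()(double& x) const { x *= 2.0; }
};

inline void scale_with_body(std::vector<double>& v) {
    ScaleBody body = { &v };
    IndexRange whole = { 0, v.size() };
    body(whole);  // one call covering the whole range
}

inline void scale_with_for_each(std::vector<double>& v) {
    std::for_each(v.begin(), v.end(), ScaleElement());  // one call per element
}
```

Both helpers double every element, but an existing Body such as ScaleBody cannot be handed to std::for_each as it stands, because its operator() expects a range rather than a double.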

RafSchietekat · ‎03-29-2010

Using std::for_each would of course require an iterator to be defined for each Range, which seems fairly inconvenient and sometimes decidedly suboptimal vs. preserving a tight coupling between Body and Range, but that would mean that serial_for is about something else.

If it is to exactly measure (or, for a specific target, avoid) all parallelisation overhead, you can either do "#define parallel_for serial_for" with something like "template<typename Range, typename Body, typename Partitioner> void serial_for( const Range& range, const Body& body, const Partitioner& partitioner ) { body(range); }" (and another overload without Partitioner), or do something more pedantic and elaborate that doesn't use macros, where you could use one or several Partitioner typedefs, some of which can be to never_partitioner. Is that it?

Or is it to exactly measure the overhead of simple_partitioner with a single worker thread, while still including the division overhead? Then it should be easy enough to do something like "template<typename Range, typename Body, typename Partitioner> void serial_for( Range range, const Body& body, const Partitioner& partitioner ) { if(range.is_divisible()) { Range right(range, tbb::split()); serial_for(range, body, partitioner); serial_for(right, body, partitioner); } else { body(range); } }" (the range is taken by value because splitting modifies it) and another overload without Partitioner, or the pedantic equivalent with a special Partitioner type. Am I getting warm yet?
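
The recursive-splitting variant can be sketched as follows, under the assumption that a range type offers is_divisible() and a splitting constructor. To keep the sketch buildable without TBB, sketch::split and sketch::index_range are hypothetical stand-ins for tbb::split and tbb::blocked_range, and serial_for_split is a name of my own; the Partitioner argument is left out for brevity.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for tbb::split (a tag type selecting the splitting constructor).
struct split {};

// Hypothetical stand-in for tbb::blocked_range<std::size_t>.
class index_range {
public:
    index_range(std::size_t begin, std::size_t end, std::size_t grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}

    // Splitting constructor: r keeps the left half, the new range takes the right half.
    index_range(index_range& r, split)
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2), end_(r.end_), grainsize_(r.grainsize_) {
        r.end_ = begin_;
    }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
    std::size_t size() const { return end_ - begin_; }
    bool is_divisible() const { return grainsize_ < size(); }

private:
    std::size_t begin_, end_, grainsize_;
};

// Serial execution that still pays the range-splitting cost.
// The range is taken by value because splitting modifies it.
template <typename Range, typename Body>
void serial_for_split(Range range, const Body& body) {
    if (range.is_divisible()) {
        Range right(range, split());
        serial_for_split(range, body);  // left half first
        serial_for_split(right, body);  // then the right half
    } else {
        body(range);                    // leaf: run the body on an indivisible chunk
    }
}

}  // namespace sketch
```

For example, sketch::serial_for_split(sketch::index_range(0, 10, 3), body) invokes the body on four chunks, in order from left to right, all on the calling thread.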

Paul_Keir · ‎03-30-2010

You're boiling! (So to speak.) I think you've got it with the code in your second paragraph above.

[cpp]#define parallel_for serial_for
template <typename Range, typename Body, typename Partitioner>
void serial_for(const Range& range, const Body& body, const Partitioner& partitioner)
{ body(range); }[/cpp]

If possible, template specialisation on the Partitioner might be nice rather than a macro, but that's not so important.
Thank you.
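
One possible macro-free variant, offered only as a sketch: since function templates cannot be partially specialised, plain overloads in a separate namespace can do the job, with a namespace alias as the single switch. The namespace serial_tbb and the alias loops are hypothetical names and not part of TBB.

```cpp
// Hypothetical namespace; nothing here is part of TBB.
namespace serial_tbb {

// Same call shape as parallel_for(range, body), but runs the body once on the whole range.
template <typename Range, typename Body>
void parallel_for(const Range& range, const Body& body) {
    body(range);
}

// Overload that accepts and ignores any partitioner argument.
template <typename Range, typename Body, typename Partitioner>
void parallel_for(const Range& range, const Body& body, const Partitioner&) {
    body(range);
}

}  // namespace serial_tbb

// Switch in one place: alias `loops` to serial_tbb here, or to tbb in a parallel build.
namespace loops = serial_tbb;
```

Call sites would then use loops::parallel_for(range, body) or loops::parallel_for(range, body, partitioner), and changing the alias to tbb (with the TBB headers included) would restore parallel execution without touching them.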