Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

A simple parallel_for doesn't work!

sinedie
Beginner
405 Views
I tried writing a simple parallel_for-based program, following various examples. It does not do what it is supposed to, and I can't see any bug either. :( Help please!

The program is attached because it was not displayed properly when I pasted it in. The parallel part of the code keeps getting lost, and I am not sure why! Probably some problem with the syntax highlighter...

TIA,
-S


4 Replies
RafSchietekat
Valued Contributor III

Some miscellaneous remarks/questions... In func() and elsewhere, shouldn't 2.0 be 2.0f instead to avoid conversion overhead? In parallel_func::operator(), shouldn't blocked_range::end() be taken out of the loop to allow the compiler to optimise it? Don't give task_scheduler_init a literal argument; use the appropriate constant task_scheduler_init::automatic (or simply omit it because it is the default value).

As for why this doesn't work: "parallel_func( a, b, n );" simply constructs and immediately destroys a temporary object (didn't the compiler have anything to say about that?); you should instead provide it as an argument to parallel_for.
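To make the difference concrete, here is a minimal sketch in plain C++. The names Range, DoubleBody and run_over_range are hypothetical stand-ins, not TBB names: Range mimics the begin()/end() interface of a blocked range, and run_over_range plays the role of parallel_for but simply runs the body serially, so that the example needs no library. It only illustrates that constructing a body object does no work until some algorithm calls its operator().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blocked range: a half-open interval [first, last).
struct Range {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Body object in the style of parallel_func: the constructor only stores pointers.
class DoubleBody {
    const float *in;
    float *out;
public:
    DoubleBody(const float *x_in, float *x_out) : in(x_in), out(x_out) {}
    void operator()(const Range &r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            out[i] = 2.0f * in[i];
    }
};

// Hypothetical serial stand-in for parallel_for: it invokes the body on the range.
template <typename Body>
void run_over_range(const Range &r, const Body &body) {
    assert(r.begin() <= r.end());
    body(r);
}

// Only constructs a temporary body; operator() is never called, so out stays zeroed.
std::vector<float> construct_only(const std::vector<float> &in) {
    std::vector<float> out(in.size(), 0.0f);
    DoubleBody(in.data(), out.data());  // temporary created and destroyed, no work done
    return out;
}

// Hands the body to the algorithm, which calls operator() on the range.
std::vector<float> construct_and_run(const std::vector<float> &in) {
    std::vector<float> out(in.size(), 0.0f);
    run_over_range(Range{0, in.size()}, DoubleBody(in.data(), out.data()));
    return out;
}
```

With real TBB the second form would pass the range and the body to parallel_for in the same way; only the scheduling differs.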

sinedie
Beginner
Dear Raf,
1. I was not aware of float conversion overhead. Thanks!

2. I haven't been able to follow the moving of blocked_range::end() part. :( Could you illustrate the corrected version please?

3. The point about task_scheduler_init is well taken. I thought -1 meant automatic; I read something like that long ago. It seems I was wrong. Thanks again.

4. Regarding parallel_func(a,b,n) being useless: I have been trying to implement an example given in the Intel TBB book by James Reinders (p. 33-34). I probably misunderstood something, but I can't see exactly what. I did get parallel_for working in main(), but I'd prefer to put it into a wrapper class as I tried in this case.

I am still grappling with my first proper program, and I believe I am stuck unless I can make something as simple as the above work. If it is not too much trouble, could you fix the problem in a way that is consistent with Reinders' illustration? That would be very helpful, as the book does not give the complete example, only the class and probably the wrapper function for parallel execution.

Thanks again for the help and insight.
-S





RafSchietekat
Valued Contributor III
"1. I was not aware of float conversion overhead. Thanks!"
Actually I was wondering myself what exactly will happen here.
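As far as the language rules go, multiplying a float by the double literal 2.0 converts the float operand to double, and the double result is narrowed again when it is stored in a float. Whether that costs anything in practice depends on the compiler and the target, so the following is only a small sketch of where the conversions occur, not a performance claim. The function names are made up for the illustration.

```cpp
#include <cassert>
#include <type_traits>

// Double literal: x is converted to double, multiplied, then narrowed back to float.
inline float twice_via_double(float x) { return x * 2.0; }

// Float literal: the whole expression stays in float.
inline float twice_via_float(float x) { return x * 2.0f; }

// The type of the product shows where the conversion happens.
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float * double literal has type double");
static_assert(std::is_same<decltype(1.0f * 2.0f), float>::value,
              "float * float literal has type float");

// Doubling is exact in both precisions, so the two versions agree here.
inline void check_twice() {
    assert(twice_via_double(1.5f) == twice_via_float(1.5f));
}
```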

"2. I haven't been able to follow the moving of blocked_range::end() part. :( Could you illustrate the corrected version please?"
The compiler may not be able to guess that end() returns the same value each time, so you may miss out on some optimisation trick it has up its sleeve. For performance-sensitive pieces of code I would assign the value to a local variable instead (although elsewhere I would consider that just visual clutter).
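A minimal sketch of the two loop forms, using a hypothetical Range struct as a stand-in for the blocked range so that the example compiles without TBB. The function names are also made up for the illustration. Both functions compute the same result; the second one reads end() once into a local variable before the loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a blocked range: a half-open interval [first, last).
struct Range {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// end() appears in the loop condition, so it is evaluated on every iteration
// unless the compiler can prove that its value does not change.
void scale_calling_end(const Range &r, const float *in, float *out) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = r.begin(); i != r.end(); ++i)
        out[i] = 2.0f * in[i];
}

// end() is read once into a local variable, which makes the loop bound
// visibly constant for the duration of the loop.
void scale_hoisted_end(const Range &r, const float *in, float *out) {
    assert(in != nullptr && out != nullptr);
    const std::size_t end = r.end();
    for (std::size_t i = r.begin(); i != end; ++i)
        out[i] = 2.0f * in[i];
}
```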

"3. The point about task_scheduler_init is well taken. I thought -1 meant automatic; I read something like that long ago. It seems I was wrong. Thanks again."
The value happens to be correct, and there's no reason it will change, but you're not supposed to know that. Well, when using a debugger you need to know, but in a program you should use the constant instead.

"4. Regarding parallel_func(a,b,n) being useless: I have been trying to implement an example given in the Intel TBB book by James Reinders (p. 33-34). I probably misunderstood something, but I can't see exactly what. I did get parallel_for working in main(), but I'd prefer to put it into a wrapper class as I tried in this case."
Well, I can't find it, so the compiler won't either. :-)

I'm sure you can figure it out yourself, like you did in "parallel_for with inexplicable performance".
Denis_Bolshakov
Beginner

As far as I can see, you never actually call parallel_for.

I would modify your class and add a function for the parallel version:
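[cpp]#include <cstddef>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

// func() is the per-element function from the original program.
// Assumption: it takes a pointer to one element of each array.
void func(float *a, float *b);

class parallel_func {
	private:
		float *a;
		float *b;
	public:
		parallel_func(float *x_a, float *x_b)
			: a(x_a), b(x_b)
		{
		}
		// Called by parallel_for on each sub-range of [0, n).
		void operator()(const tbb::blocked_range<size_t> &r) const {
			float *x_a = a;
			float *x_b = b;

			for (size_t i = r.begin(); i != r.end(); i++) {
				func(&x_a[i], &x_b[i]);
			}
		}
};

void parallel_version(float *a, float *b, size_t n)
{
	// The body object is passed to parallel_for, which invokes its operator().
	tbb::parallel_for(tbb::blocked_range<size_t>(0, n), parallel_func(a, b), tbb::auto_partitioner());
}
[/cpp]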