Intel TBB and Auto Vectorization

Is there any way to use TBB and autovectorization?

I have this, where kIterSize is about 512k:
parallel_for(blocked_range<int>(0, kIterSize), parallel_task(), auto_partitioner());

I then do this inside my thread function:
for(i = r.begin(); i < r.end(); i++)
    BufB[i] += BufA[i] / 2.0f;

This is not autovectorized, so I am taking a huge performance hit.

I want SSE4 and TBB running on all processors.

Try assigning r.end() to a local variable and using that in the loop condition; that should help the vectorizer recognize the loop as suitable.
You might also look at this post of mine for a somewhat related investigation.
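
A minimal sketch of that suggestion, kept free of TBB so it compiles on its own. HalfOpenRange is a hypothetical stand-in for a TBB range (only begin() and end() are modeled), and add_half_hoisted, buf_a and buf_b are names made up for illustration. Whether hoisting the bound is enough depends on the compiler and on what else it can prove about the loop.

```cpp
#include <cassert>

// Hypothetical stand-in for a TBB range: only begin()/end() are modeled.
struct HalfOpenRange {
    int lo, hi;
    int begin() const { return lo; }
    int end() const { return hi; }
};

// Works with any range type exposing begin()/end() convertible to int.
template <typename Range>
void add_half_hoisted(const Range& r, float* buf_b, const float* buf_a) {
    assert(r.begin() <= r.end());
    const int first = r.begin();
    const int last = r.end();  // read once, so the loop bound is a plain local
    for (int i = first; i < last; ++i)
        buf_b[i] += buf_a[i] / 2.0f;
}
```

With a real TBB body the same two locals would be taken from the range passed to operator().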

Thank you for the response. Your blog is very helpful. However, the local variable assignment did not help.

The compiler is telling me it did not autovectorize because:
"parallel_for.h(89): (col. 20) remark: loop was not vectorized: unsupported loop structure."

That line is inside this block:
if( !my_range.is_divisible() || my_partition.should_execute_range(*this) ) {
    my_body( my_range );
    return my_partition.continue_after_execute_range(*this);
}


When I replaced r.end() with a local variable it gives these reasons:

parallel_for.h(89): (col. 20) remark: loop was not vectorized: existence of vector dependence.
parallel_for.h(89): (col. 20) remark: vector dependence: assumed FLOW dependence between (unknown) line 89 and this line 89.
parallel_for.h(89): (col. 20) remark: vector dependence: assumed ANTI dependence between this line 89 and (unknown) line 89.

As you may have noticed, line 89 is where operator() is called on the body object. I do not know why the compiler pointed to this line; my best guess is that the remarks really apply to the loop over the blocked_range in your own code, which gets inlined at that call site.

If that loop consists of exactly one line, as you wrote in the first post, then the compiler is probably assuming conservatively that the arrays you operate on could overlap (i.e. it cannot prove that they do not). Assuming you use the Intel Compiler, I suggest you look at its documentation on vectorization. Let me quote just one sentence that might be relevant:

"For example, a common problem with global pointers is that they often prevent the compiler from being able to prove that two memory references refer to distinct locations. Consequently, this prevents certain reordering transformations."

In particular, #pragma ivdep can be used to tell the compiler to ignore assumed dependencies, if you know for sure they are imaginary.
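
As a rough sketch of where the pragma goes: it is placed directly before the loop it applies to. The function name add_half_ivdep and its parameters are assumptions for illustration, not code from this thread. The spelling #pragma ivdep is the Intel Compiler form; GCC spells it #pragma GCC ivdep, and a compiler that does not recognize the pragma will normally just ignore it, possibly with a warning.

```cpp
#include <cassert>

// Hypothetical kernel: dst plays the role of BufB, src the role of BufA.
// The caller promises that dst and src do not overlap; the pragma does not
// check this, it only tells the compiler to drop its *assumed* dependences.
void add_half_ivdep(float* dst, const float* src, int begin, int end) {
    assert(begin <= end);
#pragma ivdep
    for (int i = begin; i < end; ++i)
        dst[i] += src[i] / 2.0f;
}
```

If the buffers really do overlap, the pragma makes the result undefined in practice, so it is only appropriate when the no-overlap guarantee comes from how the buffers are allocated.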

Quoting - Poiuyt
parallel_for.h(89): (col. 20) remark: loop was not vectorized: existence of vector dependence.
parallel_for.h(89): (col. 20) remark: vector dependence: assumed FLOW dependence between (unknown) line 89 and this line 89.
parallel_for.h(89): (col. 20) remark: vector dependence: assumed ANTI dependence between this line 89 and (unknown) line 89.

You could try the __restrict / restrict keyword, if your compiler supports it, e.g.:

float* __restrict B = BufB;
float* __restrict A = BufA;
int end = r.end();
for(i = r.begin(); i < end; i++)
    B[i] += A[i] / 2.0f;

This tells the compiler that BufA and BufB do not overlap, so there are no dependences between the loads and the stores.
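
Put together as a body object, this might look roughly like the sketch below. It is written without TBB so that it stands alone: IndexRange is a hypothetical stand-in for blocked_range<int>, and AddHalf with its dst and src members is an invented name for what the thread calls parallel_task. Note that __restrict is a compiler extension (accepted by the Intel, GCC, Clang and MSVC compilers), not standard C++.

```cpp
#include <cassert>

// Hypothetical stand-in for tbb::blocked_range<int>: only begin()/end() are modeled.
struct IndexRange {
    int first, last;
    int begin() const { return first; }
    int end() const { return last; }
};

// Hypothetical body object; with TBB, operator() would take the blocked_range instead.
struct AddHalf {
    float* dst;        // plays the role of BufB
    const float* src;  // plays the role of BufA

    void operator()(const IndexRange& r) const {
        assert(r.begin() <= r.end());
        // Local restrict-qualified copies promise the compiler that the two
        // buffers do not alias within this loop.
        float* __restrict b = dst;
        const float* __restrict a = src;
        const int end = r.end();  // loop bound hoisted into a local
        for (int i = r.begin(); i < end; ++i)
            b[i] += a[i] / 2.0f;
    }
};
```

The promise is the programmer's responsibility: if dst and src do alias, the behavior is undefined, so the qualifier is only safe when the buffers are known to be distinct.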

I am using the latest version of the Intel Compiler.

The __restrict keyword worked and I got my performance back. Thanks!