I was reading "A Guide to Vectorization with Intel C++ Compilers": https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectorizationGuide.pdf
I am referring to the "Single Entry and Single Exit" criteria on page 8. I have tried two variants: A) break and B) continue.
A) Break
void no_vec(float a[], float b[], float c[])
{
  int i = 0;
  while(i < 100)
  {
    a[i] = b[i] * c[i];
    if(a[i] < 0.0)
      break;
    ++i;
  }
}
===========================================================================
Begin optimization report for: no_vec(float *, float *, float *)
Report from: Vector optimizations [vec]
LOOP BEGIN at breaktest.c(6,2)
remark #15520: loop was not vectorized: loop with early exits cannot be vectorized unless it meets search loop idiom criteria
LOOP END
===========================================================================
B) Continue
void no_vec(float a[], float b[], float c[])
{
  int i = 0;
  while(i < 100)
  {
    a[i] = b[i] * c[i];
    if(a[i] < 0.0)
      continue;
    ++i;
  }
}
===========================================================================
Begin optimization report for: no_vec(float *, float *, float *)
Report from: Vector optimizations [vec]
Non-optimizable loops:
LOOP BEGIN at continuetest.c(6,2)
remark #15523: loop was not vectorized: cannot compute loop iteration count before executing the loop.
LOOP END
===========================================================================
My questions:
1) What difference do continue and break make to the optimizer, such that the remark in the optimization report changes?
2) Is there any way to vectorize the loop, given that the loop needs a data-dependent continue condition?
Did you try something like
#pragma simd firstprivate(i) lastprivate(i)
for(i=0; i<100; ++i) if((a[i]=b[i]*c[i]) < 0) break;
with a compiler released within the last year? If you are trying to find the limits of that "search loop idiom," you shouldn't get too fancy.
Your second case looks like it doesn't terminate: once a[i] < 0.0, the continue skips ++i and the same iteration repeats forever. Vectorization does require a counted loop, even with the recent dispensation that permits an early exit for a "search loop."
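For question 2, here is a minimal sketch, assuming the intent was for i to advance on every iteration so that the trip count is known before the loop starts. The continue then only skips the rest of the body for that element, which a compiler can usually turn into a masked (if-converted) update rather than a control-flow exit. The output array `d` and the doubling after the test are hypothetical placeholders for "more work after the condition"; check the optimization report to see whether your compiler version actually vectorizes it.

```c
/* Sketch: a counted loop with a data-dependent continue.
   Assumption: i advances on every iteration (unlike the original
   while loop), so the trip count n is known up front. */
void counted_continue(float *a, const float *b, const float *c,
                      float *d, int n)
{
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] * c[i];
        if (a[i] < 0.0f)
            continue;          /* skip only the rest of this element's work */
        d[i] = 2.0f * a[i];    /* hypothetical extra work after the test */
    }
}
```

Elements whose product is negative still get a[i] written (as in the original), but d[i] is left untouched for them.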
In example "A", the if statement with break, which can cause an early exit from the loop, prevents vectorization.
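One common workaround, not something proposed in this thread, is to split the break loop into two simpler loops: a pure search loop that finds the first index whose product is negative, and a counted store loop with no early exit. The sketch below assumes that writing a[i] for every index up to and including the first negative product (exactly what the original loop does) is all that matters; the function names are hypothetical. The cost is reading b and c twice, and whether each half is vectorized still depends on the compiler version.

```c
/* Phase 1 (search loop): index of the first i with b[i]*c[i] < 0,
   or n if there is none. */
int first_negative_product(const float *b, const float *c, int n)
{
    for (int i = 0; i < n; ++i) {
        float p = b[i] * c[i];   /* round to float, as storing into a[i] would */
        if (p < 0.0f)
            return i;
    }
    return n;
}

/* Phase 2 (counted loop): same stores as the original break loop. */
void split_break(float *a, const float *b, const float *c, int n)
{
    int k = first_negative_product(b, c, n);
    int last = (k < n) ? k + 1 : n;  /* include the element that triggers the exit */
    for (int i = 0; i < last; ++i)   /* trip count known before the loop starts */
        a[i] = b[i] * c[i];
}
```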
The loops as stated, while not impossible to vectorize, are hard to vectorize.
a[i] = b[i] * c[i];
When vectorized on a 4-wide vector, this can be thought of as equivalent to:
a[i] = b[i] * c[i]; a[i+1] = b[i+1] * c[i+1]; a[i+2] = b[i+2] * c[i+2]; a[i+3] = b[i+3] * c[i+3];
All four are done in parallel; however, your break or continue stops (rather than continues) at the first occurrence of the condition, reading left to right.
Should index i satisfy the condition, the remainder of the vector must not be stored (at the very least) into a[i+1], a[i+2], a[i+3].
Inserting code that vectorizes the loop and yet (as visible to the program) performs only the operations specified in the source would require something like (pseudo code):
temp[0] = b[i] * c[i]; temp[1] = b[i+1] * c[i+1]; temp[2] = b[i+2] * c[i+2]; temp[3] = b[i+3] * c[i+3];
mask[0] = (temp[0] < 0.0); mask[1] = (temp[1] < 0.0); mask[2] = (temp[2] < 0.0); mask[3] = (temp[3] < 0.0);
use vtestps (or vtestpd) to test all mask lanes for 0, and if true (no lane set)
  a[i] = temp[0]; a[i+1] = temp[1]; a[i+2] = temp[2]; a[i+3] = temp[3];
else
  for(j=0; j < 4; ++j) {
    a[i+j] = temp[j];
    if(temp[j] < 0.0) {
      i = i + j;
      (exit outer loop)
    }
  }
endif
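A portable C rendering of the pseudo code above may help. It is a sketch, not what any compiler emits: the fixed-width inner loops stand in for the vector multiply, compare, and mask test, and a compiler may or may not map them to SIMD instructions. `VLEN` and `blocked_break` are hypothetical names; the return value is the index at which the original break loop would stop (or n if it never breaks), and a scalar remainder loop handles n not being a multiple of 4.

```c
#define VLEN 4  /* hypothetical vector width: 4 floats = one XMM register */

int blocked_break(float *a, const float *b, const float *c, int n)
{
    int i = 0;
    for (; i + VLEN <= n; i += VLEN) {
        float temp[VLEN];
        int any = 0;
        /* "vector" multiply and compare: every lane computed, nothing stored yet */
        for (int j = 0; j < VLEN; ++j) {
            temp[j] = b[i + j] * c[i + j];
            any |= (temp[j] < 0.0f);
        }
        if (!any) {
            /* no lane met the exit condition: commit the whole block */
            for (int j = 0; j < VLEN; ++j)
                a[i + j] = temp[j];
        } else {
            /* store lanes in order, up to and including the first negative one */
            for (int j = 0; j < VLEN; ++j) {
                a[i + j] = temp[j];
                if (temp[j] < 0.0f)
                    return i + j;   /* exit the outer loop */
            }
        }
    }
    /* scalar remainder */
    for (; i < n; ++i) {
        a[i] = b[i] * c[i];
        if (a[i] < 0.0f)
            return i;
    }
    return n;
}
```

With the exit condition first met at index 6 and n = 10, the first block is committed whole, the second block stores a[4] through a[6], and a[7] onward are never written, matching the scalar loop.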
On long runs (many iterations before the condition is met), the above vectorized code should run faster; on short runs it would be slower.
Currently, the compiler optimization engineers haven't picked this "high-hanging" fruit.
Jim Dempsey
So, in order to vectorize that loop, the compiler would have to create a temporary vector (an XMM or YMM register) loaded with zeroes and insert a floating-point comparison of the float a[] values against it.
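A minimal sketch of that step with SSE intrinsics, assuming an x86 target: a zeroed XMM register, a packed compare of the four products against it, and `_mm_movemask_ps` to collect the lane results into a bit mask. `_mm_movemask_ps` is swapped in here for the vtestps test mentioned above; either can tell whether any lane is set. The products go into a caller-supplied `temp` buffer rather than a[], since a[] must not be written past the exit lane. `negative_lanes4` and `HAVE_SSE` are hypothetical names, and the `#else` branch is a scalar fallback for non-x86 targets.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Returns a 4-bit mask: bit j is set when b[j]*c[j] < 0.
   The four products are written to temp, not to a[]. */
int negative_lanes4(const float *b, const float *c, float *temp)
{
#if HAVE_SSE
    __m128 zero = _mm_setzero_ps();                  /* register loaded with zeroes */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(b), _mm_loadu_ps(c));
    _mm_storeu_ps(temp, prod);
    return _mm_movemask_ps(_mm_cmplt_ps(prod, zero)); /* sign bit of each compare lane */
#else
    int mask = 0;
    for (int j = 0; j < 4; ++j) {                    /* scalar fallback */
        temp[j] = b[j] * c[j];
        if (temp[j] < 0.0f)
            mask |= 1 << j;
    }
    return mask;
#endif
}
```

A zero mask means the whole block can be committed to a[]; otherwise the lowest set bit gives the lane where the original loop would break.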