Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Autovectorization Diagnostics

chris
Beginner
I used the /Qvec-report:3 option, yet the build output window shows information about the loops in only some of the functions. I am using compiler version 10.0.025 with Microsoft Visual Studio 2005. Why aren't vectorization diagnostics displayed for the loops in all the functions? I have a single source file, main.cpp, and a header that contains some constants.
6 Replies
TimP
Honored Contributor III
The compilers have been known to miss loops during vectorization analysis, and you have chosen a rather old compiler version. If you would show some actual code, we could comment more appropriately.
chris
Beginner
I upgraded to version 10.1.022 of the C++ compiler, and that solved the problem of not seeing the vectorization diagnostics for non-vectorized loops. Nonetheless, I have some very simple loops and do not understand why the compiler is not applying autovectorization.

for (int k = 0; k < NUMROWS*NUMCOLS; k++)
{
    c[k] = max(c[k], b[k]);
}

for (int k = 0; k < NUMROWS*NUMCOLS; k++)
{
    a[k] = b[k] > d[k];
}
chris
Beginner
I should add that c, b, and d are all short pointers, and a is an unsigned char pointer. In both cases, the compiler says that the existence of vector dependences prevents vectorization of the loop, and it states that there are proven flow and anti dependences.
TimP
Honored Contributor III

In a typical context, your pointers may have to be declared short * restrict, with the -restrict compile option set. In addition, the compiler misses some opportunities to vectorize short where it does OK with int, but that shouldn't be associated with a dependency diagnostic. Also, compilers since 10.0 require #pragma ivdep in some situations where 9.1 could vectorize without it.
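
As a rough illustration of the suggestion above, here is a minimal sketch of the first loop with restrict-qualified pointers and a guarded #pragma ivdep. It assumes the loop body was meant to index element k of each array. The function name elementwise_max and the use of the __restrict spelling (a common compiler extension, used here in place of the restrict keyword) are assumptions for illustration. Whether a given compiler version actually vectorizes this is not guaranteed.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not from the original post.
// __restrict promises the compiler that c and b do not overlap,
// which removes the assumed flow/anti dependences between them.
void elementwise_max(short* __restrict c, const short* __restrict b, int n)
{
    assert(n >= 0);
    // ivdep asks the Intel compiler to ignore assumed (unproven) dependences.
    // It is guarded so that other compilers do not see an unknown pragma.
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (int k = 0; k < n; ++k) {
        c[k] = std::max(c[k], b[k]);
    }
}
```

Note that both the qualifier and the pragma are promises made by the programmer: if c and b really do overlap, the result is undefined.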

chris
Beginner
With #pragma ivdep, #pragma vector always, and the restrict keyword, the loops are still not vectorized, and the error message is now "dereference too complex". These loops have one line of code each. Are compilers still at the point where intrinsics or inline assembly are necessary to take advantage of SSE instructions?
levicki
Valued Contributor I

Next time, please try to post a complete code example, preferably as an attachment. Thank you.

#include <stdlib.h> // assumed header: declares __max on MSVC

void test(unsigned char *a, short *b, short *c, short *d, int n)
{
	for (int k = 0; k < n; k++) {
		c[k] = __max(c[k], b[k]);
	}

	for (int k = 0; k < n; k++) {
		a[k] = b[k] > d[k];
	}
}

With Intel Compiler 11.0.026 beta, the first loop is auto-vectorized. The second loop cannot be auto-vectorized because of an unsupported data type. It could be vectorized with very little effort on your side by changing a[] to short * (that is, if memory is not a concern), after which the compiler will even fuse the loops.
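
A minimal sketch of that change, assuming the loop bodies index element k of each array. The name test_short is hypothetical, and std::max stands in for __max to keep the sketch portable. The only substantive difference from the function above is that a has the same element type as b and d, so every load and store in the second loop is 16 bits wide.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical variant: 'a' is short* instead of unsigned char*,
// so the comparison result is stored at the same width as its inputs.
void test_short(short* a, const short* b, short* c, const short* d, int n)
{
    assert(n >= 0);
    for (int k = 0; k < n; ++k) {
        c[k] = std::max(c[k], b[k]);
    }

    for (int k = 0; k < n; ++k) {
        a[k] = b[k] > d[k];   // stores 1 or 0 as a short
    }
}
```

The cost is that a takes twice the memory of the unsigned char version.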

An alternative solution, which requires a bit more work on your side, is to write the second loop using intrinsics or inline assembler.
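
For illustration, here is one way the second loop might look with SSE2 intrinsics. This is a sketch under assumptions, not a tuned implementation: the name compare_gt is hypothetical, the loop body is assumed to be a[k] = b[k] > d[k], unaligned loads are used so no alignment is required, and a scalar loop handles the tail (and the whole array when SSE2 is not available).

```cpp
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Hypothetical helper: a[k] = (b[k] > d[k]) ? 1 : 0 for k in [0, n).
void compare_gt(unsigned char* a, const short* b, const short* d, int n)
{
    assert(n >= 0);
    int k = 0;
#if HAVE_SSE2
    const __m128i one = _mm_set1_epi8(1);
    // Process 16 shorts per iteration: two 8-element compares, packed to 16 bytes.
    for (; k + 16 <= n; k += 16) {
        __m128i b0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + k));
        __m128i b1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + k + 8));
        __m128i d0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(d + k));
        __m128i d1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(d + k + 8));
        __m128i m0 = _mm_cmpgt_epi16(b0, d0);      // 0xFFFF where b > d, else 0
        __m128i m1 = _mm_cmpgt_epi16(b1, d1);
        __m128i bytes = _mm_packs_epi16(m0, m1);   // 0xFFFF -> 0xFF, 0 -> 0
        bytes = _mm_and_si128(bytes, one);         // 0xFF -> 1 to match the scalar result
        _mm_storeu_si128(reinterpret_cast<__m128i*>(a + k), bytes);
    }
#endif
    // Scalar tail (or the full loop when SSE2 is unavailable).
    for (; k < n; ++k) {
        a[k] = b[k] > d[k];
    }
}
```

The pack step is the part that narrows 16-bit compare masks to bytes, which is the mixed-width conversion the scalar loop performs implicitly.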

That second loop is an interesting example where the compiler should do much better. I will submit a feature request to Premier Support on your behalf.
