Intel® C++ Compiler

vectorization of variable stride for loop

TimP
Honored Contributor III

I noticed that Intel C++ 16.0 and 16.0.1 have improved scalar optimization of variable-stride loops to the point where they match gcc and MSVC. As a result, the default auto-vectorization of such loops is unproductive for host AVX. For example:

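/* n1, i2, i3 are assumed to be parameters; the original snippet omits their declarations */
void foo(float * __restrict a, const float * b, const int *n1, int i2, int i3)
{
    int i;
#if _OPENMP >= 201307
#if ! __MIC__
    /* host AVX: safelen(1) keeps the loop scalar */
#pragma omp simd safelen(1)
#else
    /* MIC: request vectorization explicitly */
#pragma omp simd
#endif
#endif
    for (i = *n1; i <= i2; i += i3)
        a[i] += b[i];
}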

Yet for MIC, the compiler does not vectorize this loop without the pragma.

For MIC, vectorizing this loop yields a more than 10x performance gain, while on the host, vectorization increases run time by 50%. The compiler's default choices have it backwards.
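To make the cost concrete, here is a rough sketch of what vectorizing a variable-stride loop involves, written in plain C rather than intrinsics. The vector length of 8 floats (one 256-bit AVX register) and all helper names are assumptions for illustration; this is a simplified model, not the code icc actually emits. Because i3 is only known at run time, each chunk has to gather strided elements of a and b into contiguous lanes and scatter the sums back, work the scalar loop never does. Plain AVX has no gather or scatter instructions while the MIC instruction set has both, which may be part of why the payoff differs so much between the two targets; that is a guess, not something measured here.

```c
/* Hypothetical vector length: 8 floats, i.e. one 256-bit AVX register. */
enum { VL = 8, N = 128 };

/* Scalar form of the loop in the post (assumes i3 > 0). */
void add_strided_scalar(float *a, const float *b, int n1, int i2, int i3)
{
    for (int i = n1; i <= i2; i += i3)
        a[i] += b[i];
}

/* Conceptual vectorized form: gather VL strided elements into contiguous
   lanes, add lane by lane, scatter the results back, then finish the
   leftover iterations with a scalar remainder loop (assumes i3 > 0). */
void add_strided_chunked(float *a, const float *b, int n1, int i2, int i3)
{
    if (n1 > i2)
        return;
    int trip = (i2 - n1) / i3 + 1;      /* iteration count of the scalar loop */
    int k = 0;
    for (; k + VL <= trip; k += VL) {
        float va[VL], vb[VL];
        for (int l = 0; l < VL; ++l) {  /* gather */
            va[l] = a[n1 + (k + l) * i3];
            vb[l] = b[n1 + (k + l) * i3];
        }
        for (int l = 0; l < VL; ++l)    /* lane-wise add */
            va[l] += vb[l];
        for (int l = 0; l < VL; ++l)    /* scatter */
            a[n1 + (k + l) * i3] = va[l];
    }
    for (; k < trip; ++k)               /* scalar remainder */
        a[n1 + k * i3] += b[n1 + k * i3];
}

/* Returns 1 if both forms give identical results on arrays of length N.
   The caller must keep 0 <= n1 and i2 < N. */
int strided_versions_agree(int n1, int i2, int i3)
{
    float a1[N], a2[N], b[N];
    for (int i = 0; i < N; ++i) {
        a1[i] = a2[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }
    add_strided_scalar(a1, b, n1, i2, i3);
    add_strided_chunked(a2, b, n1, i2, i3);
    for (int i = 0; i < N; ++i)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

For example, strided_versions_agree(3, 97, 7) covers 14 iterations: one full chunk of 8 plus a scalar remainder of 6.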

This looks like a step back toward advocating pragma-controlled vectorization, with pragmas required for each target. I wrapped the pragmas in #if _OPENMP so that the code still compiles with MSVC, which supports only OpenMP 2.0 even though recent versions do a fair amount of auto-vectorization. __restrict becomes irrelevant if pragmas must be set for each target; maybe that's considered an advantage.
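One way to keep the per-target choice in a single place is a helper macro built on the standard _Pragma operator (C99/C++11). The macro name LOOP_VEC_HINT and the function add_strided are hypothetical; the conditions mirror the #if blocks in the loop above. Compilers without OpenMP 4.0 simd support, such as MSVC with its OpenMP 2.0, or any compiler built without OpenMP enabled, simply see an empty macro.

```c
/* Hypothetical helper: choose the loop hint per target in one place. */
#if defined(_OPENMP) && _OPENMP >= 201307
#  if defined(__MIC__)
     /* MIC: request vectorization explicitly */
#    define LOOP_VEC_HINT _Pragma("omp simd")
#  else
     /* host: safelen(1) keeps the loop scalar */
#    define LOOP_VEC_HINT _Pragma("omp simd safelen(1)")
#  endif
#else
   /* no OpenMP 4.0 simd (e.g. MSVC's OpenMP 2.0, or OpenMP off): no hint */
#  define LOOP_VEC_HINT
#endif

/* The variable-stride loop from the post, using the macro (assumes i3 > 0). */
void add_strided(float * __restrict a, const float *b, int n1, int i2, int i3)
{
    LOOP_VEC_HINT
    for (int i = n1; i <= i2; i += i3)
        a[i] += b[i];
}
```

Each loop then needs only one line, and changing the policy for a target means editing one #define instead of every loop.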

Fortunately, in this case icc honors safelen(1), so it can be used to suppress vectorization on the host.

Both icc and gcc show cases where vectorization occurs in violation of safelen, as well as cases where safelen(1) is a convenient portable replacement for #pragma novector. __restrict can sometimes be used to control auto-vectorization, but not always. #pragma novector doesn't appeal to me in cases like this, where MIC needs #pragma omp simd and where non-Intel compilers run into similar issues.
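For reference, safelen(n) is generally treated by compilers as a cap of n consecutive iterations per SIMD chunk, which is only safe when no loop-carried dependence is shorter than n; safelen(1) leaves no room for concurrency, which is why it can stand in for #pragma novector. The simplified model below (hypothetical helper names; all loads in a chunk happen before any stores) shows why vectorizing in violation of safelen is a correctness issue and not just a performance one: with a dependence distance d, chunk widths up to d reproduce the sequential result and wider chunks do not.

```c
enum { LEN = 32 };

/* Sequential semantics of: for (i = 0; i + d < LEN; ++i) a[i + d] = a[i] + 1.0f;
   The loop carries a dependence of distance d (assumes 0 <= d < LEN). */
void recur_scalar(float *a, int d)
{
    for (int i = 0; i + d < LEN; ++i)
        a[i + d] = a[i] + 1.0f;
}

/* Simplified SIMD model: w consecutive iterations form one chunk, and every
   lane loads its input before any lane stores, as safelen(w) would permit. */
void recur_chunked(float *a, int d, int w)
{
    int n = LEN - d;                        /* iteration count */
    for (int k = 0; k < n; k += w) {
        float tmp[LEN];
        int lanes = (n - k < w) ? n - k : w;
        for (int l = 0; l < lanes; ++l)     /* loads */
            tmp[l] = a[k + l] + 1.0f;
        for (int l = 0; l < lanes; ++l)     /* stores */
            a[k + l + d] = tmp[l];
    }
}

/* Returns 1 if chunk width w reproduces the sequential result for distance d. */
int chunk_width_is_safe(int d, int w)
{
    float s[LEN] = {0}, c[LEN] = {0};
    recur_scalar(s, d);
    recur_chunked(c, d, w);
    for (int i = 0; i < LEN; ++i)
        if (s[i] != c[i])
            return 0;
    return 1;
}
```

For example, chunk_width_is_safe(4, 4) returns 1, while chunk_width_is_safe(4, 8) and chunk_width_is_safe(1, 4) return 0; a width of 1 is always safe.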

TimP
Honored Contributor III

The Intel C++ 17.1 release has corrected some cases where the safelen(1) clause was ignored. This makes it more suitable as a portable replacement for #pragma novector.
