“FUNCTION WAS VECTORIZED”, but it isn't vectorized at the call site

Endre_L_ · 02-19-2014

This question relates to an existing question on Stack Overflow, the difference being that here AVX is the target ISA and the function to be vectorized is more complex. When I use the __attribute__((vector(...))) declaration in the function definition:

__attribute__((vector(linear(a),linear(b))))
inline void foo(float* restrict a, float* restrict b) { 
   ...
   for(j=0; j<n; j++) {   
      // do something with a[j*STRIDE] and b[j*STRIDE]
   }
   for(j=n-1; j>=0; j--) {
      // do something with a[j*STRIDE] and b[j*STRIDE]
   }
}
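For comparison, `__attribute__((vector(...)))` is Intel's Cilk Plus spelling of an elemental (SIMD-enabled) function, and OpenMP 4.0 expresses the same idea with `#pragma omp declare simd`. The sketch below is a simplified, hypothetical example (`axpy_elem` and `axpy_all` are illustrative names, not from the original code). It shows the two halves that have to match up: the vector declaration of the function, and a SIMD loop at the call site that can use the vector variant.

```c
#include <assert.h>

/* Standard OpenMP counterpart of an Intel "vector" (elemental) function.
 * uniform(a, b): every SIMD lane receives the same pointer values.
 * linear(i:1):   consecutive lanes receive i, i+1, i+2, ...
 * Without OpenMP SIMD support the pragmas are ignored and the code runs
 * as ordinary scalar C. */
#pragma omp declare simd uniform(a, b) linear(i:1)
static void axpy_elem(float *a, const float *b, int i)
{
    a[i] += 2.0f * b[i];
}

/* Caller: the simd loop allows the compiler to call the vector variant
 * of axpy_elem, handling several consecutive values of i per call. */
void axpy_all(float *a, const float *b, int n)
{
    assert(n >= 0);
#pragma omp simd
    for (int i = 0; i < n; i++)
        axpy_elem(a, b, i);
}
```

Passing the index as `linear` and the base pointers as `uniform` is only one way to declare such a function; the question's version instead marks the pointers themselves as `linear`.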

the compiler reports the following for the function foo():

 foo.hpp(56): (col. 101) remark: FUNCTION WAS VECTORIZED
 foo.hpp(56): (col. 101) remark: FUNCTION WAS VECTORIZED

When I call the function using array notation (or a plain for loop):

int main() {
  ...
  #pragma omp parallel for
  for(k=0; k<n; k++) {
    int base = k*256*256;

    FP* __restrict a = &h_a[base];
    FP* __restrict b = &h_b[base];

    __assume_aligned(a,32);
    __assume_aligned(b,32);

    foo(&a[0:256], &b[0:256]); // line 337
    // OR: for(i=0; i<256; i++) foo(&a[i], &b[i]);
  }
}
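Passing array sections to an elemental function asks for one call per element, so `foo(&a[0:256], &b[0:256])` corresponds roughly to an explicit loop over `&a[i]`, `&b[i]`. The sketch below writes that loop out in portable C with OpenMP SIMD pragmas. Everything in it is hypothetical: `foo_sketch`, `call_block`, the constants `STRIDE`, `NSWEEP` and `SECTION`, and the sweep bodies only mimic the shape of the original `foo`, whose real body is not shown. With `linear(a, b)`, lane i receives `a + i`, so call i visits `a[i + j*STRIDE]`.

```c
#include <assert.h>

/* Hypothetical sizes chosen for illustration only. */
#define STRIDE  4   /* distance between elements visited by one call */
#define NSWEEP  4   /* number of elements visited per sweep          */
#define SECTION 4   /* number of calls, like a[0:SECTION]            */

/* Mimics the shape of foo(): a forward and a backward sweep over
 * a[j*STRIDE] and b[j*STRIDE]. linear(a, b): consecutive SIMD lanes
 * receive pointers that are one element apart. */
#pragma omp declare simd linear(a, b)
static void foo_sketch(float *restrict a, float *restrict b)
{
    float run = 0.0f;
    for (int j = 0; j < NSWEEP; j++) {        /* forward: prefix sum of a into b */
        run += a[j * STRIDE];
        b[j * STRIDE] = run;
    }
    run = 0.0f;
    for (int j = NSWEEP - 1; j >= 0; j--) {   /* backward: suffix sum of a, in place */
        run += a[j * STRIDE];
        a[j * STRIDE] = run;
    }
}

/* Explicit-loop counterpart of foo(&a[0:SECTION], &b[0:SECTION]).
 * Both blocks must hold at least (SECTION-1) + (NSWEEP-1)*STRIDE + 1 floats. */
void call_block(float *restrict a, float *restrict b)
{
    assert(a && b);
#pragma omp simd
    for (int i = 0; i < SECTION; i++)
        foo_sketch(&a[i], &b[i]);
}
```

With these particular constants no two calls touch the same element, so the result does not depend on the order in which the calls run.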

the call is not vectorized:

 main.c(337): (col. 3) remark: loop was not vectorized: existence of vector dependence
 main.c(337): (col. 3) remark: loop was not vectorized: existence of vector dependence
 main.c(337): (col. 3) remark: loop was not vectorized: not inner loop

The Intel compiler flags used are:

 icc -O3 -xAVX -ip -restrict -parallel -fopenmp -vec-report2 -openmp-report2

The question: if the compiler could vectorize foo(), why can it not use the vectorized version at the call site (main.c:337)? The remarks suggest that the compiler analysed the function again instead of simply inserting the already-compiled vector code.

Note: I also tried a for loop instead of array notation, with #pragma ivdep and with #pragma simd, but neither helped. The actual code is much larger than would conveniently fit in this post.

jimdempseyatthecove · ‎02-21-2014

>> // do something with a[j*STRIDE] and b[j*STRIDE]
This implies that the "something" statements will not vectorize (due to the *STRIDE access).
Although scatter/gather can be called vectorization, in many cases it may not be optimal.
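On the AVX target selected by `-xAVX` there is no hardware gather instruction (gather arrived with AVX2), so strided loads have to be assembled from scalar loads or shuffles. One common workaround, sketched here as an alternative technique rather than anything proposed in this thread, is to copy the strided elements into a contiguous scratch buffer, do the work with unit stride, and copy the results back. `pack_strided`, `unpack_strided` and `scale_packed` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src[0], src[stride], src[2*stride], ... into dst[0..n-1]. */
void pack_strided(float *restrict dst, const float *restrict src,
                  size_t n, size_t stride)
{
    assert(stride >= 1);
    for (size_t j = 0; j < n; j++)
        dst[j] = src[j * stride];
}

/* Inverse of pack_strided: write src[0..n-1] back to dst[j*stride]. */
void unpack_strided(float *restrict dst, const float *restrict src,
                    size_t n, size_t stride)
{
    assert(stride >= 1);
    for (size_t j = 0; j < n; j++)
        dst[j * stride] = src[j];
}

/* Unit-stride work on the packed copy: straightforward to vectorize. */
void scale_packed(float *buf, size_t n, float factor)
{
#pragma omp simd
    for (size_t j = 0; j < n; j++)
        buf[j] *= factor;
}
```

The copies themselves are still strided, so this only pays off when enough work is done on each packed element to amortize them.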

Additionally, when a function is NOT inlined, the compiler need not worry about the effects of its code on variables outside its scope.
When the function IS inlined, the code surrounding the inlining site is taken into account in the optimization decisions (to vectorize or not to vectorize).
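To make the inlining point more concrete: once `foo` is inlined, the calls for the different section elements become ordinary code, and before running them side by side the compiler has to establish that no two of them touch the same element. With `linear(a)`, call i visits `a[i + j*STRIDE]` for `j = 0 .. n-1`. The helper below (a hypothetical illustration; `calls_overlap` and its parameters are not from the original code) checks that condition for given values. The compiler may not be able to do this at compile time if `STRIDE` or `n` are not known constants in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if two different calls i1 != i2 (0 <= i < ncalls) touch the
 * same element, given that call i visits offsets i + j*stride for
 * 0 <= j < nsweep. Uses an "owner" table, so it is O(ncalls * nsweep). */
bool calls_overlap(int ncalls, int nsweep, int stride)
{
    assert(ncalls >= 0 && nsweep >= 0 && stride >= 0);
    if (ncalls == 0 || nsweep == 0)
        return false;

    size_t extent = (size_t)(ncalls - 1)
                  + (size_t)(nsweep - 1) * (size_t)stride + 1;
    int *owner = malloc(extent * sizeof *owner);  /* first call to touch each offset */
    if (!owner)
        abort();
    for (size_t k = 0; k < extent; k++)
        owner[k] = -1;

    bool overlap = false;
    for (int i = 0; i < ncalls && !overlap; i++) {
        for (int j = 0; j < nsweep; j++) {
            size_t off = (size_t)i + (size_t)j * (size_t)stride;
            if (owner[off] != -1 && owner[off] != i) {
                overlap = true;   /* another call already touched this element */
                break;
            }
            owner[off] = i;
        }
    }
    free(owner);
    return overlap;
}
```

For a 256-element section and at least two sweep steps, a `STRIDE` of 256 gives no overlap, while any `STRIDE` from 1 to 255 does (call i and call i+STRIDE share an element).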

If you have a specific code example where you think vectorization should occur, please post it for review.

Jim Dempsey