Vectorization with SIMD-enabled functions works from functions, not from main()

Andrey_Vladimirov · 11-04-2015

Hello,

I have run into a situation that I cannot explain. I have a loop that calls a SIMD-enabled function, with #pragma simd placed before the loop. The loop vectorizes if it is placed in a separate function, but not if it is inside main(). I am using Intel C++ compiler 16.0.0.109. Please see the code and vectorization reports below. Can anyone explain what is happening, and whether there is a way to work around it?

This is loop-in-main.cc:

__attribute__((vector)) void SimdEnabledFunction(double);

int main() {
  const int n = 10000;
  double a[n] = {};
#pragma simd
  for(int i = 0 ; i < n ; i++)
      SimdEnabledFunction(a[i]);
}

This is the optimization report for it (loop does not vectorize):

[avladim@cfx-0 ~]$ icpc -qopenmp -c -qopt-report -qopt-report-stdout loop-in-main.cc
Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.


    Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000


Begin optimization report for: main()

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main()) [1] loop-in-main.cc(3,12)

loop-in-main.cc(7): (col. 3) warning #13379: loop was not vectorized with "simd"

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at loop-in-main.cc(7,3)
   remark #15520: simd loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria
   remark #13379: loop was not vectorized with "simd"
LOOP END
===========================================================================
[avladim@cfx-0 ~]$

This is the other code, loop-in-func.cc, where the loop is in a separate function:

__attribute__((vector)) void SimdEnabledFunction(double);

void UserFunction(int n, double* a) {
#pragma simd
  for(int i = 0 ; i < n ; i++)
      SimdEnabledFunction(a[i]);
}

int main() {
  const int n = 10000;
  double a[n] = {};
  UserFunction(n, a);
}

This is the optimization report for it (SIMD LOOP WAS VECTORIZED in the standalone UserFunction, although the copy inlined into main() still was not):

[avladim@cfx-0 ~]$ icpc -qopenmp -c -qopt-report -qopt-report-stdout loop-in-func.cc
Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.


    Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000


Begin optimization report for: main()

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main()) [1] loop-in-func.cc(9,12)
  -> INLINE: (12,3) UserFunction(int, double *)

loop-in-func.cc(5): (col. 3) warning #13379: loop was not vectorized with "simd"

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at loop-in-func.cc(5,3) inlined into loop-in-func.cc(12,3)
   remark #15520: simd loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria
   remark #13379: loop was not vectorized with "simd"
LOOP END
===========================================================================

Begin optimization report for: UserFunction(int, double *)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (UserFunction(int, double *)) [2] loop-in-func.cc(3,37)


    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at loop-in-func.cc(5,3)
<Peeled loop for vectorization>
LOOP END

LOOP BEGIN at loop-in-func.cc(5,3)
   remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END

LOOP BEGIN at loop-in-func.cc(5,3)
<Remainder loop for vectorization>
   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override 
LOOP END
===========================================================================
[avladim@cfx-0 ~]$

Andrey

Andrey_Vladimirov · 11-11-2015

Note to self: this question was answered in the Intel C++ Compiler forum: https://software.intel.com/en-us/forums/intel-c-compiler/topic/599705

Bottom line: it is a compiler bug, and a work-around is available.