Software Archive

vectorizing with an inline function?

rnickb
Beginner

I attached two code files, mandel1.cpp and mandel2.cpp.

mandel1.cpp has a loop with all of the code in the loop body.

mandel2.cpp has equivalent code, but instead of having the code in the loop body it calls an inline function.

Compiling with Intel C++ Compiler 15 using "icc -O3 -fp-model fast=2 -xCORE-AVX2 -fma -c -S", I can vectorize mandel1.cpp but not mandel2.cpp.

Is there a way I can vectorize mandel2.cpp and still have a separate function? If the optimizer can vectorize mandel1.cpp, it seems like it ought to be able to inline the function and then apply the same vectorization.

I tried using the "vector" attribute, but it doesn't look like it works with struct/class arguments.

1 Solution
Hideki_I_Intel
Employee

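There is an inliner heuristic that avoids turning an "innermost SIMD loop" into an "outer SIMD loop"; that is, it declines to inline a call when doing so would place a loop from the callee inside the calling loop.
You can get what you want by overriding the heuristic with the forceinline pragma, placed directly before the statement that makes the call: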

      std::complex<float> c(i / 10.0f, j / 10.0f);
      int count;
      // Force inlining of the call in the next statement.
#pragma forceinline
      count = mandel(c, depth);
      *(p + j + i*max_col) = count;
 

A SIMD-enabled function is a performance feature, and passing a struct by value (which involves copying the struct) typically does not perform well. Supporting it is therefore a lower priority for us than supporting a pointer to a struct. The following code gets you closer to what you wanted while keeping the call as a call.

//inline
__attribute__((vector(linear(cp))))
int mandel(std::complex<float> *cp, int max_count) {
  std::complex<float> c = *cp;
....
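
A sketch of how the pointer-taking version might be completed and called is shown below. Everything after `std::complex<float> c = *cp;` is elided in the reply, so the loop body here is an assumed escape-time iteration, and the caller fill_row and its array cs are hypothetical names. The attribute is guarded by `__INTEL_COMPILER` (my addition) because other compilers do not recognize it. Since linear(cp) declares that the pointer advances linearly from one lane to the next, the caller passes addresses of consecutive array elements.

```cpp
#include <complex>

// SIMD-enabled function taking the address of the complex value.
#if defined(__INTEL_COMPILER)
__attribute__((vector(linear(cp))))
#endif
int mandel(std::complex<float>* cp, int max_count) {
  std::complex<float> c = *cp;
  // Assumed body: the original reply elides everything past this point.
  std::complex<float> z = c;
  int count = 0;
  while (count < max_count && std::norm(z) < 4.0f) {
    z = z * z + c;
    ++count;
  }
  return count;
}

// Hypothetical caller: cs holds one precomputed point per output element,
// so consecutive iterations pass consecutive addresses.
void fill_row(int* row_out, std::complex<float>* cs, int n, int depth) {
  for (int j = 0; j < n; ++j) {
    row_out[j] = mandel(cs + j, depth);
  }
}
```

This layout requires the points to be stored in an array first, which is a design trade-off compared with constructing each value inside the loop as the by-value version does.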

I think there is a way to get a native complex type (as opposed to a struct), such as C99 _Complex, but I will leave that to someone who knows more about such things.
