I attached two code files, mandel1.cpp and mandel2.cpp.
mandel1.cpp has a loop with all the code in its body.
mandel2.cpp has equivalent code, but the loop calls an inline function instead of containing the code in its body.
Compiling with the Intel C++ Compiler 15 using "icc -O3 -fp-model fast=2 -xCORE-AVX2 -fma -c -S", mandel1.cpp vectorizes but mandel2.cpp does not.
Is there a way to vectorize mandel2.cpp and still keep a separate function? Since the compiler can vectorize mandel1.cpp, it seems the optimizer ought to be able to inline the function and then vectorize.
I tried the "vector" attribute, but it doesn't appear to work with struct/class arguments.
The inliner has a heuristic that avoids turning an "innermost SIMD loop" into an "outer SIMD loop".
You can get what you want by using the forceinline pragma on the call, which overrides that heuristic:
std::complex<float> c(i / 10.0f, j / 10.0f);
int count;
#pragma forceinline
count = mandel(c, depth);
*(p + j + i*max_col) = count;
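For context, here is a minimal sketch of how the fragment above might sit in a complete loop nest. The attached mandel2.cpp is not shown in the thread, so the body of `mandel`, the `fill` wrapper, and the `max_row` parameter are assumptions made for illustration. The pragma is guarded so that compilers other than the Intel classic compiler simply skip it.

```cpp
#include <complex>

// Hypothetical stand-in for the poster's inline function: counts
// iterations of z = z*z + c until |z|^2 >= 4 or max_count is reached.
inline int mandel(std::complex<float> c, int max_count) {
    std::complex<float> z(0.0f, 0.0f);
    int count = 0;
    while (count < max_count && std::norm(z) < 4.0f) {
        z = z * z + c;
        ++count;
    }
    return count;
}

// Hypothetical driver: fills a max_row x max_col grid of iteration counts.
void fill(int* p, int max_row, int max_col, int depth) {
    for (int i = 0; i < max_row; ++i) {
        for (int j = 0; j < max_col; ++j) {
            std::complex<float> c(i / 10.0f, j / 10.0f);
            int count;
            // Ask the Intel compiler to inline the call in the next
            // statement, overriding its inlining heuristic.
#if defined(__INTEL_COMPILER)
#pragma forceinline
#endif
            count = mandel(c, depth);
            *(p + j + i * max_col) = count;
        }
    }
}
```

Whether the j loop actually vectorizes after inlining still depends on the compiler version and flags, so it is worth checking the vectorization report.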
A SIMD-enabled function is a performance feature, and passing a struct by value (which involves copying the struct) is typically not high performing. As such, it is a lower priority for us than the ability to pass the address of a struct. The following code gets you closer to what you wanted while leaving it as a call.
//inline
__attribute__((vector(linear(cp))))
int mandel(std::complex<float> *cp, int max_count) {
std::complex<float> c = *cp;
....
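As a rough illustration of the pass-by-address idea, the sketch below uses the OpenMP `declare simd` directive in place of the Intel-specific `__attribute__((vector(...)))`. This is a swapped-in, portable analogue and not the code from the reply. The function name `mandel_ptr`, its body, and the `fill_row` caller are assumptions, since the elided part of the original function is not shown.

```cpp
#include <complex>

// Hypothetical pointer-taking variant. linear(cp) says the pointer
// advances by one element per SIMD lane; uniform(max_count) says the
// depth is the same for every lane. Without OpenMP SIMD support enabled
// the pragma is ignored and this is an ordinary scalar function.
#pragma omp declare simd linear(cp) uniform(max_count)
int mandel_ptr(const std::complex<float>* cp, int max_count) {
    std::complex<float> c = *cp;
    std::complex<float> z(0.0f, 0.0f);
    int count = 0;
    while (count < max_count && std::norm(z) < 4.0f) {
        z = z * z + c;
        ++count;
    }
    return count;
}

// Hypothetical caller: the points for one row are stored contiguously,
// so consecutive iterations pass consecutive addresses.
void fill_row(const std::complex<float>* cs, int* out, int n, int depth) {
#pragma omp simd
    for (int j = 0; j < n; ++j) {
        out[j] = mandel_ptr(&cs[j], depth);
    }
}
```

The design point is the same as in the reply: the call stays a call, and the compiler is told how the argument address varies across lanes rather than being asked to copy a struct per lane.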
I think there is a way to get a native complex type (as opposed to a struct) like C99 _Complex, but I leave that to someone else who knows more about such things.