Hi, I'm fighting ICC to optimize some code that looks something like this:
float Function(float x)
{
    // a few lines of code, some short floating point math
};
...
for (int i=0; i<cnt; i++)
{
    // a few lines of code
    dst = Function(x);
};
I also added a "force vectorize" pragma to the loop. If I leave it like this, the test program takes 10 seconds. If I put the body of "Function" directly into the loop, however, it takes 6 seconds, because ICC then correctly uses AVX and generates a fairly long vectorized sequence from it. That's about a 40% improvement! I even tried _Pragma("vector always") before the "Function" call, but it made no difference; it's still slow.
Any ideas?
Have you looked at...
__declspec(vector) (Windows*)
__attribute__((vector)) (Linux* and OS X*)
Combines with the map operation at the call site to provide the data parallel semantics. When multiple instances of the vector declaration are invoked in a parallel context, the execution order among them is not sequenced.
Attribute your function with that.
Inlining would not require this decoration, but keep in mind that wishing for the code to vectorize does not relieve you of your responsibility to make it vectorizable.
What are the few lines of code inside your function? Something in there must be thwarting your intent.
Jim Dempsey
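A minimal sketch of what attributing the function could look like, in case it helps. The macro name VECTOR_FN, the body of Function, and the ProcessAll wrapper are all made up for illustration; the original code was not posted. The macro expands to the ICC vector attribute only under the classic Intel compiler and to nothing elsewhere, so the same source still builds as plain scalar code with other compilers.

```cpp
#include <cassert>

// Hypothetical helper macro: pick the ICC spelling of the vector attribute.
// With any other compiler it expands to nothing and the code stays scalar.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#define VECTOR_FN __declspec(vector)
#elif defined(__INTEL_COMPILER)
#define VECTOR_FN __attribute__((vector))
#else
#define VECTOR_FN
#endif

// Placeholder for "a few lines of short floating point math".
// The real body from the original post is unknown.
VECTOR_FN float Function(float x)
{
    return x * x * 0.5f + 1.0f;
}

// Hypothetical loop wrapper. Each iteration reads src[i] and writes dst[i],
// so there is no dependence between iterations.
void ProcessAll(const float* src, float* dst, int cnt)
{
    assert(src != nullptr && dst != nullptr);
    for (int i = 0; i < cnt; i++)
    {
        dst[i] = Function(src[i]);
    }
}
```

Whether the loop actually gets vectorized still depends on the real function body and the compiler options, so it is worth checking the vectorization report after the change.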
Thank you for the info. I started digging into ICC logs and found something odd - in way too many cases ICC doesn't allow vectorization. Consider this example I have here right now:
template <class type>
static MFORCEINLINE type GetCubic(type y0, type y1, type y2, type y3, type x)
{
    type a = (3 * (y1-y2) - y0 + y3) * (type)0.5;
    type b = 2*y2 + y0 - (5*y1 + y3) * (type)0.5;
    type c = (y2 - y0) * (type)0.5;
    return a * x * x * x + b * x * x + c * x + y1;
};
...
// Loop to be vectorized, inlined function.
    const float y0 = PrecomputedPtr[index-1];
    const float y1 = PrecomputedPtr[index+0];
    const float y2 = PrecomputedPtr[index+1];
    const float y3 = PrecomputedPtr[index+2];
    return MInterpolation::GetCubic(y0, y1, y2, y3, x);
};
This gets vectorized just fine. Now if I change the statement to this:
return MInterpolation::GetCubic(PrecomputedPtr[index-1], PrecomputedPtr[index+0], PrecomputedPtr[index+1], PrecomputedPtr[index+2], x);
which is exactly the same thing (I just didn't create new variables), ICC doesn't vectorize it and says this:
remark #15344: loop was not vectorized: vector dependence prevents vectorization
remark #15346: vector dependence: assumed ANTI dependence between this_123622 line 174 and a line 174
remark #15346: vector dependence: assumed FLOW dependence between a line 174 and this_123622 line 174
All of the functions are "const" and PrecomputedPtr is a member of the object. I also tried using __declspec(noalias) on the only output pointer, so the compiler should assume nothing is going to change. Yet it says there's a dependence...
Any ideas?
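For illustration, here is a sketch of one commonly used workaround for assumed dependences that involve "this": copy the member pointer into a local before the loop, so the loop body no longer reads through the object. The Interpolator struct, the Resample method, and the dst/frac parameters are hypothetical names, plain inline stands in for MFORCEINLINE, and there is no guarantee this removes the remark in the real code.

```cpp
#include <cassert>

// Same cubic as in the post, with plain "inline" instead of MFORCEINLINE.
template <class type>
static inline type GetCubic(type y0, type y1, type y2, type y3, type x)
{
    type a = (3 * (y1 - y2) - y0 + y3) * (type)0.5;
    type b = 2 * y2 + y0 - (5 * y1 + y3) * (type)0.5;
    type c = (y2 - y0) * (type)0.5;
    return a * x * x * x + b * x * x + c * x + y1;
}

// Hypothetical stand-in for the object that owns PrecomputedPtr.
struct Interpolator
{
    const float* PrecomputedPtr; // must hold at least cnt + 3 samples

    // Hypothetical loop: output i interpolates around table[i + 1]
    // using the fractional position frac[i].
    void Resample(float* dst, const float* frac, int cnt) const
    {
        assert(dst != nullptr && frac != nullptr);
        // Local copy of the member pointer: the loop reads through "table"
        // rather than through "this".
        const float* table = PrecomputedPtr;
        for (int i = 0; i < cnt; i++)
        {
            const int index = i + 1;
            dst[i] = GetCubic(table[index - 1], table[index + 0],
                              table[index + 1], table[index + 2], frac[i]);
        }
    }
};
```

With the samples {1, 2, 4, 8, 16}, a fraction of 0 returns the second sample of each window and a fraction of 1 returns the third, which is a quick sanity check that the rewritten loop still computes the same interpolation.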
If you know you want it vectorized, you know how you want it vectorized, and you are going to use Intel-specific features anyway, I would suggest taking the time to vectorize your interpolation kernel by hand once and for all (forceinline is a good idea, so all the messy AVX/SSE stuff is hidden behind a function bound to "vanish" at link time).
I don't really follow your answer. My point with the code above is that it doesn't make sense for the compiler to complain and refuse to vectorize it. There's no difference between the two versions; one is just "nicer". But ICC doesn't like the nicer one...