# define MY_ALIGNED_MEMORY_ALIGN sizeof(__m256) /* _mm_malloc align [bytes] */
# define MY_ASSUME_ALIGNED(x) __assume_aligned((x),MY_ALIGNED_MEMORY_ALIGN)

for(auto& i1:packets)
{
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}
"packets" is a std::vector<> of structures with "left" and "right" float pointers, suitable for vectorization.
Doesn't this "for" loop add overhead when all I want is to give the compiler a hint that the pointers are aligned?
Hmm, this case is interesting. "__assume_aligned()" is only a hint to the compiler that the array is aligned. But in your case, I am wondering how it can be vectorized. Could you please provide a complete test case to run? What code logic do you expect in the 'for' loop?
Thanks,
Shenghong
Marian M. wrote:
# define MY_ALIGNED_MEMORY_ALIGN sizeof(__m256) /* _mm_malloc align [bytes] */
# define MY_ASSUME_ALIGNED(x) __assume_aligned((x),MY_ALIGNED_MEMORY_ALIGN)

for(auto& i1:packets)
{
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}

"packets" is a std::vector<> of structures with "left" and "right" float pointers, suitable for vectorization.
Doesn't this "for" loop add overhead when all I want is to give the compiler a hint that the pointers are aligned?
After this for loop I use code like this:
float* __restrict p1=packets[0].left;
and then vectorised operations on pointer "p1".
Hi Marian,
I do not think this will have any "overhead" on performance. My guess is that your code looks something like this:
for(auto& i1:packets)
{
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
    // hope below loop will be vectorized, as i1.left is aligned
    for(i=0;i<N;i++)
    {
        i1.left=....
    }
}
__assume_aligned is only a hint that gives the compiler alignment information at compile time, so that it can vectorize the loops that follow or perform other optimizations. I do not think it generates any code of its own, so it should not have "overhead". If the compiler is able to work out the alignment itself (by analyzing the code), you do not need to use it.
Thanks,
Shenghong
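To make the pattern above concrete, here is a minimal sketch with the hint placed directly in front of the loop it is meant for. The names `Packet` and `add_right_to_left` are hypothetical, not from the original poster's code. Outside classic ICC the sketch swaps in the GCC/Clang builtin `__builtin_assume_aligned`, which returns the hinted pointer instead of acting as a statement, so treat that branch as an assumption rather than an exact equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the structure described in the thread.
struct Packet {
    float* left;
    float* right;
};

// 32 bytes matches sizeof(__m256), the alignment used in the thread.
#if defined(__INTEL_COMPILER)
// Classic ICC: statement-style hint, as in the original macro.
#define MY_ASSUME_ALIGNED(p) __assume_aligned((p), 32)
#elif defined(__GNUC__) || defined(__clang__)
// Assumption: GCC/Clang builtin swapped in; it returns the hinted pointer.
#define MY_ASSUME_ALIGNED(p) \
    ((p) = static_cast<float*>(__builtin_assume_aligned((p), 32)))
#else
// Unknown compiler: no hint is given, the code stays correct.
#define MY_ASSUME_ALIGNED(p) ((void)0)
#endif

// Adds right[i] to left[i] for the first n floats of every packet.
void add_right_to_left(std::vector<Packet>& packets, std::size_t n) {
    for (auto& pk : packets) {
        float* l = pk.left;
        float* r = pk.right;
        // A false promise is undefined behaviour, so check it in debug builds.
        assert(reinterpret_cast<std::uintptr_t>(l) % 32 == 0);
        assert(reinterpret_cast<std::uintptr_t>(r) % 32 == 0);
        // The hints sit directly in front of the loop they are meant for.
        MY_ASSUME_ALIGNED(l);
        MY_ASSUME_ALIGNED(r);
        for (std::size_t i = 0; i < n; ++i) {
            l[i] += r[i];
        }
    }
}
```

The hint is only a promise to the compiler; whether the inner loop actually ends up using aligned vector accesses still has to be confirmed in the compiler's vectorization report.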
I actually like to make heavy use of intrinsics and keywords that give the compiler hints about my code (such as the restrict keyword, __builtin_expect, etc.).
Thank you very much for the clarification.
But my question goes further.
When I write:
for(auto& i1:packets)
{
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}
packets[2].left=...vectorization support
Can I still get vectorization support when the code that follows is outside the scope of the range-based for() loop?
PS: I try hard to help ICC see how to vectorise my code, but its heuristics sometimes fail :-(
Hi Marian,
I suggest you take a look at this article:
https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
Quote:
Clauses such as __assume_aligned and __assume tell the compiler that the property holds at the particular point in the program where the clause appears. So the statement "__assume_aligned(a, 64);" means the pointer a is aligned at 64-bytes whenever program execution reaches this point. Compiler may propagate that property to other points in the program (such as a later loop), but it is not guaranteed (it is possible that compiler has to make conservative assumptions and cannot apply the property safely for a later loop in the same function).
Also, as mentioned there:
It is always a good idea to check if the compiler generated aligned accesses as expected for a vectorized loop, this information is part of the -vec-report6 output from the compiler.
Note: it is not easy to figure out the compiler's heuristics, so we usually check the report to make sure it works as expected. In your case, I do not think the compiler will be smart enough to vectorize it, but it is hard to say; maybe the compiler is smart. :) We may need to check case by case using the optimization report.
Thanks,
Shenghong
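Since the quoted article says propagation to later code is not guaranteed, one option is to restate the hint at the point of use rather than in an earlier loop over all packets. The following is only a sketch under assumptions: `Packet`, `assume_aligned32` and `scale_left` are hypothetical names, and outside classic ICC the GCC/Clang builtin `__builtin_assume_aligned` is swapped in for `__assume_aligned`. Whether classic ICC carries the hint out of a small helper function like this is exactly the kind of thing that has to be checked in the optimization report; placing `__assume_aligned` as a statement directly before the loop may be the safer form there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the structure described in the thread.
struct Packet {
    float* left;
    float* right;
};

// Returns p together with a 32-byte alignment promise
// (32 == sizeof(__m256), the alignment used in the thread).
template <typename T>
inline T* assume_aligned32(T* p) {
    // A false promise is undefined behaviour, so check it in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
#if defined(__INTEL_COMPILER)
    // Classic ICC: statement-style hint; propagation to the caller
    // is not guaranteed.
    __assume_aligned(p, 32);
    return p;
#elif defined(__GNUC__) || defined(__clang__)
    // Assumption: GCC/Clang builtin swapped in; the returned pointer
    // carries the alignment information.
    return static_cast<T*>(__builtin_assume_aligned(p, 32));
#else
    // Unknown compiler: no hint is given, the code stays correct.
    return p;
#endif
}

// Multiplies the first n floats of packets[index].left by factor.
void scale_left(std::vector<Packet>& packets, std::size_t index,
                float factor, std::size_t n) {
    // The hint is restated here, right where the pointer is used.
    float* __restrict p1 = assume_aligned32(packets[index].left);
    for (std::size_t i = 0; i < n; ++i) {
        p1[i] *= factor;
    }
}
```

A call such as `scale_left(packets, 2, 0.5f, n)` then corresponds to the `packets[2].left` case from the question, with the alignment promise made at the point of use instead of relying on an earlier loop.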
vec-report6 was the spelling for a past compiler version. In icc 14.0 the nearest equivalents were vec-report4 or -opt-report4, the latter changed to -qopt-report4 in the current release.
This thread raises an interesting question about how alignment directives could be used in such a context.
Among the points in the sales pitch for __assume_aligned has been that it might work at function scope, which doesn't appear useful here.
