__assume_aligned(x) in a for loop

#define MY_ALIGNED_MEMORY_ALIGN  sizeof(__m256)  /* _mm_malloc align [bytes] */
#define MY_ASSUME_ALIGNED(x)     __assume_aligned((x), MY_ALIGNED_MEMORY_ALIGN)

for (auto& i1 : packets) {
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}

"packets" is a std::vector<> of structures with "left" and "right" float pointers, suitable for vectorization.

Doesn't this add overhead for the "for" loop, when all I want is to give the compiler a hint that the pointers are aligned?

0 Kudos
7 Replies

Hmm, this case is interesting. The "__assume_aligned()" call is only a hint to the compiler that the array is aligned. For your case, though, I am wondering how it can be vectorized. Could you please provide a complete test case to run? What code logic do you expect in the 'for' loop?

Thanks,

Shenghong

0 Kudos

Marian M. wrote:

#define MY_ALIGNED_MEMORY_ALIGN  sizeof(__m256)  /* _mm_malloc align [bytes] */
#define MY_ASSUME_ALIGNED(x)     __assume_aligned((x), MY_ALIGNED_MEMORY_ALIGN)

for (auto& i1 : packets) {
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}

"packets" is a std::vector<> of structures with "left" and "right" float pointers, suitable for vectorization.

Doesn't this add overhead for the "for" loop, when all I want is to give the compiler a hint that the pointers are aligned?

After this for construct I use code like this:

float* __restrict p1 = packets[0].left;

and then perform vectorized operations on the pointer "p1".

0 Kudos

Hi Marian,

I do not think this will have any "overhead" on performance. My guess is that your code looks something like this:

for (auto& i1 : packets) {
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);

    // hopefully the loop below will be vectorized, as i1.left is aligned
    for (int i = 0; i < N; i++) {
        i1.left[i] = ....
    }
}

The __assume_aligned is only a hint that gives the compiler alignment information at compile time, so that it can vectorize the loops that follow or apply other optimizations. I do not think it generates any code, hence it should not have "overhead". If the compiler is able to work out the alignment itself (by analyzing the code), you do not need to use it.
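
As a minimal sketch of the hint-before-loop pattern described above: the names ALIGN_BYTES, HINT_ALIGNED and scale_aligned are hypothetical and not from the thread, and the fallback to GCC/Clang's __builtin_assume_aligned is an assumption added so the example also builds without the Intel compiler. The assert is a separate debug-only runtime check; the hint itself is only a promise to the compiler and is not expected to emit instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed alignment in bytes; 32 equals sizeof(__m256) from the original post.
#define ALIGN_BYTES 32

// Hypothetical portability wrapper (not from the thread).
#if defined(__INTEL_COMPILER)
  // Intel classic compiler: statement-style hint on the pointer.
  #define HINT_ALIGNED(p) __assume_aligned((p), ALIGN_BYTES)
#elif defined(__GNUC__)
  // GCC/Clang: the builtin returns a pointer carrying the alignment promise.
  #define HINT_ALIGNED(p) \
      ((p) = static_cast<decltype(p)>(__builtin_assume_aligned((p), ALIGN_BYTES)))
#else
  // Unknown compiler: no hint, the code stays correct.
  #define HINT_ALIGNED(p) ((void)0)
#endif

// Multiplies n floats in place; the caller promises 32-byte alignment.
void scale_aligned(float* data, std::size_t n, float factor) {
    // Debug-only check of the promise; compiled out when NDEBUG is defined.
    assert(reinterpret_cast<std::uintptr_t>(data) % ALIGN_BYTES == 0);

    HINT_ALIGNED(data);  // hint placed directly before the loop it targets
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Calling scale_aligned on a pointer that is not actually 32-byte aligned breaks the promise, so the buffer should come from an aligned allocation such as _mm_malloc with the same alignment.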

Thanks,

Shenghong

0 Kudos

Hi shenghong-geng (Intel)

I actually like to make heavy use of intrinsics and keywords that give the compiler hints about my code (such as the restrict keyword or __builtin_expect).

Thank you very much for clarification.

0 Kudos

But my question goes further...

When I write:

for (auto& i1 : packets) {
    MY_ASSUME_ALIGNED(i1.left);
    MY_ASSUME_ALIGNED(i1.right);
}

packets[2].left=...vectorization support

Can I still get vectorization support when the later code is not inside the scope of the range-based for() loop?

PS: I try so hard to be compatible with ICC and let it know how to vectorize my code, but its heuristics sometimes fail :-(

0 Kudos

Hi Marian,

I suggest you take a look at this article:

https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization

Quote:

Clauses such as __assume_aligned and __assume tell the compiler that the property holds at the particular point in the program where the clause appears. So the statement "__assume_aligned(a, 64);" means the pointer a is aligned at 64-bytes whenever program execution reaches this point. Compiler may propagate that property to other points in the program (such as a later loop), but it is not guaranteed (it is possible that compiler has to make conservative assumptions and cannot apply the property safely for a later loop in the same function).

Also, as mentioned there:

It is always a good idea to check if the compiler generated aligned accesses as expected for a vectorized loop, this information is part of the -vec-report6 output from the compiler. 

Note: it is not easy to figure out the compiler's heuristics, so we usually check the report to make sure it works as expected. For your case, I do not think the compiler will be smart enough to vectorize it, but it is hard to say... maybe the compiler is smart. :) We may need to check case by case using the optimization report.
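
Since the quoted article says the property holds at the point where the clause appears, one option is to state it on local pointers right before the loop that should be vectorized, instead of in a separate loop over "packets". The following is only a sketch under assumptions: Packet, PACKET_ALIGN, ASSUME_PACKET_ALIGNED, add_right_into_left and process_all are hypothetical names, the element count n is passed in because the thread does not say where it comes from, and the GCC/Clang branch is added only so the example builds without the Intel compiler. Whether the loop is actually vectorized with aligned accesses still has to be checked in the optimization report.

```cpp
#include <cstddef>
#include <vector>

// Assumed alignment in bytes (sizeof(__m256) in the original post).
#define PACKET_ALIGN 32

// Hypothetical wrapper; only the __assume_aligned branch comes from the thread.
#if defined(__INTEL_COMPILER)
  #define ASSUME_PACKET_ALIGNED(p) __assume_aligned((p), PACKET_ALIGN)
#elif defined(__GNUC__)
  #define ASSUME_PACKET_ALIGNED(p) \
      ((p) = static_cast<decltype(p)>(__builtin_assume_aligned((p), PACKET_ALIGN)))
#else
  #define ASSUME_PACKET_ALIGNED(p) ((void)0)
#endif

// Hypothetical layout matching the description in the first post.
struct Packet {
    float* left;
    float* right;
};

// Adds right[i] into left[i] for one packet of n floats.
void add_right_into_left(const Packet& pk, std::size_t n) {
    // Local copies: the hint then refers to exactly the pointers the loop uses.
    float* __restrict l = pk.left;
    const float* __restrict r = pk.right;

    // The promise is made at this program point, directly before the loop.
    ASSUME_PACKET_ALIGNED(l);
    ASSUME_PACKET_ALIGNED(r);

    for (std::size_t i = 0; i < n; ++i) {
        l[i] += r[i];
    }
}

// Processes every packet; the hint is restated inside each call.
void process_all(std::vector<Packet>& packets, std::size_t n) {
    for (auto& pk : packets) {
        add_right_into_left(pk, n);
    }
}
```

With this layout there is no hint-only loop left, so the question of whether the property survives past the end of that loop does not arise.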

Thanks,

Shenghong

0 Kudos

vec-report6 was the spelling for a past compiler version.  In icc 14.0 the nearest equivalents were vec-report4 or -opt-report4, the latter changed to -qopt-report4 in the current release.

This thread raises an interesting question about how alignment directives could be used in such a context. 

One of the selling points of __assume_aligned has been that it may work at function scope, which doesn't appear useful here.

0 Kudos