I'm using unaligned and aligned load intrinsics in my code and ICC does not behave as I expect it to. If this is expected behavior, can somebody educate me on why?
The fundamental problem is that I expect aligned load intrinsics to generate aligned load instructions and unaligned intrinsics to generate unaligned instructions. However, what I see is that, depending on the compiler flags, aligned load intrinsics sometimes generate unaligned instructions.
I've attached a snippet of code to demonstrate. I realize this code will segfault when run; the point is just to compile it and look at the generated assembly code.
There are 3 cases I experimented with (comments in the code give the compiler version and the detailed compile arguments):
- gcc: the GNU compiler behaves as expected, meaning that aligned load intrinsics map to aligned load instructions.
- icc with no "-m" argument: this works exactly like gcc. Aligned load intrinsics map to aligned load instructions.
- icc with the '-mavx' argument (note that gcc requires this argument to even compile the example): with this argument, aligned load intrinsics use unaligned load instructions. The same also happens with '-msse4.2'.
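As a hypothetical stand-in for the kind of snippet described above (the attachment itself is not included here, and the function names are illustrative), the following C file uses one aligned and one unaligned SSE load intrinsic. It assumes an x86/x86-64 target, where SSE intrinsics compile without extra flags. Unlike the original, it is meant to be called with a properly aligned buffer, so it also runs without faulting.

```c
#include <assert.h>
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Aligned load intrinsic: p must be 16-byte aligned. This typically maps to
   MOVAPS, which faults on a misaligned address; as reported in this thread,
   icc with -mavx may emit VMOVUPS here instead. */
float sum4_aligned(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);  /* debug-build precondition check */
    __m128 v = _mm_load_ps(p);
    float lanes[4];
    _mm_storeu_ps(lanes, v);            /* spill the four lanes to memory */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Unaligned load intrinsic: any address is allowed (MOVUPS / VMOVUPS). */
float sum4_unaligned(const float *p)
{
    __m128 v = _mm_loadu_ps(p);
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Compiling this with `-S` (for example `gcc -S -O2` or `icc -S -O2`), once without and once with `-mavx`, lets you compare which load instructions each compiler actually emits for the two functions.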
Yes. With -mavx set, icc translates SSE intrinsics to AVX-128 (in part to avoid SSE/AVX transition penalties when mixing with AVX code), and I wouldn't be surprised if it chose unaligned AVX-128 instructions as part of that translation. If you don't like that, you could go back to icc 11.1. The consequence is that the code would not fault at execution if unaligned data are encountered, but there should be no performance penalty compared with the alternatives.
Tim,
>> icc translates SSE intrinsics to AVX-128 (in part, so as to avoid AVX transitions when mixing with AVX code)
But then why is icc translating AVX-128 intrinsics into different AVX-128 instructions? (IOW, translating aligned intrinsics into unaligned instructions.)
Jim
In AVX, VMOVUPS (unaligned) and VMOVAPS (aligned) have exactly the same performance when the address is aligned; that's why icc no longer emits VMOVAPS.
That's the answer I got when I asked the same question on this forum, and my own benchmarks confirmed it.
emmanuel.attia wrote:
In AVX, VMOVUPS (unaligned) and VMOVAPS (aligned) have exactly the same performance when the address is aligned; that's why icc no longer emits VMOVAPS.
That's the answer I got when I asked the same question on this forum, and my own benchmarks confirmed it.
That's great news for me! Thank you very much! I had been wondering why, when I used __assume_aligned(x) on a buffer from an aligned malloc, the assembly dump showed unaligned AVX instructions rather than aligned ones.
I was about to file a bug report.
For those who are curious, there are guidelines:
http://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
Very interesting article. Thank you for providing the link.
Interesting article, thanks for pointing that out.
Still, in my opinion ICC should generate the aligned versions of AVX instructions when I use
__assume_aligned(x,64);
so that when I have a bug in the executed code (such as an ordinary malloc instead of an aligned one), it raises a CPU exception and the bug is easy to find. Unaligned instructions cost the same as aligned ones as long as the buffer really is aligned (as @emmanuel.attia pointed out), but with them such a bug is hard to find.
Something to think about, as a side note: why are there aligned versions of AVX instructions if ICC doesn't use them?
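The bug described here (a plain malloc where an aligned allocation was intended) is easy to make because malloc only guarantees alignment suitable for any fundamental type, i.e. alignof(max_align_t), commonly 16 bytes on x86-64, which is not enough for a 64-byte assumption. A minimal sketch of an aligned allocation, assuming a C11 library that provides aligned_alloc (MSVC does not; it offers _aligned_malloc instead); the helper name alloc_floats_aligned is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* aligned_alloc, free (C11) */

/* Hypothetical helper: allocate n floats starting on an `align`-byte
   boundary (align must be a power of two). Release the result with free().
   Returns NULL on failure or on size overflow. */
float *alloc_floats_aligned(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (n > SIZE_MAX / sizeof(float))
        return NULL;                       /* n * sizeof(float) would overflow */
    size_t bytes = n * sizeof(float);
    if (bytes > SIZE_MAX - (align - 1))
        return NULL;                       /* rounding up would overflow */
    /* C11 aligned_alloc originally required size to be a multiple of align,
       so round the byte count up to the next multiple. */
    bytes = (bytes + align - 1) & ~(align - 1);
    if (bytes == 0)
        bytes = align;
    return aligned_alloc(align, bytes);
}
```

A buffer from this helper satisfies __assume_aligned(x,64) when called with align = 64; a buffer from plain malloc may or may not, which is exactly what makes the bug intermittent.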
Marián "VooDooMan" Meravý wrote:
Still, in my opinion ICC should generate the aligned versions of AVX instructions when I use
__assume_aligned(x,64);
so that when I have a bug in the executed code (such as an ordinary malloc instead of an aligned one), it raises a CPU exception and the bug is easy to find. Unaligned instructions cost the same as aligned ones as long as the buffer really is aligned (as @emmanuel.attia pointed out), but with them such a bug is hard to find.
Something to think about, as a side note: why are there aligned versions of AVX instructions if ICC doesn't use them?
Certainly these are interesting questions, and bugs such as the ones you mention may be exposed when moving to MIC compilation or in cases where the compiler exploits opportunities for fusion. These alignment directives require detailed testing with every change of application, target platform, or even compiler version.
I have occasionally resorted to inserting debug code to check alignment when I was trying to detect or rule out such problems.
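A minimal sketch of such debug code, assuming GCC or Clang: a hypothetical ASSUME_ALIGNED macro asserts the alignment in debug builds, so a misaligned buffer fails loudly instead of silently working through unaligned loads, and then passes the hint to the compiler with the GCC/Clang builtin __builtin_assume_aligned. It does not make the compiler emit aligned instructions; it only restores an early failure. Under icc, the builtin would be replaced by a separate __assume_aligned(x,64); statement as used earlier in this thread (not shown).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p lies on an n-byte boundary (n a power of two). */
static inline int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Hypothetical macro: check alignment in debug builds (assert is compiled
   out when NDEBUG is defined), then let the compiler assume it.
   Pass a compile-time constant for n; p is evaluated twice. */
#define ASSUME_ALIGNED(p, n) \
    (assert(is_aligned((p), (n))), __builtin_assume_aligned((p), (n)))

/* Example kernel: scale a 64-byte-aligned array in place. */
void scale64(float *x, size_t len, float s)
{
    float *a = ASSUME_ALIGNED(x, 64);
    for (size_t i = 0; i < len; i++)
        a[i] *= s;
}
```

In a release build (NDEBUG defined) the check disappears and only the hint remains, so the cost of the safety net is confined to debug builds.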
Tim Prince wrote:
Certainly these are interesting questions, and bugs such as the ones you mention may be exposed when moving to MIC compilation or in cases where the compiler exploits opportunities for fusion. These alignment directives require detailed testing with every change of application, target platform, or even compiler version.
I have occasionally resorted to inserting debug code to check alignment when I was trying to detect or rule out such problems.
I agree I should test for alignment (e.g. with assert()) in debug builds.
I don't have a MIC environment, so that is not my concern.
But I still ask for ICC to produce aligned instructions in the case of
__assume_aligned(x,64);
It makes real sense.
It makes very real sense.
@Intel: ping?
Marián "VooDooMan" Meravý wrote:
Still, in my opinion ICC should generate the aligned versions of AVX instructions when I use
__assume_aligned(x,64);
so that when I have a bug in the executed code (such as an ordinary malloc instead of an aligned one), it raises a CPU exception and the bug is easy to find. Unaligned instructions cost the same as aligned ones as long as the buffer really is aligned (as @emmanuel.attia pointed out), but with them such a bug is hard to find.
Something to think about, as a side note: why are there aligned versions of AVX instructions if ICC doesn't use them?
This is an interesting feature request. I'm entering it into our problem-tracking database. Let's wait and see what the icc engineering team says about it.
Thanks.
@Marian
Please read the following forum post, particularly @Brandon Hewitt's response.
Please give me the tracking number, and update this thread when the issue is addressed. Is it likely to appear first in the ICC beta?
Yes, I did, but it doesn't suit my needs. I was talking about memory access instructions that raise a CPU exception on unaligned data: that is what I need in order to detect misaligned data, which is my goal, to catch programmer bugs.
Marián "VooDooMan" Meravý wrote:
Please give me the tracking number, and update this thread when the issue is addressed. Is it likely to appear first in the ICC beta?
The issue tracking number is DPD200255492. However, the engineering team hasn't made any decision regarding this feature request yet.
