Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Mapping between intrinsics and assembly code

monaco__joe
Beginner

I'm using unaligned and aligned load intrinsics in my code and ICC does not behave as I expect it to.  If this is expected behavior, can somebody educate me on why?

The fundamental problem is that I expect aligned load intrinsics to generate aligned instructions and unaligned load intrinsics to generate unaligned instructions. However, what I see is that, depending on compiler flags, aligned load intrinsics sometimes generate unaligned instructions.

I've attached a snippet of code to demonstrate. I realize this code will segfault when run; the point is just to compile it and look at the generated assembly code.

There are 3 cases I experimented with (comments in the code give the compiler version and detailed compile arguments):

  • gcc: the GNU compiler behaves as expected, meaning that aligned load intrinsics map to aligned load instructions.
  • icc with no "-m" argument: this works exactly like gcc. Aligned load intrinsics map to aligned load instructions.
  • icc with the '-mavx' argument (note that gcc requires this argument to even compile the example): with this argument, aligned load intrinsics use unaligned load instructions. The same also happens with '-msse4.2'.

 

16 Replies
TimP
Honored Contributor III

Yes, with -mavx set, icc translates SSE intrinsics to AVX-128 (in part, so as to avoid AVX transitions when mixing with AVX code). I wouldn't be surprised if it chose unaligned AVX-128 instructions as a part of the translation. This means that it would not fault at execution if unaligned data are encountered, but there should be no performance penalty in comparison with other alternatives. If you don't like that, you could go back to icc 11.1.

jimdempseyatthecove
Honored Contributor III

Tim,

>> icc translates SSE intrinsics to AVX-128 (in part, so as to avoid AVX transitions when mixing with AVX code)

But then why is icc translating AVX-128 intrinsics to AVX-128 intrinsics? (IOW, translating aligned intrinsics to unaligned intrinsics.)

Jim

emmanuel_attia
Beginner

In AVX, VMOVUPS (unaligned) and VMOVAPS (aligned) have exactly the same performance when the address is aligned, which is why icc does not emit VMOVAPS anymore.

That's the answer I got when I asked the question on this forum, and benchmarking confirmed that it is indeed true.

Marián__VooDooMan__M
New Contributor II

That's great news for me! Thank you very much! I was wondering why, when I used __assume_aligned(x) (on an aligned-malloc buffer), the assembler dump showed unaligned AVX instructions and not aligned ones.

I was about to file a bug report.

Marián__VooDooMan__M
New Contributor II

Bernard
Valued Contributor I

Very interesting article. Thank you for providing the link.

Kittur_G_Intel
Employee

Interesting article, thanks for pointing that out.

Marián__VooDooMan__M
New Contributor II

I still think ICC should generate the aligned versions of AVX instructions when I use

__assume_aligned(x,64);

so that when I have a bug (such as an ordinary malloc instead of an aligned one), the executed code raises a CPU exception and the bug is easy to find. Unaligned instructions have the same CPU cost as aligned ones as long as the buffer really is aligned (as @emmanuel.attia pointed out), but with them such a bug is hard to find.

Something to think about, as a side note: why are there aligned versions of AVX instructions when ICC doesn't use them?

TimP
Honored Contributor III

Certainly these are interesting questions, and bugs such as you mention may be exposed when moving to MIC compilation or in cases where the compiler exploits opportunities for fusion.  These alignment directives require much detailed testing with each change of application or platform target or even compiler update.

I have resorted on occasion to inserting debug code to check alignments when I was trying to detect or eliminate such concerns.
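As an illustration of the kind of debug code meant here, the following is a minimal sketch, not the code actually used by anyone in this thread. The helper is_aligned and the kernel scale are hypothetical names; __assume_aligned is the ICC-specific hint discussed above and is only compiled in when the Intel compiler is detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is aligned to `alignment` bytes.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical kernel that promises the compiler 64-byte alignment.
void scale(float* x, std::size_t n, float factor) {
    // Debug-only check; compiled out when NDEBUG is defined.
    assert(is_aligned(x, 64) && "x must be 64-byte aligned");
#if defined(__INTEL_COMPILER)
    __assume_aligned(x, 64);  // ICC-specific alignment hint
#endif
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

With a check like this, a buffer that came from an ordinary malloc instead of an aligned allocation would trip the assertion in a debug build, regardless of which load instruction the compiler chose.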

Marián__VooDooMan__M
New Contributor II

I agree I should test [like assert()] for alignment in debug builds.

I don't have a MIC environment, so that is not my concern.

But I still request that ICC produce aligned (not unaligned) instructions in the case of

__assume_aligned(x,64);

It makes real sense.
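Until the compiler offers such behavior, one possible workaround is to make the check explicit in a wrapper around the load. This is only a sketch under assumptions: the names load_ps_checked and first_lane are hypothetical, it targets the 16-byte SSE load rather than 64-byte buffers, and it reports misalignment with a C++ exception instead of a CPU exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_cvtss_f32

// Hypothetical wrapper: reports a misaligned pointer deterministically
// instead of relying on the load instruction itself to fault.
inline __m128 load_ps_checked(const float* p) {
    if (reinterpret_cast<std::uintptr_t>(p) % 16 != 0) {
        throw std::invalid_argument(
            "load_ps_checked: pointer is not 16-byte aligned");
    }
    return _mm_load_ps(p);  // alignment was verified above
}

// Small helper to read back the lowest lane for inspection.
inline float first_lane(__m128 v) {
    return _mm_cvtss_f32(v);
}
```

The explicit check adds a branch to every load, so a wrapper like this is probably only suitable for debug or test builds.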

Marián__VooDooMan__M
New Contributor II

It makes very real sense.

@Intel: ping?

Feilong_H_Intel
Employee

Marián "VooDooMan" Meravý wrote:

I still think ICC should generate the aligned versions of AVX instructions when I use

__assume_aligned(x,64);

so that when I have a bug (such as an ordinary malloc instead of an aligned one), the executed code raises a CPU exception and the bug is easy to find. Unaligned instructions have the same CPU cost as aligned ones as long as the buffer really is aligned (as @emmanuel.attia pointed out), but with them such a bug is hard to find.

Something to think about, as a side note: why are there aligned versions of AVX instructions when ICC doesn't use them?

This is an interesting feature request. I'm entering it into our problem-tracking database. Let's wait and see what the icc engineering team says about it.

Thanks.

 

Bernard
Valued Contributor I

@Marian

Please read the following forum post, and particularly Brandon Hewitt's response.

https://software.intel.com/en-us/forums/topic/278573

Marián__VooDooMan__M
New Contributor II

@Feilong H (Intel)

Please give me the tracking number, and update this thread when the issue is addressed. Is it likely to appear first in the beta ICC?

Marián__VooDooMan__M
New Contributor II

@iliyapolak

Yes, I did, but it does not suit my needs. I was talking about memory access instructions that raise a CPU exception on unaligned access. That is what I need in order to detect misaligned data, which is my goal: to guard against programmer bugs.

Feilong_H_Intel
Employee

Marián "VooDooMan" Meravý wrote:

@Feilong H (Intel)

Please give me the tracking number, and update this thread when the issue is addressed. Is it likely to appear first in the beta ICC?

The issue tracking number is DPD200255492. However, the engineering team hasn't made any decisions regarding this feature request.
