Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

The compiler does not optimize constant operations on SIMD intrinsics at all

Diego_Caballero
Beginner

Hi,

I know that SIMD intrinsics in C/C++ are very limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly code. For example, let's say you have the following simple code:

[cpp]

int a = (5 + 2)/2;

[/cpp]

In this case, the compiler evaluates the constant expression at compile time and generates just a single move:

[cpp]

movl $3, %eax

[/cpp]

However, if you provide this SIMD code:

[cpp]

__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));

[/cpp]

The compiler is not able to simplify the code and it generates:

[cpp]

vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16

[/cpp]

which is really inefficient. Note that it does not even detect that the divisor is 2, in which case the division could be replaced by a shift.
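To illustrate the shift mentioned above, here is a minimal sketch of dividing packed 32-bit integers by 2 without a division call. The function name `halve16` is hypothetical, and the code assumes an AVX-512F target (guarded by `__AVX512F__`, with a scalar fallback elsewhere); it is not what the compiler emits. Signed division truncates toward zero while an arithmetic shift rounds toward negative infinity, so negative lanes are biased by 1 first; for non-negative values the plain shift would be enough.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: halve 16 packed 32-bit integers,
// truncating toward zero like the scalar expression `x / 2`.
std::array<std::int32_t, 16> halve16(const std::array<std::int32_t, 16>& in) {
    std::array<std::int32_t, 16> out{};
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512(in.data());
    // Logical shift by 31 extracts the sign bit: 1 for negative lanes, 0 otherwise.
    __m512i bias = _mm512_srli_epi32(v, 31);
    // (x + bias) >> 1 with an arithmetic shift matches truncating division by 2.
    __m512i r = _mm512_srai_epi32(_mm512_add_epi32(v, bias), 1);
    _mm512_storeu_si512(out.data(), r);
#else
    // Scalar fallback with the same arithmetic. Right-shifting a negative value
    // is arithmetic on mainstream compilers (and guaranteed since C++20).
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::int32_t x = in[i];
        std::int32_t bias =
            static_cast<std::int32_t>(static_cast<std::uint32_t>(x) >> 31);
        out[i] = (x + bias) >> 1;
    }
#endif
    return out;
}
```

With every lane set to 5 + 2 = 7 this yields 3 in each lane, matching the scalar result, and a lane holding -7 yields -3.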

Of course, I'm compiling with -O3, so I would like to know whether it is possible to make the compiler optimize this kind of expression in intrinsics, since I am not able to provide better-optimized code myself.
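For cases where the source can be changed by hand, one possible workaround is to fold the constant in scalar code and broadcast the result once. This is only a sketch under assumptions: the names `kFolded` and `broadcast_folded` are hypothetical, the intrinsic path assumes an AVX-512F target (guarded by `__AVX512F__`), and it does not help when the intrinsic expression cannot be rewritten.

```cpp
#include <array>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Scalar constant expression: evaluated by the compiler, verified at compile time.
constexpr std::int32_t kFolded = (5 + 2) / 2;
static_assert(kFolded == 3, "constant is folded at compile time");

// Hypothetical helper: return 16 lanes all holding the pre-folded constant.
std::array<std::int32_t, 16> broadcast_folded() {
    std::array<std::int32_t, 16> out{};
#if defined(__AVX512F__)
    // A single broadcast replaces the add and the division call.
    _mm512_storeu_si512(out.data(), _mm512_set1_epi32(kFolded));
#else
    // Scalar fallback so the sketch also builds without AVX-512F.
    out.fill(kFolded);
#endif
    return out;
}
```

The `static_assert` makes the folding explicit: if the expression could not be evaluated at compile time, the build would fail instead of silently falling back to runtime arithmetic.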

Kind regards.

Bernard
Valued Contributor I

It seems that the compiler decided to call the SVML integer division function, which adds the latency of the call instruction to the latency of the division itself.

jimdempseyatthecove
Honored Contributor III

Did you get the instruction sequence from a .ASM listing file produced by the compiler?
Or did you get it from a debugger disassembly window or the VTune disassembly window?

If it came from a .ASM file, I suggest you make a release build (with no .ASM file) and look at the resulting code in the debugger (or VTune). You may see additional optimizations.

Jim Dempsey
