Intel® C++ Compiler

The compiler does not optimize constant operations written with SIMD intrinsics at all

Diego_Caballero

Hi,

I know that SIMD intrinsics in C/C++ are quite limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly. For example, consider the following simple code:

[cpp]

int a = (5 + 2)/2;

[/cpp]

In this case, the compiler evaluates the constant expression at compile time and generates just a single move:

[cpp]

movl $3, %eax

[/cpp]

However, if you write the equivalent SIMD code:

[cpp]

__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));

[/cpp]

the compiler does not simplify it and instead generates:

[cpp]

vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16

[/cpp]

which is really inefficient. Note that it does not even detect the division by 2, which should be replaced by a shift (plus a small correction for negative values, since the division is signed).
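For reference, the strength reduction expected for a signed division by 2 is slightly more than a single shift: an arithmetic right shift rounds toward negative infinity, while C/C++ integer division truncates toward zero, so negative values need a +1 bias first. A minimal scalar sketch (the function name `div2_by_shift` is hypothetical, and it assumes arithmetic right shift of negative values, which C++20 guarantees and mainstream x86 compilers already provided):

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2 without a divide instruction (hypothetical helper).
// x >> 1 alone rounds toward negative infinity; C++ division truncates
// toward zero, so negative inputs are biased by +1 before shifting.
// Assumes arithmetic right shift of negative values (guaranteed since C++20).
constexpr std::int32_t div2_by_shift(std::int32_t x) {
    // Logical shift of the sign bit: 1 for negative x, 0 otherwise.
    const std::int32_t bias =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(x) >> 31);
    // Adding at most 1 to a negative value cannot overflow.
    return (x + bias) >> 1;
}
```

Per 32-bit lane, the same three steps map onto the AVX-512F intrinsics `_mm512_srli_epi32` (extract the sign bit), `_mm512_add_epi32` and `_mm512_srai_epi32`.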

I am compiling with -O3, of course, so I would like to know whether it is possible to make the compiler optimize this kind of thing in intrinsic code, since I am not able to write better-optimized code myself.
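One possible workaround, sketched under the assumption that the operands really are compile-time constants: do the arithmetic on scalar `constexpr` values, which the C++ language evaluates at compile time, and broadcast only the result with `_mm512_set1_epi32`. This sidesteps the problem rather than making the compiler fold intrinsic expressions. The names `kA`, `kB`, `kFolded` and `folded_vector` are hypothetical, and the AVX-512 part is compiled only when `__AVX512F__` is defined.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical constants standing in for the operands of the original example.
constexpr std::int32_t kA = 5;
constexpr std::int32_t kB = 2;

// Scalar constant expression: guaranteed to be evaluated at compile time.
constexpr std::int32_t kFolded = (kA + kB) / kB;
static_assert(kFolded == 3, "folded at compile time");

#if defined(__AVX512F__)
// Broadcast of an already-known value: typically a constant load or
// broadcast, with no vpaddd and no SVML division call at run time.
inline __m512i folded_vector() { return _mm512_set1_epi32(kFolded); }
#endif
```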

Kind regards.

Bernard

It seems that the compiler decided to call the SVML integer division function, which adds the latency of the call instruction to the latency of the division itself.

jimdempseyatthecove

Did you get the instruction sequence from a .ASM listing file produced by the compiler?
Or did you get it from a debugger disassembly window or the VTune disassembly window?

If it came from the .ASM file, I suggest building a release configuration (without generating the .ASM file) and examining the resulting code in the debugger (or VTune). You may see additional optimizations.

Jim Dempsey
