Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

The compiler does not optimize constant operations at all when using SIMD intrinsics

Diego_Caballero

Hi,

I know that SIMD intrinsics in C/C++ are very limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly. For example, take the following simple code:

[cpp]

int a = (5 + 2)/2;

[/cpp]

In this case, the compiler evaluates the constant expression at compile time and generates just a single move:

[cpp]

movl $3, %eax

[/cpp]

However, if you provide this SIMD code:

[cpp]

__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));

[/cpp]

The compiler is not able to simplify the code and generates:

[cpp]

vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16

[/cpp]

which is really inefficient. Note that it does not even detect that the divisor is 2, in which case the division could be replaced by a shift.
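For reference, here is a minimal scalar sketch of what "division by 2 as a shift" involves for signed 32-bit values. A plain arithmetic shift rounds toward negative infinity, so negative inputs need a small bias to match C++'s truncating division. The names `div2_by_shift` and `check_div2_by_shift` are hypothetical, and the code assumes that `>>` on a negative signed value is an arithmetic shift (guaranteed since C++20, and the usual behavior on x86 compilers before that). A vector version would presumably apply the same three steps per lane (logical shift, add, arithmetic shift), but that is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Truncating signed division by 2 without a divide instruction.
// Hypothetical helper for illustration, not something the compiler emits.
constexpr std::int32_t div2_by_shift(std::int32_t x) {
    // Sign bit as 0 or 1, extracted with an unsigned (logical) shift.
    const std::int32_t bias =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(x) >> 31);
    // Assumption: >> on a negative signed value shifts arithmetically.
    // Adding the bias first makes negative odd inputs round toward zero.
    return (x + bias) >> 1;
}

// The constant expression from the post folds to the same value either way.
static_assert(div2_by_shift(5 + 2) == (5 + 2) / 2, "matches (5 + 2) / 2");

// Compares the shift-based version against ordinary division on a small range.
inline void check_div2_by_shift() {
    for (std::int32_t x = -1000; x <= 1000; ++x) {
        assert(div2_by_shift(x) == x / 2);
    }
}
```

For non-negative inputs the bias is always 0, so the operation reduces to a single shift.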

Of course, I'm compiling with -O3, so I would like to know whether it is possible to make the compiler optimize this kind of thing in intrinsics code, since I am not able to provide better-optimized code myself.
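One possible workaround, sketched here under assumptions rather than offered as a confirmed fix, is to do the constant arithmetic in scalar code, where folding is reliable, and broadcast only the final value. The names `kFolded`, `kLanes`, `broadcast_folded`, `check_broadcast_folded` and `broadcast_folded_avx512` are hypothetical. The `std::array` version is a portable stand-in for a 512-bit register, and the intrinsic path is compiled only when the compiler defines `__AVX512F__`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Fold the constant expression in scalar code at compile time.
constexpr std::int32_t kFolded = (5 + 2) / 2;
static_assert(kFolded == 3, "scalar expression folds to 3");

// Number of 32-bit lanes in a 512-bit vector.
constexpr int kLanes = 16;

// Portable stand-in for a broadcast: every lane holds the folded constant.
inline std::array<std::int32_t, kLanes> broadcast_folded() {
    std::array<std::int32_t, kLanes> lanes{};
    lanes.fill(kFolded);
    return lanes;
}

// Verifies that every lane of the stand-in vector holds the folded value.
inline void check_broadcast_folded() {
    for (std::int32_t lane : broadcast_folded()) {
        assert(lane == kFolded);
    }
}

#if defined(__AVX512F__)
// With AVX-512 enabled, the whole expression becomes one broadcast of a
// compile-time constant, with no vector add and no division call.
inline __m512i broadcast_folded_avx512() {
    return _mm512_set1_epi32(kFolded);
}
#endif
```

This only helps when the operands really are compile-time constants. It does nothing for values that become known only after inlining or constant propagation through the intrinsics.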

Kind regards.

Bernard

It seems that the compiler decided to call the SVML integer division function, which adds the overhead of a call instruction on top of the latency of the division itself.

jimdempseyatthecove

Did you get the instruction sequence from a .ASM listing file produced by the compiler?
Or did you get it from a debugger disassembly window or the VTune disassembly window?

If it came from the .ASM file, I suggest you make a release build (without generating an .ASM file) and look at the resulting code in the debugger (or VTune). You may see additional optimizations.

Jim Dempsey
