Diego_Caballero
Beginner

The compiler does not optimize constant operations at all when using SIMD intrinsics

Hi,

I know that SIMD intrinsics in C/C++ are very limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly code. For example, let's say you have the following simple code:

[cpp]

int a = (5 + 2)/2;

[/cpp]

In this case, the compiler evaluates the constant expression at compile time and generates just a single move:

[cpp]

movl $3, %eax

[/cpp]

However, if you provide this SIMD code:

[cpp]

__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));

[/cpp]

The compiler is not able to simplify the code and it generates:

[cpp]

vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16

[/cpp]

which is really inefficient. Note that it is not even able to detect that you are dividing by 2, which should be optimized into a shift.

Of course, I'm compiling with -O3, so I would like to know whether it is possible to make the compiler optimize this kind of thing in intrinsics, since I'm not able to provide better-optimized code myself.

Kind regards.

3 Replies
Bernard
Black Belt

It seems that the compiler decided to call the SVML integer division function, which adds the latency of the call instruction on top of the latency of the division itself.
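A possible manual workaround, offered only as a sketch under stated assumptions: fold the constant in scalar code, where the compiler does evaluate it at compile time, and broadcast the result; for runtime values, replace the division by 2 with a shift sequence. Note that a plain arithmetic shift rounds toward negative infinity while C's `/` truncates toward zero, so the sign bit has to be added first. The example uses 128-bit SSE2 intrinsics so that it runs on any x86-64 machine; the same three-step pattern should carry over to `_mm512_srli_epi32`, `_mm512_add_epi32` and `_mm512_srai_epi32`. The names `kFolded`, `folded_lanes`, `div2_lanes` and `HAVE_SSE2` are hypothetical and do not come from the original post.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1     // hypothetical macro, used only in this sketch
#endif

// Scalar constant expression: evaluated at compile time.
constexpr std::int32_t kFolded = (5 + 2) / 2;
static_assert(kFolded == 3, "scalar constant folding");

// Broadcast the pre-folded constant instead of adding and dividing vectors.
std::array<std::int32_t, 4> folded_lanes() {
    std::array<std::int32_t, 4> out{};
#ifdef HAVE_SSE2
    __m128i v = _mm_set1_epi32(kFolded);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
#else
    out.fill(kFolded);  // portable fallback for non-x86 targets
#endif
    return out;
}

// Divide four signed 32-bit lanes by 2, truncating toward zero like C's `/`.
std::array<std::int32_t, 4> div2_lanes(const std::array<std::int32_t, 4>& in) {
    std::array<std::int32_t, 4> out{};
#ifdef HAVE_SSE2
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    __m128i sign = _mm_srli_epi32(x, 31);     // 1 in negative lanes, else 0
    __m128i biased = _mm_add_epi32(x, sign);  // correct rounding for negatives
    __m128i q = _mm_srai_epi32(biased, 1);    // arithmetic shift right by 1
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), q);
#else
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] / 2;
#endif
    return out;
}
```

For the constant case in the original post, `folded_lanes()` needs only a single broadcast of 3, with no add and no call. `div2_lanes` covers the case where the dividend is only known at run time but the divisor is still the constant 2.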

jimdempseyatthecove
Black Belt

Did you get the instruction sequence from a .ASM listing file produced by the compiler?
Or did you get it from a debugger disassembly window or the VTune disassembly window?

If it came from a .ASM file, I suggest you make a release build (without generating a .ASM file) and look at the resulting code in the debugger (or VTune). You may see additional optimizations.

Jim Dempsey
