Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

VADDSSL instruction?

unrue
Beginner

Dear Intel developers,

I'm using the Intel compiler 15 on an E5-2670 processor. While analyzing my code with VTune, I found a hotspot on a particular line where I unpack an __m128 value in order to sum its elements into a single float (a horizontal sum), like this:

 

_mm_store_ps(denom_arr_tmp, denom_tmp);

 semblance[m_local] += denom_arr_tmp[0]+denom_arr_tmp[1]+denom_arr_tmp[2]+denom_arr_tmp[3];

 

The assembly generated is:

vunpckhps %xmm2, %xmm2, %xmm3  
movq  -0x80(%rbp), %rax 
vaddssl  -0x9c(%rbp), %xmm2, %xmm4      
vaddss %xmm3, %xmm4, %xmm5      
vaddssl  -0x94(%rbp), %xmm5, %xmm6     
vaddssl  (%rax,%r14,4), %xmm6, %xmm7   
vmovssl  %xmm7, (%rax,%r14,4) 

 

My question is: what is the VADDSSL instruction? How does it differ from VADDSS? And how can I optimize this piece of code? It is actually a bottleneck.

Thanks.

 

Maxym_D_Intel
Employee

If you are using GAS/AT&T syntax, it appends operation suffixes (postfixes) to the instruction name.

The l suffix normally stands for long; see, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

jimdempseyatthecove
Black Belt

Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp values into a vector semblance_tmp, then in the next outer loop perform the horizontal add across semblance_tmp into semblance?
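A minimal sketch of this deferral, under assumptions not confirmed in the thread: the function names `hsum_ps` and `accumulate_deferred` are hypothetical, the inner loop is reduced to a plain load from an array called `denom`, and the element count is assumed to be a multiple of 4. The inner loop only does vertical `_mm_add_ps` adds, and the horizontal sum runs once after the loop, entirely in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Horizontal sum of the four lanes of v, without a store to memory. */
static float hsum_ps(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);       /* lanes 0,1 now hold v2, v3 */
    __m128 sum2 = _mm_add_ps(v, hi);         /* lane 0: v0+v2, lane 1: v1+v3 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2,
                                 _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane 1 */
    return _mm_cvtss_f32(_mm_add_ss(sum2, odd)); /* (v0+v2) + (v1+v3) */
}

/* Hypothetical stand-in for the original inner loop: each iteration
 * produces one denom_tmp vector (here simply loaded from an array). */
static float accumulate_deferred(const float *denom, size_t n)
{
    assert(n % 4 == 0);                      /* assumption: no remainder loop */

    __m128 semblance_tmp = _mm_setzero_ps(); /* vector accumulator */
    for (size_t i = 0; i < n; i += 4) {
        __m128 denom_tmp = _mm_loadu_ps(denom + i);
        semblance_tmp = _mm_add_ps(semblance_tmp, denom_tmp); /* vertical add only */
    }
    return hsum_ps(semblance_tmp);           /* one horizontal add, after the loop */
}
```

The caller would then do something like `semblance[m_local] += accumulate_deferred(...)` once per outer iteration. Because this changes the order of the floating-point additions, the result can differ from the original in the last bits.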

Jim Dempsey

unrue
Beginner

Maxym Dmytrychenko (Intel) wrote:

If you are using GAS/AT&T syntax, it appends operation suffixes (postfixes) to the instruction name.

The l suffix normally stands for long; see, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

 

Hi Maxym, from your link I read that "l" stands for:

 

  • l = long (32 bit integer or 64-bit floating point)

So is the compiler using 64-bit floating point? If so, what could be the reason? I'm using 32-bit floating point. Thanks.

unrue
Beginner

jimdempseyatthecove wrote:

Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp values into a vector semblance_tmp, then in the next outer loop perform the horizontal add across semblance_tmp into semblance?

Jim Dempsey

 

Thanks Jim, I'll try your suggestion and keep an eye on the results.
