Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

VADDSSL instruction?

unrue
Beginner
518 Views

Dear Intel developers,

I'm using Intel Compiler 15 on an E5-2670 processor. Analyzing my code with VTune, I found a hotspot on one particular line, where I unpack an __m128 value and sum its four elements into a single float (a horizontal sum), like this:

 

_mm_store_ps(denom_arr_tmp, denom_tmp);

 semblance[m_local] += denom_arr_tmp[0]+denom_arr_tmp[1]+denom_arr_tmp[2]+denom_arr_tmp[3];

 

The assembly generated is:

vunpckhps %xmm2, %xmm2, %xmm3  
movq  -0x80(%rbp), %rax 
vaddssl  -0x9c(%rbp), %xmm2, %xmm4      
vaddss %xmm3, %xmm4, %xmm5      
vaddssl  -0x94(%rbp), %xmm5, %xmm6     
vaddssl  (%rax,%r14,4), %xmm6, %xmm7   
vmovssl  %xmm7, (%rax,%r14,4) 

 

My question is: what is the VADDSSL instruction, and how does it differ from VADDSS? How can I optimize this piece of code? It is actually a bottleneck.

Thanks.

 

Maxym_D_Intel
Employee

If you are using GAS/AT&T syntax, it appends operation-size suffixes to the instruction name.

"l" normally stands for "long"; see, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

jimdempseyatthecove
Honored Contributor III

Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp vectors into a vector semblance_tmp, then perform the horizontal add across semblance_tmp into semblance in the next outer loop level?

Jim Dempsey

unrue
Beginner

Maxym Dmytrychenko (Intel) wrote:

If you are using GAS/AT&T syntax, it appends operation-size suffixes to the instruction name.

"l" normally stands for "long"; see, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

 

Hi Maxym, from your link I read that "l" stands for:

 

  • l = long (32 bit integer or 64-bit floating point)

So is the compiler using 64-bit floating point? If so, what could be the reason? I'm using 32-bit floating point. Thanks.

unrue
Beginner

jimdempseyatthecove wrote:

Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp vectors into a vector semblance_tmp, then perform the horizontal add across semblance_tmp into semblance in the next outer loop level?

Jim Dempsey

 

Thanks Jim, I'll try your suggestion and check the results carefully.
