Dear Intel developers,
I'm using the Intel 15 compiler on an E5-2670 processor. Profiling my code with VTune, I see a hotspot on one particular line, where I unpack an __m128 value in order to add its four elements into a single float (a horizontal sum), like this:
_mm_store_ps(denom_arr_tmp, denom_tmp);
semblance[m_local] += denom_arr_tmp[0] + denom_arr_tmp[1] + denom_arr_tmp[2] + denom_arr_tmp[3];
The assembly generated is:
vunpckhps %xmm2, %xmm2, %xmm3
movq -0x80(%rbp), %rax
vaddssl -0x9c(%rbp), %xmm2, %xmm4
vaddss %xmm3, %xmm4, %xmm5
vaddssl -0x94(%rbp), %xmm5, %xmm6
vaddssl (%rax,%r14,4), %xmm6, %xmm7
vmovssl %xmm7, (%rax,%r14,4)
My questions are: what is the VADDSSL instruction, and how does it differ from VADDSS? And how can I optimize this piece of code? It is currently a bottleneck.
Thanks.
If you are looking at GAS/AT&T syntax, it appends operation-size suffixes to the instruction name;
"l" normally stands for long. See, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp values into a vector semblance_tmp, then in the next outer loop perform the horizontal add across semblance_tmp into semblance?
Jim Dempsey
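A minimal sketch of the deferral described above, assuming SSE intrinsics in C. The names hsum_ps, accumulate_deferred and values are hypothetical and not from the original code; the load from values merely stands in for however denom_tmp is actually computed. The inner loop does only vector adds, and the horizontal sum runs once afterwards, using shuffles instead of a store to a temporary array.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Horizontal sum of the four floats in v, kept in registers. */
static float hsum_ps(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);   /* (v2, v3, v2, v3) */
    __m128 sum2 = _mm_add_ps(v, hi);     /* lane 0 = v0+v2, lane 1 = v1+v3 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, odd));  /* (v0+v2) + (v1+v3) */
}

/* Hypothetical inner loop: accumulate vectors, reduce once at the end.
 * n must be a multiple of 4 in this simplified version. */
static float accumulate_deferred(const float *values, size_t n)
{
    assert(n % 4 == 0);
    __m128 semblance_tmp = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 denom_tmp = _mm_loadu_ps(values + i);  /* stand-in for the real computation */
        semblance_tmp = _mm_add_ps(semblance_tmp, denom_tmp);
    }
    return hsum_ps(semblance_tmp);  /* one horizontal add per outer iteration */
}
```

The outer loop would then do something like semblance[m_local] += accumulate_deferred(...). Because the additions happen in a different order than in the scalar version, the result can differ in the last bits, so it is worth comparing outputs after the change.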
Maxym Dmytrychenko (Intel) wrote:
If you are looking at GAS/AT&T syntax, it appends operation-size suffixes to the instruction name;
"l" normally stands for long. See, for example, http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Hi Maxym, from your link I read that "l" stands for:
- l = long (32-bit integer or 64-bit floating point)
So is the compiler using 64-bit floating point? If so, what could be the reason? I'm using 32-bit floating point. Thanks.
jimdempseyatthecove wrote:
Can you defer your horizontal add (move it out one loop level)? In other words, have the inner loop accumulate the denom_tmp values into a vector semblance_tmp, then in the next outer loop perform the horizontal add across semblance_tmp into semblance?
Jim Dempsey
Thanks Jim, I'll try your suggestion and check the results carefully.
