[bash]
void Strange(float *dst, const float *src, int size)
{
  const __m256 k1 = _mm256_set1_ps(10.0), k2 = _mm256_set1_ps(20.0), k3 = _mm256_set1_ps(30.0), k4 = _mm256_set1_ps(40.0);
  for (int i=0; i<size; i+=8)
  {
    const __m256 x = _mm256_load_ps(src+i);
    _mm256_store_ps(dst+i,_mm256_add_ps(_mm256_add_ps(_mm256_mul_ps(k1,x),_mm256_mul_ps(k2,x)),
                                        _mm256_add_ps(_mm256_mul_ps(k3,x),_mm256_mul_ps(k4,x))));
  }
}

.B8.3::                         ; Preds .B8.3 .B8.2
        vmulps    ymm5, ymm3, YMMWORD PTR [rdx+rax*4]           ;440.55
        vmulps    ymm6, ymm2, YMMWORD PTR [rdx+rax*4]           ;440.75
        vaddps    ymm0, ymm5, ymm6                              ;440.41
        vmulps    ymm5, ymm4, YMMWORD PTR [rdx+rax*4]           ;441.55
        vmulps    ymm6, ymm1, YMMWORD PTR [rdx+rax*4]           ;441.75
        vaddps    ymm5, ymm5, ymm6                              ;441.41
        vaddps    ymm0, ymm0, ymm5                              ;440.27
        vmovups   YMMWORD PTR [rcx+rax*4], ymm0                 ;440.21
        add       rax, 8                                        ;437.25
        cmp       rax, r8                                       ;437.19
        jl        .B8.3         ; Prob 82%                      ;437.19
[/bash]
I was expecting a single load such as vmovups ymm7, YMMWORD PTR [rdx+rax*4] at the start of the loop, with ymm7 then used 4 times instead of 4 separate loads. Am I missing something here?
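For anyone who wants to reproduce this, here is a minimal sketch of the kernel with the single load written out explicitly, plus a scalar version to check the results against. The names StrangeRef and StrangeOneLoad are made up for this example, size is assumed to be a multiple of 8, and unaligned loads/stores are used so that any buffer works. The AVX path is only compiled when __AVX__ is defined. Writing the load once in the source does not force the compiler to keep it in a register; it may still fold it into memory operands as shown above.

```cpp
#include <cassert>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Scalar reference: dst[i] = (10*x + 20*x) + (30*x + 40*x)
void StrangeRef(float *dst, const float *src, int size)
{
    for (int i = 0; i < size; ++i) {
        const float x = src[i];
        dst[i] = (10.0f * x + 20.0f * x) + (30.0f * x + 40.0f * x);
    }
}

// Hypothetical variant with one explicit load per iteration.
// Assumes size is a multiple of 8.
void StrangeOneLoad(float *dst, const float *src, int size)
{
#if defined(__AVX__)
    const __m256 k1 = _mm256_set1_ps(10.0f), k2 = _mm256_set1_ps(20.0f),
                 k3 = _mm256_set1_ps(30.0f), k4 = _mm256_set1_ps(40.0f);
    for (int i = 0; i < size; i += 8) {
        const __m256 x = _mm256_loadu_ps(src + i);  // the single load
        const __m256 lo = _mm256_add_ps(_mm256_mul_ps(k1, x), _mm256_mul_ps(k2, x));
        const __m256 hi = _mm256_add_ps(_mm256_mul_ps(k3, x), _mm256_mul_ps(k4, x));
        _mm256_storeu_ps(dst + i, _mm256_add_ps(lo, hi));
    }
#else
    StrangeRef(dst, src, size);  // fallback when AVX is not enabled at compile time
#endif
}
```

Comparing the output of both functions on the same input is a quick sanity check before looking at the generated ASM.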
It might not always be about register pressure; calling conventions and register usage on the different OSes and 32/64-bit binaries can also play a role...
On Linux with icc version 12.1.0, I get what you wanted to have:
[bash]
..B2.3:                         # Preds ..B2.1 ..B2.3
        vmovups   (%rsi,%rax,4), %ymm6
        vmulps    %ymm6, %ymm3, %ymm4
        vmulps    %ymm6, %ymm2, %ymm5
        vmulps    %ymm6, %ymm1, %ymm7
        vmulps    %ymm6, %ymm0, %ymm8
        vaddps    %ymm5, %ymm4, %ymm9
        vaddps    %ymm8, %ymm7, %ymm10
        vaddps    %ymm10, %ymm9, %ymm11
        vmovups   %ymm11, (%rdi,%rax,4)
        addq      $8, %rax
        cmpq      %rdx, %rax
        jl        ..B2.3
[/bash]
So far I haven't been able to compare the timings with another variant, since the compiler always generates the same ASM code whatever compilation flags I try. If you have an idea of which flag may impact this (Windows version), I'd be really glad to know it. Unfortunately it isn't an option to modify the ASM dump by hand and then feed it to the assembler, since the syntax isn't correct due to a bug with the labels (the Intel compiler isn't compatible with itself in this respect). So for this test I'll have to resort to pure assembly code or inline assembly, which I have avoided like the plague for several years now.
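Once a second variant is available, a small harness along these lines could be used to compare the two. This is only a sketch: the Kernel typedef, ScalarKernel and BestTimeSeconds are names invented for this example, and ScalarKernel is a stand-in so the code builds on its own. It reports the best wall-clock time over several runs, which is a rough measure and not a substitute for a proper profiler.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Same signature as the Strange() function in this thread
using Kernel = void (*)(float *dst, const float *src, int size);

// Stand-in kernel so this sketch compiles on its own
void ScalarKernel(float *dst, const float *src, int size)
{
    for (int i = 0; i < size; ++i) dst[i] = 100.0f * src[i];
}

// Runs the kernel `repeats` times and returns the best time in seconds
double BestTimeSeconds(Kernel kernel, int size, int repeats)
{
    std::vector<float> src(size, 1.0f), dst(size, 0.0f);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        kernel(dst.data(), src.data(), size);
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best) best = s;
    }
    // Read one result through a volatile so the work is not optimized away
    volatile float sink = dst[size - 1];
    (void)sink;
    return best;
}
```

Calling BestTimeSeconds once per variant with the same size and repeat count gives two numbers that can be compared directly.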
>on Linux and with icc version 12.1.0, I have , as you wanted to have
Thanks, very interesting. Now that's even more strange: one of the two versions should be better and should be used by both compilers if you ask me (I don't see how the different ABIs have an impact on this inside a critical loop). Even if the timings are the same, one version should have better code density and better power usage, IMHO.
For the Windows variant, I can imagine that the CPU is smart enough not to reload the data 4 times if the cache line wasn't modified by another thread, but it will still need to probe the L1D cache, which looks less power efficient. I'd love to see this commented on by a CPU designer.
ICC 12.0 on Windows (not the very latest version) gives the following:
[bash]
;;; const __m256 x = _mm256_load_ps(src+i);
  00036 c5 fc 10 2c 86   vmovups ymm5, YMMWORD PTR [esi+eax*4]
$LN27:
;;; _mm256_store_ps(dst+i,_mm256_add_ps(_mm256_add_ps(_mm256_mul_ps(k1,x),_mm256_mul_ps(k2,x)),
  0003b c5 dc 59 f5      vmulps ymm6, ymm4, ymm5
$LN28:
  0003f c5 e4 59 fd      vmulps ymm7, ymm3, ymm5
$LN29:
  00043 c5 cc 58 c7      vaddps ymm0, ymm6, ymm7
$LN30:
  00047 c5 ec 59 f5      vmulps ymm6, ymm2, ymm5
$LN31:
  0004b c5 f4 59 ed      vmulps ymm5, ymm1, ymm5
$LN32:
  0004f c5 cc 58 fd      vaddps ymm7, ymm6, ymm5
$LN33:
  00053 c5 fc 58 c7      vaddps ymm0, ymm0, ymm7
$LN34:
  00057 c5 fc 11 04 81   vmovups YMMWORD PTR [ecx+eax*4], ymm0
$LN35:
  0005c 83 c0 08         add eax, 8
$LN36:
  0005f 3b c2            cmp eax, edx
$LN37:
  00061 7c d3            jl .B2.3 ; Prob 82%
[/bash]
I would encourage you to check the version of the compiler you are currently using.
[bash]
; mark_description "Intel C++ Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.0.233 Build 20110";
; mark_description "811";
; mark_description "-c -Qvc10 -Qlocation,link,$(VCInstallDir)\bin\x86_amd64 -I..\do3d -nologo -W3 -MP -O2 -Ob2 -Oi -Ot -Qip -";
; mark_description "Qftz -D WIN32 -D NDEBUG -D _LIB -D _CRT_SECURE_NO_WARNINGS -D USE_AVX -D PRODUCTIONX -EHs -EHc -MT -GS- -fp:";
; mark_description "fast -Zc:wchar_t -Zc:forScope -Qrestrict -FAs -Fax64\AVX\ -Fox64\AVX\ -Fdx64\AVX\vc100.pdb -TP -QxAVX";
[/bash]
Above are the top lines of the ASM dump.
The VS 2010 About -> Copy Info exports the following:
Microsoft Visual Studio 2010
Version 10.0.40219.1 SP1Rel
Microsoft .NET Framework
Version 4.0.30319 SP1Rel
Installed Version: Professional
Microsoft Office Developer Tools 01018-169-2660007-70637
Microsoft Office Developer Tools
Microsoft Visual Basic 2010 01018-169-2660007-70637
Microsoft Visual Basic 2010
Microsoft Visual C# 2010 01018-169-2660007-70637
Microsoft Visual C# 2010
Microsoft Visual C++ 2010 01018-169-2660007-70637
Microsoft Visual C++ 2010
Microsoft Visual F# 2010 01018-169-2660007-70637
Microsoft Visual F# 2010
Microsoft Visual Studio 2010 Team Explorer 01018-169-2660007-70637
Microsoft Visual Studio 2010 Team Explorer
Microsoft Visual Web Developer 2010 01018-169-2660007-70637
Microsoft Visual Web Developer 2010
Crystal Reports Templates for Microsoft Visual Studio 2010
Crystal Reports Templates for Microsoft Visual Studio 2010
Hotfix for Microsoft Visual Studio 2010 Professional - ENU (KB2522890) KB2522890
This hotfix is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2522890.
Hotfix for Microsoft Visual Studio 2010 Professional - ENU (KB2529927) KB2529927
This hotfix is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2529927.
Hotfix for Microsoft Visual Studio 2010 Professional - ENU (KB2548139) KB2548139
This hotfix is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2548139.
Hotfix for Microsoft Visual Studio 2010 Professional - ENU (KB2549864) KB2549864
This hotfix is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2549864.
Hotfix for Microsoft Visual Studio 2010 Professional - ENU (KB2565057) KB2565057
This hotfix is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2565057.
Intel C++ Composer XE 2011 Update 6 Package ID: w_ccompxe_2011.6.233
Intel C++ Composer XE 2011 Update 6 Integration for Microsoft Visual Studio* 2010, Version 12.1.1095.2010, Copyright 2002-2011 Intel Corporation
* Other names and brands may be claimed as the property of others
This product includes software developed at The Apache Software Foundation (http://www.apache.org/).
Portions of this software were originally based on the following:
- software copyright (c) 1999, IBM Corporation., http://www.ibm.com.
- software copyright (c) 1999, Sun Microsystems., http://www.sun.com.
- the W3C consortium (http://www.w3c.org) ,
- the SAX project (http://www.saxproject.org)
- voluntary contributions made by Paul Eng on behalf of the Apache Software Foundation that were originally developed at iClick, Inc., software copyright (c) 1999.
This product includes updcrc macro, Satchell Evaluations and Chuck Forsberg. Copyright (C) 1986 Stephen Satchell.
This product includes software developed by the MX4J project (http://mx4j.sourceforge.net).
This product includes ICU 1.8.1 and later.Copyright (c) 1995-2006 International Business Machines Corporation and others.
Portions copyright (c) 1997-2007 Cypress Semiconductor Corporation. All rights reserved.
This product includes XORP. Copyright (c) 2001-2004 International Computer Science Institute
This product includes software from the book "Linux Device Drivers" by Alessandro Rubini and Jonathan Corbet, published by O'Reilly & Associates.
This product includes hashtab.c. Bob Jenkins, 1996.
Microsoft Visual Studio 2010 Professional - ENU Service Pack 1 (KB983509) KB983509
This service pack is for Microsoft Visual Studio 2010 Professional - ENU.
If you later install a more recent service pack, this service pack will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/983509.
Microsoft Visual Studio 2010 SharePoint Developer Tools 10.0.40219
Microsoft Visual Studio 2010 SharePoint Developer Tools
FYI, I just tested with the latest release of Intel XE 2011 (Intel C++ Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.2.278 Build 20111) and I got the exact same ASM. Out of curiosity I tried the /QxCORE-AVX2 option to use the FMA instructions, and there is also a lot of indexed addressing. I suppose it is in fact an optimization, but I'd be very interested to learn how the CPU deals with this.
[bash]
.B15.3::                        ; Preds .B15.1 .B15.3
        vmulps       ymm4, ymm1, YMMWORD PTR [rdx+rax*4]        ;41.75
        vmulps       ymm5, ymm0, YMMWORD PTR [rdx+rax*4]        ;42.75
        vfmadd231ps  ymm4, ymm2, YMMWORD PTR [rdx+rax*4]        ;41.41
        vfmadd231ps  ymm5, ymm3, YMMWORD PTR [rdx+rax*4]        ;42.41
        vaddps       ymm4, ymm4, ymm5                           ;41.27
        vmovups      YMMWORD PTR [rcx+rax*4], ymm4              ;41.21
        add          rax, 8                                     ;38.25
        cmp          rax, r8                                    ;38.19
        jl           .B15.3     ; Prob 82%                      ;38.19
[/bash]
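For comparison, the same mul + fused multiply-add pattern can be spelled out at the source level with the FMA intrinsic. This is a minimal sketch, not what the compiler was given: StrangeFma is a made-up name, size is assumed to be a multiple of 8, and the intrinsic path assumes the compiler defines __AVX__ and __FMA__ when those instruction sets are enabled (as GCC and Clang do with -mavx -mfma). Otherwise a scalar std::fma fallback is used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Hypothetical FMA form: dst = (k1*x + k2*x) + (k3*x + k4*x),
// with each inner sum computed as one multiply plus one fused multiply-add.
// Assumes size is a multiple of 8.
void StrangeFma(float *dst, const float *src, int size)
{
#if defined(__AVX__) && defined(__FMA__)
    const __m256 k1 = _mm256_set1_ps(10.0f), k2 = _mm256_set1_ps(20.0f),
                 k3 = _mm256_set1_ps(30.0f), k4 = _mm256_set1_ps(40.0f);
    for (int i = 0; i < size; i += 8) {
        const __m256 x = _mm256_loadu_ps(src + i);
        // _mm256_fmadd_ps(a, b, c) computes a*b + c with a single rounding
        const __m256 lo = _mm256_fmadd_ps(k1, x, _mm256_mul_ps(k2, x));
        const __m256 hi = _mm256_fmadd_ps(k3, x, _mm256_mul_ps(k4, x));
        _mm256_storeu_ps(dst + i, _mm256_add_ps(lo, hi));
    }
#else
    // Scalar fallback with the same operation structure
    for (int i = 0; i < size; ++i) {
        const float x = src[i];
        dst[i] = std::fma(10.0f, x, 20.0f * x) + std::fma(30.0f, x, 40.0f * x);
    }
#endif
}
```

Because the fused operation rounds once instead of twice, results can differ in the last bit from the non-FMA version for inputs where the intermediate products are not exactly representable.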