<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Richard, in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999823#M4824</link>
    <description>&lt;P&gt;Hi Richard,&lt;/P&gt;

&lt;P&gt;I have code that mixes floating-point and fixed-point arithmetic, and there mixing AVX and SSE gives a very good performance improvement that I cannot afford to lose by going back to SSE4.1 only.&lt;BR /&gt;
	&lt;BR /&gt;
	The problem is that I am trying to ship a single binary containing both an SSE4.1-optimized code path and a hybrid SSE4.1/AVX-optimized code path (a customer requirement), which is tedious (this is not my first problem with the compiler). To give you an idea, the hybrid code is 20% faster (once I have fixed all the penalties, of course).&lt;/P&gt;

&lt;P&gt;This is how I noticed that the /QxSSE4.1 /QaxAVX combination is badly flawed (my guess is that IPO wrongly mixes parts of the code that should stay isolated).&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
    <pubDate>Wed, 30 Apr 2014 14:14:00 GMT</pubDate>
    <dc:creator>emmanuel_attia</dc:creator>
    <dc:date>2014-04-30T14:14:00Z</dc:date>
    <item>
      <title>BUG: Poor hybrid SSE/AVX code generated</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999818#M4819</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a piece of code that I cannot disclose right now (I will try to reproduce the problem in a shorter example). When I compile it with /QxAVX, the compiler generates this code:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;Address	Source Line	Assembly	Clockticks: Total	Clockticks: Self	Instructions Retired: Total	Instructions Retired: Self	CPI Rate: Total	CPI Rate: Self	General Retirement	Microcode Sequencer	Bad Speculation	Back-end Bound	Front-end Bound	DTLB Overhead	Loads Blocked by Store Forwarding	Split Loads	4K Aliasing	L2 Bound	L3 Bound	DRAM Bound	Store Bound	Core Bound	ICache Misses	ITLB Overhead	Branch Resteers	DSB Switches	Length Changing Prefixes	Front-End Bandwidth DSB	Front-End Bandwidth MITE
0x100b931e	962	vmovdqu xmm5, xmmword ptr [ebx]	0.0%	4,983,341	0.0%	6,386,398	0.780	0.780	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9322	962	vmovdqu xmmword ptr [esp+0xa0], xmm5	1.8%	554,297,692	1.4%	698,936,006	0.793	0.793	0.130	0.000	0.000	0.846	0.024	0.031	0.000	0.000	0.020	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.032	0.000
0x100b932b	962	vmovdqu xmm0, xmmword ptr [ebx+0x10]	0.1%	22,394,000	0.1%	28,464,010	0.787	0.787	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.322	0.000	0.000	0.000	0.000	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9330	962	vmovdqu xmmword ptr [esp+0xb0], xmm0	0.0%	6,031,618	0.0%	8,129,718	0.742	0.742	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9339	962	test esi, esi	0.0%	14,815,452	0.0%	16,931,739	0.875	0.875	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b933b	962	jz 0x100b941e &amp;lt;Block 50&amp;gt;	0.0%	1,904,153	0.0%	2,000,911	0.952	0.952	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9341		Block 49:	0.0%		0.0%																								
0x100b9341	962	mov edi, dword ptr [esp+0x54]	0.0%		0.0%																								
0x100b9345	962	vpaddw xmm3, xmm5, xmm4	0.0%	3,352,093	0.0%	4,271,375	0.785	0.785	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9349	962	vpaddw xmm2, xmm0, xmm4	0.0%	2,928,452	0.0%	3,751,111	0.781	0.781	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b934d	962	mov eax, dword ptr [edi+0x4]	0.0%	1,376,764	0.0%	1,987,314	0.693	0.693	0.000	0.000	0.000	1.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b9350	962	vpextrw edi, xmm3, 0x0	0.0%		0.0%																								
0x100b9355	962	vmovdqu xmm7, xmmword ptr [esp+0x80]	0.0%	4,592,970	0.0%	5,534,012	0.830	0.830	0.983	0.000	0.983	0.017	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
0x100b935e	962	vmovdqu xmm1, xmmword ptr [esp+0x90]	0.0%		0.0%																								
&lt;/PRE&gt;

&lt;P&gt;When I compile it with /QxSSE4.1 /QaxAVX, the AVX code path looks like this:&lt;/P&gt;

&lt;P&gt;Address&amp;nbsp;&amp;nbsp; &amp;nbsp;Source Line&amp;nbsp;&amp;nbsp; &amp;nbsp;Assembly&amp;nbsp;&amp;nbsp; &amp;nbsp;Clockticks: Total&amp;nbsp;&amp;nbsp; &amp;nbsp;Clockticks: Self&amp;nbsp;&amp;nbsp; &amp;nbsp;Instructions Retired: Total&amp;nbsp;&amp;nbsp; &amp;nbsp;Instructions Retired: Self&amp;nbsp;&amp;nbsp; &amp;nbsp;CPI Rate: Total&amp;nbsp;&amp;nbsp; &amp;nbsp;CPI Rate: Self&amp;nbsp;&amp;nbsp; &amp;nbsp;General Retirement&amp;nbsp;&amp;nbsp; &amp;nbsp;Microcode Sequencer&amp;nbsp;&amp;nbsp; &amp;nbsp;Bad Speculation&amp;nbsp;&amp;nbsp; &amp;nbsp;Back-end Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;Front-end Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;DTLB Overhead&amp;nbsp;&amp;nbsp; &amp;nbsp;Loads Blocked by Store Forwarding&amp;nbsp;&amp;nbsp; &amp;nbsp;Split Loads&amp;nbsp;&amp;nbsp; &amp;nbsp;4K Aliasing&amp;nbsp;&amp;nbsp; &amp;nbsp;L2 Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;L3 Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;DRAM Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;Store Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;Core Bound&amp;nbsp;&amp;nbsp; &amp;nbsp;ICache Misses&amp;nbsp;&amp;nbsp; &amp;nbsp;ITLB Overhead&amp;nbsp;&amp;nbsp; &amp;nbsp;Branch Resteers&amp;nbsp;&amp;nbsp; &amp;nbsp;DSB Switches&amp;nbsp;&amp;nbsp; &amp;nbsp;Length Changing Prefixes&amp;nbsp;&amp;nbsp; &amp;nbsp;Front-End Bandwidth DSB&amp;nbsp;&amp;nbsp; &amp;nbsp;Front-End Bandwidth MITE&lt;BR /&gt;
	0x100b9594&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;Block 47:&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b9594&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;mov dword ptr [esp+0x8], eax&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;2,106,301&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;2,485,774&amp;nbsp;&amp;nbsp; &amp;nbsp;0.847&amp;nbsp;&amp;nbsp; &amp;nbsp;0.847&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b9598&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;xor edx, edx&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b959a&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;mov esi, dword ptr [esp+0x28]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b959e&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;Block 48:&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b959e&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;movdqu xmm5, xmmword ptr [ebx]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;6,178,264&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;7,860,224&amp;nbsp;&amp;nbsp; &amp;nbsp;0.786&amp;nbsp;&amp;nbsp; &amp;nbsp;0.786&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95a2&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;movdqa xmmword ptr [esp+0xa0], xmm5&amp;nbsp;&amp;nbsp; &amp;nbsp;1.7%&amp;nbsp;&amp;nbsp; &amp;nbsp;518,966,716&amp;nbsp;&amp;nbsp; &amp;nbsp;1.3%&amp;nbsp;&amp;nbsp; &amp;nbsp;644,970,879&amp;nbsp;&amp;nbsp; &amp;nbsp;0.805&amp;nbsp;&amp;nbsp; &amp;nbsp;0.805&amp;nbsp;&amp;nbsp; &amp;nbsp;0.176&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.019&amp;nbsp;&amp;nbsp; &amp;nbsp;0.805&amp;nbsp;&amp;nbsp; &amp;nbsp;0.019&amp;nbsp;&amp;nbsp; &amp;nbsp;0.012&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.035&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.125&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95ab&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;movdqu xmm0, xmmword ptr [ebx+0x10]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.1%&amp;nbsp;&amp;nbsp; &amp;nbsp;18,849,972&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;22,961,024&amp;nbsp;&amp;nbsp; &amp;nbsp;0.821&amp;nbsp;&amp;nbsp; &amp;nbsp;0.821&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.239&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95b0&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;movdqa xmmword ptr [esp+0xb0], xmm0&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;11,067,064&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;14,796,040&amp;nbsp;&amp;nbsp; &amp;nbsp;0.748&amp;nbsp;&amp;nbsp; &amp;nbsp;0.748&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.569&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95b9&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;test esi, esi&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;13,519,245&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;16,668,546&amp;nbsp;&amp;nbsp; &amp;nbsp;0.811&amp;nbsp;&amp;nbsp; &amp;nbsp;0.811&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.333&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.266&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95bb&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;jz 0x100b96a1 &amp;lt;Block 50&amp;gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;1,146,115&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;1,366,417&amp;nbsp;&amp;nbsp; &amp;nbsp;0.839&amp;nbsp;&amp;nbsp; &amp;nbsp;0.839&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95c1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;Block 49:&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b95c1&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;mov edi, dword ptr [esp+0x54]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;7,793,002&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;9,372,588&amp;nbsp;&amp;nbsp; &amp;nbsp;0.831&amp;nbsp;&amp;nbsp; &amp;nbsp;0.831&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95c5&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vzeroupper &amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	0x100b95c8&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vpaddw xmm3, xmm5, xmm4&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;11,755,160&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;15,379,015&amp;nbsp;&amp;nbsp; &amp;nbsp;0.764&amp;nbsp;&amp;nbsp; &amp;nbsp;0.764&amp;nbsp;&amp;nbsp; &amp;nbsp;0.768&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.768&amp;nbsp;&amp;nbsp; &amp;nbsp;0.232&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95cc&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vpaddw xmm2, xmm0, xmm4&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;2,202,864&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;2,551,664&amp;nbsp;&amp;nbsp; &amp;nbsp;0.863&amp;nbsp;&amp;nbsp; &amp;nbsp;0.863&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95d0&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;mov eax, dword ptr [edi+0x4]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;1,229,030&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;1,534,467&amp;nbsp;&amp;nbsp; &amp;nbsp;0.801&amp;nbsp;&amp;nbsp; &amp;nbsp;0.801&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95d3&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vpextrw edi, xmm3, 0x0&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;4,683,935&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;6,584,619&amp;nbsp;&amp;nbsp; &amp;nbsp;0.711&amp;nbsp;&amp;nbsp; &amp;nbsp;0.711&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95d8&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vmovdqa xmm7, xmmword ptr [esp+0x80]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;5,390,732&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;7,425,202&amp;nbsp;&amp;nbsp; &amp;nbsp;0.726&amp;nbsp;&amp;nbsp; &amp;nbsp;0.726&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;BR /&gt;
	0x100b95e1&amp;nbsp;&amp;nbsp; &amp;nbsp;962&amp;nbsp;&amp;nbsp; &amp;nbsp;vmovdqa xmm1, xmmword ptr [esp+0x90]&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;4,387,615&amp;nbsp;&amp;nbsp; &amp;nbsp;0.0%&amp;nbsp;&amp;nbsp; &amp;nbsp;5,483,645&amp;nbsp;&amp;nbsp; &amp;nbsp;0.800&amp;nbsp;&amp;nbsp; &amp;nbsp;0.800&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;1.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&amp;nbsp;&amp;nbsp; &amp;nbsp;0.000&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	Note that I had to insert the _mm256_zeroupper() call manually to work around the huge penalty caused by the mix of legacy SSE (movdqu/movdqa) and VEX-encoded instructions in this generated code.&lt;BR /&gt;
	I think that in this case the compiler should emit only VEX-encoded instructions...&lt;/P&gt;

&lt;P&gt;I'll try to reproduce it in a standalone example, but you really should look into this.&lt;/P&gt;

&lt;P&gt;Best regards&lt;/P&gt;</description>
      <pubDate>Wed, 23 Apr 2014 12:37:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999818#M4819</guid>
      <dc:creator>emmanuel_attia</dc:creator>
      <dc:date>2014-04-23T12:37:51Z</dc:date>
    </item>
    <item>
      <title>Here is a code portion to</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999819#M4820</link>
      <description>&lt;P&gt;Here is a code portion to reproduce the supposed bug.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include &amp;lt;immintrin.h&amp;gt;

typedef unsigned short int16_t;

__declspec(noinline)
static bool SomeFunction(int16_t * outputPtr, int16_t const * inputPtr)
{
    for (int x = 0; x &amp;lt; 16; x++)
    {
        __m256 input    = _mm256_loadu_ps((float *)(inputPtr + 16 * x));
        __m128i in_low  = _mm_castps_si128(_mm256_extractf128_ps(input, 0));
        __m128i in_high = _mm_castps_si128(_mm256_extractf128_ps(input, 1));

        __m128i out_low  = _mm_add_epi16(in_low, _mm_set1_epi16(1));
        __m128i out_high = _mm_add_epi16(in_low, _mm_set1_epi16(1));

        __m256 output = _mm256_insertf128_ps(_mm256_insertf128_ps(_mm256_undefined_ps(), _mm_castsi128_ps(out_low), 0), _mm_castsi128_ps(out_high), 1);
        _mm256_store_ps((float *)(outputPtr + 16 * x), output);
    }

    return true;
}

#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

int main(int argc, char ** argv)
{
    int16_t * dest = (int16_t *)_aligned_malloc(65536*sizeof(int16_t), 32);
    int16_t * src = (int16_t *)_aligned_malloc(65536*sizeof(int16_t), 32);

    // Prevent input optimisations
    for (int i = 0; i &amp;lt; 32768; i++)
        src[i] = rand();

    SomeFunction(dest, src);

    // Prevent output optimisations
    for (int i = 0; i &lt; 32768; i++)
        printf("%d\n", dest[i]);
}&lt;/PRE&gt;

&lt;P&gt;Here is the assembler code generated with /QxAVX for "SomeFunction":&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;01271090  push        esi  
01271091  sub         esp,18h  
01271094  xor         ecx,ecx  
01271096  vmovdqu     xmm0,xmmword ptr [___xi_z+34h (1273120h)]  
0127109E  mov         esi,eax  
012710A0  xor         eax,eax  
012710A2  vmovups     xmm1,xmmword ptr [eax+edx]  
012710A7  inc         ecx  
012710A8  vinsertf128 ymm2,ymm1,xmmword ptr [eax+edx+10h],1  
012710B0  vpaddw      xmm4,xmm2,xmm0  
012710B4  vinsertf128 ymm5,ymm4,xmm4,1  
012710BA  vmovups     ymmword ptr [eax+esi],ymm5  
012710BF  add         eax,20h  
012710C2  cmp         ecx,10h  
012710C5  jl          SomeFunction+12h (12710A2h)  
012710C7  vzeroupper  
012710CA  add         esp,18h  
012710CD  pop         esi  
012710CE  ret  &lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Now with /QxSSE4.1 /QaxAVX (in the AVX code path):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;SomeFunction:
008D1090  push        esi  
008D1091  sub         esp,18h  
008D1094  xor         ecx,ecx  
008D1096  movdqa      xmm0,xmmword ptr [___xi_z+34h (8D3120h)]  
008D109E  mov         esi,eax  
008D10A0  xor         eax,eax  
008D10A2  vmovups     ymm1,ymmword ptr [eax+edx]  
008D10A7  inc         ecx  
008D10A8  vpaddw      xmm3,xmm1,xmm0  
008D10AC  vinsertf128 ymm4,ymm3,xmm3,1  
008D10B2  vmovaps     ymmword ptr [eax+esi],ymm4  
008D10B7  add         eax,20h  
008D10BA  cmp         ecx,10h  
008D10BD  jl          SomeFunction+12h (8D10A2h)  
008D10BF  add         esp,18h  
008D10C2  pop         esi  
008D10C3  ret &lt;/PRE&gt;

&lt;P&gt;The constant "_mm_set1_epi16(1)" should NOT be loaded into a register with a legacy MOVDQA here, but with a VEX-encoded VMOVUPS (or VMOVDQA).&lt;/P&gt;

&lt;P&gt;If I remove the for loop from SomeFunction (so the _mm_set1_epi16(1) constant no longer needs to be hoisted out of a loop), the generated code is correct again:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;SomeFunction:
01101090  sub         esp,1Ch  
01101093  vmovups     ymm0,ymmword ptr [edx]  
01101097  vpaddw      xmm2,xmm0,xmmword ptr [___xi_z+34h (1103120h)]  
0110109F  vinsertf128 ymm3,ymm2,xmm2,1  
011010A5  vmovaps     ymmword ptr [eax],ymm3  
011010A9  add         esp,1Ch  
011010AC  ret  &lt;/PRE&gt;

&lt;P&gt;This bug is really annoying, especially since /QaxAVX is mandatory when we want to mix SSE4 and AVX code in the same binary, since it's not possible with current versions of the Intel Compiler to compile just one cpp file with /QxAVX without "side effects" (it pollutes every STL class, for instance, or any class shared between the two cpp files, with AVX code).&lt;/P&gt;

&lt;P&gt;It took some time to reproduce it, so I hope it will be taken into consideration.&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
      <pubDate>Wed, 23 Apr 2014 17:02:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999819#M4820</guid>
      <dc:creator>emmanuel_attia</dc:creator>
      <dc:date>2014-04-23T17:02:00Z</dc:date>
    </item>
    <item>
      <title>And I almost forgot:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999820#M4821</link>
      <description>&lt;P&gt;And I almost forgot:&lt;/P&gt;

&lt;P&gt;The workaround is to insert _mm256_zeroupper() calls, placed according to where the AVX and SSE instructions end up being emitted (so it is somewhat empirical).&lt;/P&gt;

&lt;P&gt;But really, when using intrinsics with the Intel Compiler, we shouldn't have to add these (plus it may throw away useful data in the upper halves of the registers, which the processor would then have to reload).&lt;/P&gt;</description>
      <pubDate>Wed, 23 Apr 2014 17:05:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999820#M4821</guid>
      <dc:creator>emmanuel_attia</dc:creator>
      <dc:date>2014-04-23T17:05:00Z</dc:date>
    </item>
    <item>
      <title>Hello Emma,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999821#M4822</link>
      <description>&lt;P&gt;Hello Emma,&lt;/P&gt;

&lt;P&gt;You are right.&lt;/P&gt;

&lt;P&gt;As to 'since it's not possible with current versions of the Intel Compiler to compile just one cpp file with /QxAVX without "side effects" (it pollutes every STL class, for instance, or any class shared between the two cpp files, with AVX code).'&lt;/P&gt;

&lt;P&gt;When using -xavx or /arch:AVX, it is known that "A disadvantage of this method is that it requires access to the relevant source files, so it cannot avoid AVX-SSE transitions resulting from calls to functions that are not compiled with the –xavx or –mavx flag. Another possible disadvantage is that all Intel® SSE code within a file compiled with the –xavx or –mavx flag will be converted to VEX format and will only run on Intel® AVX supported processors." (&lt;A href="https://community.intel.com/legacyfs/online/drupal_files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf"&gt;https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf&lt;/A&gt;)&lt;/P&gt;

&lt;P&gt;The Intel compiler, when /arch:AVX is set so as to support AVX intrinsics, generates equivalent AVX-128 code from SSE intrinsics, so there should be no transition penalty. So 'There SHOULD not be a MOVDQA but a VMOVUPS' looks like a compiler bug. I will investigate it and file an internal bug report if it is confirmed.&lt;/P&gt;

&lt;P&gt;Thank you.&lt;BR /&gt;
	--&lt;BR /&gt;
	QIAOMIN.Q&lt;BR /&gt;
	Intel Developer Support&lt;BR /&gt;
	Please participate in our redesigned community support web site:&lt;/P&gt;

&lt;P&gt;User forums: &lt;A href="http://software.intel.com/en-us/forums/"&gt;http://software.intel.com/en-us/forums/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Apr 2014 03:11:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999821#M4822</guid>
      <dc:creator>QIAOMIN_Q_</dc:creator>
      <dc:date>2014-04-25T03:11:27Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999822#M4823</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I find your example code interesting, as I've kind of assumed, possibly wrongly, that it's bad to mix SSE with AVX code.&lt;/P&gt;

&lt;P&gt;If I don't have all the functionality I want in AVX, then I use only SSE for the whole piece of code. I never mix, precisely because of issues like the SSE performance penalty if you don't clear the AVX upper registers.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Apr 2014 12:08:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999822#M4823</guid>
      <dc:creator>Richard_Nutman</dc:creator>
      <dc:date>2014-04-30T12:08:10Z</dc:date>
    </item>
    <item>
      <title>Hi Richard,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999823#M4824</link>
      <description>&lt;P&gt;Hi Richard,&lt;/P&gt;

&lt;P&gt;I have code mixing floating-point and fixed-point arithmetic, and for that, mixing AVX and SSE offers a very good performance improvement that I cannot afford to lose by moving back to SSE4.1 only.&lt;BR /&gt;
	&lt;BR /&gt;
	The problem is that I am trying to ship a single binary with both an SSE4.1-optimized code path and a hybrid SSE4.1/AVX-optimized code path (a customer requirement), which is tedious (it's not my first problem with the compiler). FYI, the hybrid code is 20% faster, to give you an idea (once I fix all the penalties, of course).&lt;/P&gt;

&lt;P&gt;And this is how I noticed that the /QxSSE4.1 /QaxAVX combination is badly flawed (I guess IPO wrongly mixes parts of the code that should be kept isolated).&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
      <pubDate>Wed, 30 Apr 2014 14:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999823#M4824</guid>
      <dc:creator>emmanuel_attia</dc:creator>
      <dc:date>2014-04-30T14:14:00Z</dc:date>
    </item>
    <item>
      <title>I haven't worked with the</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999824#M4825</link>
      <description>I haven't worked with the multiple-path build. In a single-path build (e.g. -QxCORE-AVX2), ICL automatically translates SSE4 intrinsics to AVX-128 and skips _mm256_zeroupper() when it is unnecessary.

gcc doesn't translate to AVX and requires a block of SSE code to be followed by
#ifdef __AVX__
_mm256_zeroupper();
#endif
to maintain performance.
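
A rough sketch of that guard, assuming an x86 target with SSE2 and a hypothetical helper add_one_epi16 (not code from this thread), might look like this; the zeroupper call is compiled only when AVX code generation is enabled:

```c
/* Sketch only: add_one_epi16 is a hypothetical helper, not code from the thread. */
#include "emmintrin.h"      /* SSE2 intrinsics */
#ifdef __AVX__
#include "immintrin.h"      /* _mm256_zeroupper */
#endif

/* Adds 1 to each of n 16-bit values, eight lanes at a time. */
void add_one_epi16(short *dst, const short *src, int n)
{
    const __m128i one = _mm_set1_epi16(1);
    int i;

    /* Process full groups of eight 16-bit lanes with unaligned loads/stores. */
    for (i = 0; n - i >= 8; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(v, one));
    }

    /* Scalar tail for any leftover elements. */
    for (; i != n; ++i)
        dst[i] = (short)(src[i] + 1);

#ifdef __AVX__
    _mm256_zeroupper();     /* present only when AVX code generation is enabled */
#endif
}
```

Without an AVX flag the guard compiles away and the function is plain SSE2.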

I avoid IPO unless I have a case which benefits from it.

The Intel 15.0 beta compiler and gcc 4.9 make more effective use of shuffles when translating C and Fortran, so there is less need for intrinsics.</description>
      <pubDate>Wed, 30 Apr 2014 15:03:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999824#M4825</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-04-30T15:03:46Z</dc:date>
    </item>
    <item>
      <title>Hi Tim,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999825#M4826</link>
      <description>&lt;P&gt;Hi Tim,&lt;/P&gt;

&lt;P&gt;This issue is specific to the fact that AVX is an alternate code path.&lt;/P&gt;

&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2014 12:51:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/BUG-Poor-hybrid-SSE-AVX-code-generated/m-p/999825#M4826</guid>
      <dc:creator>emmanuel_attia</dc:creator>
      <dc:date>2014-05-06T12:51:50Z</dc:date>
    </item>
  </channel>
</rss>

