<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic But address size change (67H) in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028642#M4211</link>
    <description>&lt;P&gt;But an address-size prefix (67H) on a register-only instruction seems to work: &lt;STRONG&gt;0m3.248s&lt;/STRONG&gt; vs &lt;STRONG&gt;0m4.512s&lt;/STRONG&gt;. Any explanations or caveats?&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;%use&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;smartalign&lt;BR /&gt;
		ALIGNMODE&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;generic,16&lt;BR /&gt;
		BITS&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;64&lt;/P&gt;

	&lt;P&gt;section&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;.text&lt;/P&gt;

	&lt;P&gt;global _nop_test&lt;BR /&gt;
		_nop_test:&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sal&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,31&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RBX,0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;EDX,0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		loop:&lt;BR /&gt;
		&amp;nbsp; &amp;nbsp; db&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;67h&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;movzx&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,BH&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sal&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RBX,8&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;lea&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,[EDX+8*EBP]&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sub&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;jnz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;loop&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;ret&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;loop:&lt;BR /&gt;
		0000000000000020&amp;nbsp;&amp;nbsp; &amp;nbsp;670fb6ef &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;movzbl&amp;nbsp;&amp;nbsp; &amp;nbsp;%bh, %ebp&lt;BR /&gt;
		0000000000000024&amp;nbsp;&amp;nbsp; &amp;nbsp;48c1e308 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;shlq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x8, %rbx&lt;BR /&gt;
		0000000000000028&amp;nbsp;&amp;nbsp; &amp;nbsp;678d2cea &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;leal&amp;nbsp;&amp;nbsp; &amp;nbsp;(%edx,%ebp,8), %ebp&lt;BR /&gt;
		000000000000002c&amp;nbsp;&amp;nbsp; &amp;nbsp;4883e801 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;subq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x1, %rax&lt;BR /&gt;
		0000000000000030&amp;nbsp;&amp;nbsp; &amp;nbsp;75ee &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;jne&amp;nbsp;&amp;nbsp; &amp;nbsp;0x20&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
    <pubDate>Fri, 17 Apr 2015 02:54:00 GMT</pubDate>
    <dc:creator>Chris_S_3</dc:creator>
    <dc:date>2015-04-17T02:54:00Z</dc:date>
    <item>
      <title>µops and nops and LCPs</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028640#M4209</link>
      <description>&lt;P&gt;This question concerns Sandy Bridge, Haswell, and other Intel microarchitectures with a µop cache.&lt;/P&gt;

&lt;P&gt;Since the pre-decode unit fetches 16-byte blocks, NOPs are used for alignment. It is better for basic blocks to start at a 16-byte boundary and for instructions not to straddle 16-byte boundaries. But NOPs consume resources (Optimization Manual 3.5.1.10). For example, NOP (XCHG EAX,EAX) is decoded and saved as a µop in the µop cache. It is then eventually scheduled and retired.&lt;/P&gt;
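
&lt;P&gt;A minimal sketch of the alignment arithmetic behind this, in C. The helper names pad_to_16 and crosses_16 are hypothetical, chosen only for illustration, and the 16-byte block size is the fetch width mentioned above.&lt;/P&gt;

&lt;PRE&gt;
```c
/* Padding bytes needed to move addr up to the next 16-byte boundary
 * (0 when addr is already aligned). Hypothetical helper. */
unsigned pad_to_16(unsigned long addr)
{
    return (unsigned)((16 - addr % 16) % 16);
}

/* Nonzero when an instruction of len bytes (len at least 1) starting at
 * addr spans two 16-byte blocks. Hypothetical helper. */
int crosses_16(unsigned long addr, unsigned len)
{
    return addr / 16 != (addr + len - 1) / 16;
}
```
&lt;/PRE&gt;

&lt;P&gt;For instance, pad_to_16(0x1f) is 1, so a single one-byte NOP reaches 0x20, and a 4-byte instruction placed at 0x1e would straddle the boundary at 0x20.&lt;/P&gt;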

&lt;P&gt;So it would seem that avoiding NOPs would be a worthy goal &lt;EM&gt;for code living in the µop cache&lt;/EM&gt;: less execution port pressure, and so on. Less is less.&lt;/P&gt;

&lt;P&gt;Length-changing prefixes (LCPs) can serve a similar alignment purpose. Indeed, NOPs and LCPs are often combined. However, LCPs (not REX) suffer a penalty (3+ cycles) in the decode units (3.4.2.3). It would seem that, for code living in the µop cache, once that LCP stall penalty has been paid in full, the saving would be one less NOP µop.&lt;/P&gt;

&lt;P&gt;But then 3.4.2.3 also says:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;If the LCP stall happens in a tight loop, it can cause significant performance degradation&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;At this point my mental model is getting a headache. Still, this could be a pre-Sandy Bridge admonition. After all, it assumes that an LCP stall is happening, and this should not be the case for code living in the µop cache.&lt;/P&gt;

&lt;P&gt;Q1: for code living in the µop cache (loops etc.), is it better to avoid an aligning NOP in favor of an LCP on Sandy Bridge and later?&lt;/P&gt;

&lt;P&gt;Q2: what happens with aligning NOPs which come after an unconditional branch? Do they end up in the µop cache consuming resources? Also, I see a lot of 66666690 alignment code. Does this LCP NOP suffer unnecessary LCP stalls?&lt;/P&gt;

&lt;P&gt;BTW I'm aware of the general recommendations against LCPs but I'm wondering if anything changed with Sandy Bridge.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2015 18:21:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028640#M4209</guid>
      <dc:creator>Chris_S_3</dc:creator>
      <dc:date>2015-04-16T18:21:17Z</dc:date>
    </item>
    <item>
      <title>Well, that didn't work AT ALL</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028641#M4210</link>
      <description>&lt;P&gt;Well now, that didn't work AT ALL like I'd expected.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;extern void&amp;nbsp;&amp;nbsp; &amp;nbsp;nop_test();&lt;/P&gt;

	&lt;P&gt;int&lt;BR /&gt;
		main(int argc, char **argv)&lt;BR /&gt;
		{&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;nop_test();&lt;BR /&gt;
		}&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;global _nop_test&lt;BR /&gt;
		_nop_test:&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sal&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,36&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		loop:&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;db&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;66h&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;movzx&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,BH&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;shr&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RBX,8&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;lea&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,[EDX+8*EBP]&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sub&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;jnz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;loop&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;ret&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;On my Haswell MacBook Pro, this takes 1m6.817s with the 66H LCP and 0m37.114s with a NOP instead.&lt;/P&gt;
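
&lt;P&gt;To compare the two runs per iteration rather than in total, here is a rough C sketch. It assumes the times above are totals dominated by the loop itself and that the loop runs 2^36 times (from sal RAX,36). The helper name ns_per_iteration is hypothetical.&lt;/P&gt;

&lt;PRE&gt;
```c
/* Average time per loop iteration, in nanoseconds.
 * seconds:    total measured run time
 * log2_iters: the loop runs 2^log2_iters times (the shift count in "sal RAX,n")
 * Hypothetical helper, for illustration only. */
double ns_per_iteration(double seconds, unsigned log2_iters)
{
    double iters = 1.0;

    /* Build 2^log2_iters by repeated doubling; exact in a double at these sizes. */
    while (log2_iters != 0) {
        iters *= 2.0;
        log2_iters -= 1;
    }
    return seconds * 1e9 / iters;
}
```
&lt;/PRE&gt;

&lt;P&gt;With the numbers above, ns_per_iteration(66.817, 36) comes to roughly 0.97 ns for the 66H version and ns_per_iteration(37.114, 36) to roughly 0.54 ns for the NOP version. Converting these to cycles would need the actual core clock during the run, which is not stated here.&lt;/P&gt;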

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;0000000000000010&amp;nbsp;&amp;nbsp; &amp;nbsp;660fb6ef &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;movzbw&amp;nbsp;&amp;nbsp; &amp;nbsp;%bh, %bp&lt;BR /&gt;
		0000000000000014&amp;nbsp;&amp;nbsp; &amp;nbsp;48c1eb08 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;shrq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x8, %rbx&lt;BR /&gt;
		0000000000000018&amp;nbsp;&amp;nbsp; &amp;nbsp;678d2cea &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;leal&amp;nbsp;&amp;nbsp; &amp;nbsp;(%edx,%ebp,8), %ebp&lt;BR /&gt;
		000000000000001c&amp;nbsp;&amp;nbsp; &amp;nbsp;4883e801 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;subq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x1, %rax&lt;BR /&gt;
		0000000000000020&amp;nbsp;&amp;nbsp; &amp;nbsp;75ee &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;jne&amp;nbsp;&amp;nbsp; &amp;nbsp;0x10&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;vs&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;0000000000000010&amp;nbsp;&amp;nbsp; &amp;nbsp;0fb6ef &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;movzbl&amp;nbsp;&amp;nbsp; &amp;nbsp;%bh, %ebp&lt;BR /&gt;
		0000000000000013&amp;nbsp;&amp;nbsp; &amp;nbsp;48c1eb08 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;shrq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x8, %rbx&lt;BR /&gt;
		0000000000000017&amp;nbsp;&amp;nbsp; &amp;nbsp;678d2cea &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;leal&amp;nbsp;&amp;nbsp; &amp;nbsp;(%edx,%ebp,8), %ebp&lt;BR /&gt;
		000000000000001b&amp;nbsp;&amp;nbsp; &amp;nbsp;4883e801 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;subq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x1, %rax&lt;BR /&gt;
		000000000000001f&amp;nbsp;&amp;nbsp; &amp;nbsp;90 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;nop&lt;BR /&gt;
		0000000000000020&amp;nbsp;&amp;nbsp; &amp;nbsp;75ee &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;jne&amp;nbsp;&amp;nbsp; &amp;nbsp;0x10&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Fri, 17 Apr 2015 02:01:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028641#M4210</guid>
      <dc:creator>Chris_S_3</dc:creator>
      <dc:date>2015-04-17T02:01:00Z</dc:date>
    </item>
    <item>
      <title>But address size change (67H)</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028642#M4211</link>
      <description>&lt;P&gt;But an address-size prefix (67H) on a register-only instruction seems to work: &lt;STRONG&gt;0m3.248s&lt;/STRONG&gt; vs &lt;STRONG&gt;0m4.512s&lt;/STRONG&gt;. Any explanations or caveats?&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;%use&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;smartalign&lt;BR /&gt;
		ALIGNMODE&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;generic,16&lt;BR /&gt;
		BITS&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;64&lt;/P&gt;

	&lt;P&gt;section&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;.text&lt;/P&gt;

	&lt;P&gt;global _nop_test&lt;BR /&gt;
		_nop_test:&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sal&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,31&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RBX,0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;mov&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;EDX,0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		loop:&lt;BR /&gt;
		&amp;nbsp; &amp;nbsp; db&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;67h&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;movzx&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,BH&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sal&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RBX,8&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;lea&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;EBP,[EDX+8*EBP]&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;sub&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;RAX,1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;align&amp;nbsp;&amp;nbsp; &amp;nbsp;16&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;jnz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;loop&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;ret&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;loop:&lt;BR /&gt;
		0000000000000020&amp;nbsp;&amp;nbsp; &amp;nbsp;670fb6ef &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;movzbl&amp;nbsp;&amp;nbsp; &amp;nbsp;%bh, %ebp&lt;BR /&gt;
		0000000000000024&amp;nbsp;&amp;nbsp; &amp;nbsp;48c1e308 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;shlq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x8, %rbx&lt;BR /&gt;
		0000000000000028&amp;nbsp;&amp;nbsp; &amp;nbsp;678d2cea &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;leal&amp;nbsp;&amp;nbsp; &amp;nbsp;(%edx,%ebp,8), %ebp&lt;BR /&gt;
		000000000000002c&amp;nbsp;&amp;nbsp; &amp;nbsp;4883e801 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;subq&amp;nbsp;&amp;nbsp; &amp;nbsp;$0x1, %rax&lt;BR /&gt;
		0000000000000030&amp;nbsp;&amp;nbsp; &amp;nbsp;75ee &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;jne&amp;nbsp;&amp;nbsp; &amp;nbsp;0x20&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Fri, 17 Apr 2015 02:54:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/%C2%B5ops-and-nops-and-LCPs/m-p/1028642#M4211</guid>
      <dc:creator>Chris_S_3</dc:creator>
      <dc:date>2015-04-17T02:54:00Z</dc:date>
    </item>
  </channel>
</rss>

