<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clarification: Sandy Bridge Load Latency in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Clarification-Sandy-Bridge-Load-Latency/m-p/967914#M2730</link>
    <description>&lt;P&gt;I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;2.2.5.2 L1 DCache - Loads&lt;/P&gt;
&lt;P&gt;The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.&lt;/P&gt;
&lt;P&gt;Table 2.12&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Data Type/Addressing Mode&lt;/TH&gt;&lt;TH&gt;Base + Offset &amp;gt; 2048; Base + Index [+ Offset]&lt;/TH&gt;&lt;TH&gt;Base + Offset &amp;lt; 2048&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Integer&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;MMX, SSE, 128-bit AVX&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;X87&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;256-bit AVX&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I'm not sure how to interpret this. Adding some parentheses for clarity, is the faster case ((Base + Offset) &amp;lt; 2048), a condition that user code is unlikely to achieve, or (Base + (Offset &amp;lt; 2048)), something that can often be accommodated?&lt;/P&gt;</description>
    <pubDate>Mon, 28 Oct 2013 22:29:59 GMT</pubDate>
    <dc:creator>Nathan_K_3</dc:creator>
    <dc:date>2013-10-28T22:29:59Z</dc:date>
    <item>
      <title>Clarification: Sandy Bridge Load Latency</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Clarification-Sandy-Bridge-Load-Latency/m-p/967914#M2730</link>
      <description>&lt;P&gt;I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;2.2.5.2 L1 DCache - Loads&lt;/P&gt;
&lt;P&gt;The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.&lt;/P&gt;
&lt;P&gt;Table 2.12&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Data Type/Addressing Mode&lt;/TH&gt;&lt;TH&gt;Base + Offset &amp;gt; 2048; Base + Index [+ Offset]&lt;/TH&gt;&lt;TH&gt;Base + Offset &amp;lt; 2048&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Integer&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;MMX, SSE, 128-bit AVX&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;X87&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;256-bit AVX&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I'm not sure how to interpret this. Adding some parentheses for clarity, is the faster case ((Base + Offset) &amp;lt; 2048), a condition that user code is unlikely to achieve, or (Base + (Offset &amp;lt; 2048)), something that can often be accommodated?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2013 22:29:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Clarification-Sandy-Bridge-Load-Latency/m-p/967914#M2730</guid>
      <dc:creator>Nathan_K_3</dc:creator>
      <dc:date>2013-10-28T22:29:59Z</dc:date>
    </item>
    <item>
      <title>Hello Nathan,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Clarification-Sandy-Bridge-Load-Latency/m-p/967915#M2731</link>
      <description>&lt;P&gt;Hello Nathan,&lt;/P&gt;

&lt;P&gt;Sorry to take so long to reply... end/start of year deadlines, etc.&lt;/P&gt;

&lt;P&gt;I think this section of the manual is differentiating instructions like 'mov edx,[eax+4]' and 'mov edx,[eax+4096]'. The +4 case (displacement of 4) should load in 4 cycles. The +4096 case (displacement of 4096) should load in 5 cycles. That is, the 2048 limit would apply to the displacement, not to the final address.&lt;/P&gt;
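
&lt;P&gt;As a rough illustration of this reading, here is a small Python sketch that looks up the latency Table 2.12 would predict for a given load. The names expected_load_latency and LOAD_LATENCY and the dictionary keys are made up for this example. It assumes the 2048 limit applies to the displacement only, and it treats a negative displacement or a displacement of exactly 2048 as the slower case, which the quoted table does not actually specify.&lt;/P&gt;

```python
# Latencies in cycles, transcribed from Table 2.12 as quoted in the question.
# Each entry is (slower case, faster case). The key names are hypothetical.
LOAD_LATENCY = {
    "integer": (5, 4),
    "mmx_sse_avx128": (6, 5),
    "x87": (7, 6),
    "avx256": (7, 7),
}


def expected_load_latency(data_type, displacement, has_index=False):
    """Return the L1 hit load latency (cycles) that the table would predict.

    Assumptions, not confirmed by the manual text quoted in this thread:
    the 2048 limit applies to the displacement alone, and only
    displacements from 0 to 2047 take the faster path.
    """
    slow, fast = LOAD_LATENCY[data_type]
    # Faster path: base + displacement only, with a small displacement.
    simple_mode = (not has_index) and displacement in range(2048)
    return fast if simple_mode else slow


print(expected_load_latency("integer", 4))     # mov edx,[eax+4]
print(expected_load_latency("integer", 4096))  # mov edx,[eax+4096]
```

&lt;P&gt;Under those assumptions the two calls print 4 and 5, matching the two instructions above. Passing has_index=True always selects the slower column, since the table lists Base + Index [+ Offset] there.&lt;/P&gt;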

&lt;P&gt;Hope this helps... someone... if it is not too late for you.&lt;/P&gt;

&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2014 16:36:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Clarification-Sandy-Bridge-Load-Latency/m-p/967915#M2731</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2014-01-24T16:36:38Z</dc:date>
    </item>
  </channel>
</rss>

