<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic L1D Latency Breakdown in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-Latency-Breakdown/m-p/936437#M1754</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;It is usually stated that the L1D latency is around 4 cycles. How are those 4 cycles utilized?&lt;/P&gt;
&lt;P&gt;Is it:&lt;/P&gt;
&lt;P&gt;1c: Calculate effective address&lt;BR /&gt;1c: Send request from core to cache&lt;BR /&gt;1c: Do cache access&lt;BR /&gt;1c: Send data back to core pipeline&lt;/P&gt;
&lt;P&gt;Is there any available information on that?&lt;/P&gt;</description>
    <pubDate>Fri, 19 Apr 2013 12:45:55 GMT</pubDate>
    <dc:creator>Andreas_S_2</dc:creator>
    <dc:date>2013-04-19T12:45:55Z</dc:date>
    <item>
      <title>L1D Latency Breakdown</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-Latency-Breakdown/m-p/936437#M1754</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;It is usually stated that the L1D latency is around 4 cycles. How are those 4 cycles utilized?&lt;/P&gt;
&lt;P&gt;Is it:&lt;/P&gt;
&lt;P&gt;1c: Calculate effective address&lt;BR /&gt;1c: Send request from core to cache&lt;BR /&gt;1c: Do cache access&lt;BR /&gt;1c: Send data back to core pipeline&lt;/P&gt;
&lt;P&gt;Is there any available information on that?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Apr 2013 12:45:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/L1D-Latency-Breakdown/m-p/936437#M1754</guid>
      <dc:creator>Andreas_S_2</dc:creator>
      <dc:date>2013-04-19T12:45:55Z</dc:date>
    </item>
    <item>
      <title>Hello Andreas,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-Latency-Breakdown/m-p/936438#M1755</link>
      <description>&lt;P&gt;Hello Andreas,&lt;/P&gt;
&lt;P&gt;The 4 cycle count is based on standard latency measurements. By 'standard' I mean load-to-use, dependent-chain, and the array fits in L1D. The prefetchers need to be disabled in the BIOS (if you use a stride like 64 bytes, which the prefetchers can latch on to). Turbo mode also needs to be disabled (again, you can do this from the BIOS) to get the 4 cycle count.&lt;/P&gt;
&lt;P&gt;So the 4 cycle count is 'load to use': from the time the load is issued until the data is received. The 'dependent load' means that the address of the next line to be read is contained in the current data being fetched.&lt;/P&gt;
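&lt;P&gt;A minimal sketch of what a dependent chain looks like, for illustration only. This is not the Calibrator kernel, and the names build_chain, walk and ns_to_cycles, as well as the 512-slot size, are hypothetical. It is written in Python only to show the structure: interpreter overhead dominates, so timing it will not show the 4 cycle figure. A real kernel would be written in C or assembly over an array that fits in L1D.&lt;/P&gt;

```python
import random


def build_chain(num_slots, seed=0):
    """Return a list where chain[i] holds the index of the next slot to load.

    The slots form one cycle that visits every index exactly once, so each
    load depends on the value returned by the previous one.
    """
    order = list(range(num_slots))
    random.Random(seed).shuffle(order)
    chain = [0] * num_slots
    for pos, slot in enumerate(order):
        chain[slot] = order[(pos + 1) % num_slots]
    return chain


def walk(chain, start, steps):
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]  # the next address comes from the data just loaded
    return idx


def ns_to_cycles(ns_per_load, core_ghz):
    """Convert a measured time per load into core cycles at a fixed frequency."""
    return ns_per_load * core_ghz


chain = build_chain(512)  # hypothetical size, chosen only for illustration
# Walking num_slots steps around the single cycle returns to the start.
assert walk(chain, 0, 512) == 0
```

&lt;P&gt;The conversion helper is one reason Turbo mode matters: a time per load only maps cleanly onto a cycle count when the core frequency is fixed and known.&lt;/P&gt;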
&lt;P&gt;In my tools, I use latency kernels based on Calibrator from Stefan Manegold &lt;A href="http://www.cwi.nl/~manegold/"&gt;http://www.cwi.nl/~manegold/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Your 4 steps&amp;nbsp;correspond to the load-to-use scenario.&lt;/P&gt;
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Fri, 19 Apr 2013 14:03:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/L1D-Latency-Breakdown/m-p/936438#M1755</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-04-19T14:03:40Z</dc:date>
    </item>
  </channel>
</rss>

