<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic L1D flush overhead in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-flush-overhead/m-p/1422386#M8118</link>
    <description>&lt;P&gt;Hello!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a rather niche question, I hope someone here can help me out!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have been looking for resources on the overhead of flushing the L1D cache using the IA32_FLUSH_CMD MSR. I have found a couple of benchmarks that measure the performance difference with L1D flushing enabled, but what I'm interested in is the actual execution time, or the number of cycles, it takes to perform the flush. I haven't been able to find any information on that online so far. Would really appreciate the help!&lt;/P&gt;</description>
    <pubDate>Sun, 16 Oct 2022 17:32:42 GMT</pubDate>
    <dc:creator>arngnr</dc:creator>
    <dc:date>2022-10-16T17:32:42Z</dc:date>
    <item>
      <title>L1D flush overhead</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-flush-overhead/m-p/1422386#M8118</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a rather niche question, I hope someone here can help me out!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have been looking for resources on the overhead of flushing the L1D cache using the IA32_FLUSH_CMD MSR. I have found a couple of benchmarks that measure the performance difference with L1D flushing enabled, but what I'm interested in is the actual execution time, or the number of cycles, it takes to perform the flush. I haven't been able to find any information on that online so far. Would really appreciate the help!&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2022 17:32:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/L1D-flush-overhead/m-p/1422386#M8118</guid>
      <dc:creator>arngnr</dc:creator>
      <dc:date>2022-10-16T17:32:42Z</dc:date>
    </item>
    <item>
      <title>Re: L1D flush overhead</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/L1D-flush-overhead/m-p/1423074#M8121</link>
      <description>&lt;P&gt;If the implementation of the state machine for invalidating the lines is good, then performance should depend primarily on the amount of dirty data in the L1D cache. &amp;nbsp;The L1D_FLUSH operation is defined to be limited to the L1D cache, so dirty data has to be written back to the L2. &amp;nbsp;The bandwidth between the L1D and L2 caches depends on the processor generation, but is 64 Bytes per cycle in Skylake Xeon and later cores. &amp;nbsp;For Skylake processors, the L1D is 32KiB, or 512 cache lines, so the minimum time for writebacks is 512 core cycles if all data in the L1D is dirty. &amp;nbsp;This increases to 768 cycles for Ice Lake and Golden Cove cores (48KiB L1D). &amp;nbsp;&lt;/P&gt;
&lt;P&gt;Of course the writebacks could be slower than 1/cycle, but they should not be slower than 2 cycles each.&lt;/P&gt;
&lt;P&gt;Processing time for clean lines (invalidate only) is probably limited to 2 lines per cycle by the L1D tag access. &amp;nbsp;(This could be accelerated with magic hardware, but I would be surprised if it was worth it.)&lt;/P&gt;
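&lt;P&gt;The arithmetic behind these estimates can be sketched in a few lines of Python. &amp;nbsp;This is only a back-of-the-envelope model, not a measurement: the function name and the dirty_fraction parameter are made up for illustration, and it assumes that writebacks and invalidations do not overlap.&lt;/P&gt;

```python
# Back-of-the-envelope cycle bounds for an L1D flush, using the estimates above.
LINE_BYTES = 64  # cache line size in bytes


def flush_cycle_bounds(l1d_kib, dirty_fraction=1.0):
    """Return (best_case, worst_case) core cycles to flush the L1D.

    Assumptions (estimates, not measurements):
      - dirty lines write back at 1 line/cycle at best, 1 line per 2 cycles at worst
      - clean lines are invalidated at 2 lines per cycle
      - writebacks and invalidations are not overlapped
    """
    lines = l1d_kib * 1024 // LINE_BYTES
    dirty = round(lines * dirty_fraction)
    clean = lines - dirty
    invalidate = (clean + 1) // 2  # ceil(clean / 2)
    return dirty + invalidate, 2 * dirty + invalidate


for name, kib in [("Skylake, 32 KiB", 32), ("Ice Lake / Golden Cove, 48 KiB", 48)]:
    all_dirty = flush_cycle_bounds(kib, 1.0)
    all_clean = flush_cycle_bounds(kib, 0.0)
    print(name, "all dirty:", all_dirty, "all clean:", all_clean)
```

&lt;P&gt;Under these assumptions a fully dirty 32KiB L1D takes between 512 and 1024 cycles, a fully dirty 48KiB L1D between 768 and 1536 cycles, and a fully clean L1D 256 or 384 cycles respectively.&lt;/P&gt;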
&lt;P&gt;From my user-land perspective, the time will be dominated by crossing into the kernel to write the MSR.&lt;/P&gt;
&lt;P&gt;Of course I have not measured any of this, so I could be completely off base.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2022 19:50:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/L1D-flush-overhead/m-p/1423074#M8121</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2022-10-18T19:50:01Z</dc:date>
    </item>
  </channel>
</rss>

