<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sizing a threadpool for CPU-bound work in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Sizing-a-threadpool-for-CPU-bound-work/m-p/1402076#M8088</link>
    <description>&lt;P&gt;I'm looking for some insight into how to size a threadpool for CPU-bound work. &amp;nbsp;I'm profiling with a toy workload that runs a tight xor loop and writes the final 64-bit value to memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would have expected (and it's true on other hardware) that the best approach would be 1 thread per physical core. &amp;nbsp;On my 12700K that would be 12 threads. &amp;nbsp;But Task Manager reports, and my benchmarking confirms, that this does not fully utilize my CPU. &amp;nbsp;Empirically, the best size for an xor loop on my machine is 16-18 threads.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This puzzles me. &amp;nbsp;I don't understand why my ALU workload would benefit from hyperthreading, since we don't really expect the thread to ever yield. &amp;nbsp;And since the Goldilocks zone for my benchmark is neither the physical nor the logical core count of the machine, I'm not certain how to size the pool for CPUs I don't personally have. &amp;nbsp;For example, I could suppose that my 16 threads is the average of the physical and logical core counts on the chip (12 and 20) and size the pool that way, but I don't know if that's a plausible general rule across the whole range of 12th gen or earlier CPUs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jul 2022 02:01:55 GMT</pubDate>
    <dc:creator>dcadev</dc:creator>
    <dc:date>2022-07-21T02:01:55Z</dc:date>
    <item>
      <title>Sizing a threadpool for CPU-bound work</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Sizing-a-threadpool-for-CPU-bound-work/m-p/1402076#M8088</link>
      <description>&lt;P&gt;I'm looking for some insight into how to size a threadpool for CPU-bound work. &amp;nbsp;I'm profiling with a toy workload that runs a tight xor loop and writes the final 64-bit value to memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would have expected (and it's true on other hardware) that the best approach would be 1 thread per physical core. &amp;nbsp;On my 12700K that would be 12 threads. &amp;nbsp;But Task Manager reports, and my benchmarking confirms, that this does not fully utilize my CPU. &amp;nbsp;Empirically, the best size for an xor loop on my machine is 16-18 threads.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This puzzles me. &amp;nbsp;I don't understand why my ALU workload would benefit from hyperthreading, since we don't really expect the thread to ever yield. &amp;nbsp;And since the Goldilocks zone for my benchmark is neither the physical nor the logical core count of the machine, I'm not certain how to size the pool for CPUs I don't personally have. &amp;nbsp;For example, I could suppose that my 16 threads is the average of the physical and logical core counts on the chip (12 and 20) and size the pool that way, but I don't know if that's a plausible general rule across the whole range of 12th gen or earlier CPUs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 02:01:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Sizing-a-threadpool-for-CPU-bound-work/m-p/1402076#M8088</guid>
      <dc:creator>dcadev</dc:creator>
      <dc:date>2022-07-21T02:01:55Z</dc:date>
    </item>
  </channel>
</rss>

