<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic As above, there is no reason in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135704#M78021</link>
    <description>&lt;P&gt;I am explicitly not going to tell you how to use KMP_AFFINITY=balanced, because&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;there is no reason to use it; as you are discovering it is hard to use and confusing.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;All of the interesting options are covered by the use of KNP_HW_SUBSET and KMP_AFFINITY={scatter,compact} in a way which is comprehensible and easier to get right.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Feb 2018 09:22:49 GMT</pubDate>
    <dc:creator>James_C_Intel2</dc:creator>
    <dc:date>2018-02-16T09:22:49Z</dc:date>
    <item>
      <title>Xeon Phi - Balanced vs Scatter - 64 or Less Threads</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135699#M78016</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;If the number of threads to use is 32, or any other number less than or equal to 64, how do balanced and scatter thread affinity differ from each other?&lt;/P&gt;

&lt;P&gt;As per my analysis, these two will use the same number of cores (1 thread per core). Or will balanced use 2 threads per core to keep sequential threads together, so that only 16 cores are used?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2018 17:34:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135699#M78016</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-02-07T17:34:43Z</dc:date>
    </item>
    <item>
      <title>If I recall correctly,</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135700#M78017</link>
      <description>&lt;P&gt;If I recall correctly, Balanced and Scatter should be the same if you are only using one thread per core.&lt;/P&gt;

&lt;P&gt;If you are using more than one thread per core, Balanced and Scatter will have different layouts.&amp;nbsp; For example, on a 64-core part using 128 threads, Balanced will put threads 0&amp;amp;1 on core 0, 2&amp;amp;3 on core 1, etc, while Scatter will put threads 0&amp;amp;64 on core 0, 1&amp;amp;65 on core 1, etc.&lt;/P&gt;
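
&lt;P&gt;A minimal Python sketch of the two layouts described above. It is a simplified model, not the Intel OpenMP runtime's actual placement code: it assumes the thread count is a whole multiple of the core count, and it ignores any topology level between a core and the whole chip (such as the two-core tiles on KNL). The function names (balanced_core, scatter_core, layout) are hypothetical.&lt;/P&gt;

```python
# Simplified model of "balanced" vs "scatter" placement (illustration only,
# not the OpenMP runtime's algorithm). Assumes n_threads is a multiple of
# n_cores and ignores tiles and other intermediate topology levels.

def balanced_core(thread, n_threads, n_cores):
    """Consecutive threads share a core: 0,1 on core 0; 2,3 on core 1; ..."""
    if n_threads % n_cores:
        raise ValueError("model only covers n_threads that are a multiple of n_cores")
    per_core = n_threads // n_cores
    return thread // per_core

def scatter_core(thread, n_threads, n_cores):
    """Threads are dealt round-robin: 0 and 64 on core 0, 1 and 65 on core 1, ..."""
    if n_threads % n_cores:
        raise ValueError("model only covers n_threads that are a multiple of n_cores")
    return thread % n_cores

def layout(policy, n_threads, n_cores):
    """Return a dict mapping each core id to the list of thread ids placed on it."""
    cores = {c: [] for c in range(n_cores)}
    for t in range(n_threads):
        cores[policy(t, n_threads, n_cores)].append(t)
    return cores

if __name__ == "__main__":
    for name, policy in (("balanced", balanced_core), ("scatter", scatter_core)):
        for n_threads in (64, 128):
            cores = layout(policy, n_threads, 64)
            print(f"{name:8s} {n_threads:3d} threads: core 0 -> {cores[0]}, core 1 -> {cores[1]}")
```

&lt;P&gt;With 64 threads on 64 cores both functions give one thread per core, so the two layouts coincide; with 128 threads balanced puts threads 0 and 1 on core 0, while scatter puts threads 0 and 64 there.&lt;/P&gt;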

&lt;P&gt;Neither of these schemes is easy to understand if the number of threads is not evenly divisible by the number of cores.&amp;nbsp; In such cases I find it much easier to use KMP_HW_SUBSET to force the allocation to be inside the requested subset of cores/threads.&amp;nbsp; On our 68-core Xeon Phi 7250 if I wanted to use 64 cores with different numbers of threads, I would use:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;1 thread/core: KMP_HW_SUBSET=64c,1t&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OMP_NUM_THREADS=64&amp;nbsp;&amp;nbsp;&amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
	&lt;LI&gt;2 threads/core: KMP_HW_SUBSET=64c,2t&amp;nbsp;&amp;nbsp; OMP_NUM_THREADS=128&amp;nbsp;&amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
	&lt;LI&gt;4 threads/core: KMP_HW_SUBSET=64c,3t&amp;nbsp;&amp;nbsp; OMP_NUM_THREADS=256&amp;nbsp;&amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;These three schemes emulate what "Balanced" would do if it were run on a 64-core system.&lt;/P&gt;

&lt;P&gt;You should always add the "verbose" clause to KMP_AFFINITY to verify that the system did what you wanted....&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2018 18:37:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135700#M78017</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-02-07T18:37:00Z</dc:date>
    </item>
    <item>
      <title>As ever, John is on the money</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135701#M78018</link>
      <description>&lt;P&gt;As ever, John is on the money. It is much easier to use &lt;A href="https://software.intel.com/en-us/node/694293"&gt;KMP_HW_SUBSET&lt;/A&gt; to limit the available resources and then play with "compact" or :scatter" affinity than to try to achieve good balance with KMP_AFFINITY=balanced&lt;/P&gt;

&lt;P&gt;One thing which John is doing, which I would not do (and which has introduced a bug in his text above :-)), is that he is using OMP_NUM_THREADS as well as KMP_HW_SUBSET. I find it better not to use OMP_NUM_THREADS, since that gives you the opportunity to have a mismatch between the number of HW threads allocated and the number of software threads created. If you leave out OMP_NUM_THREADS and just use KMP_HW_SUBSET, the library's default behaviour of running one thread on each available logical CPU will kick in, and you can't make a mistake like the one in John's third line:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;UL style="color: rgb(96, 96, 96); font-size: 13.008px;"&gt;
		&lt;LI&gt;4 threads/core: KMP_HW_SUBSET=64c,3t&amp;nbsp;&amp;nbsp; OMP_NUM_THREADS=256&amp;nbsp;&amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
	&lt;/UL&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;where the 3t was intended to be 4t, so he's running 256 threads on 192 logical CPUs...&lt;/P&gt;
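
&lt;P&gt;The mismatch described above can be caught mechanically. The sketch below is a hypothetical helper, not part of the OpenMP runtime; it understands only the simple "cores,threads" form of KMP_HW_SUBSET used in this thread (e.g. 64c,2t) and requires both fields, whereas the real variable accepts further fields.&lt;/P&gt;

```python
# Hypothetical consistency check between KMP_HW_SUBSET and OMP_NUM_THREADS.
# Handles only the "Nc,Mt" form used in this thread (both fields required).

def logical_cpus(hw_subset):
    """Return cores * threads-per-core for a string such as "64c,2t"."""
    counts = {}
    for field in hw_subset.split(","):
        field = field.strip()
        unit = field[-1].lower()          # "c" = cores, "t" = threads per core
        if unit not in ("c", "t"):
            raise ValueError(f"unsupported field {field!r}")
        counts[unit] = int(field[:-1])
    return counts["c"] * counts["t"]      # KeyError if either field is missing

def check(hw_subset, omp_num_threads=None):
    """Return None if consistent, otherwise a short description of the mismatch."""
    available = logical_cpus(hw_subset)
    if omp_num_threads is None:
        # OMP_NUM_THREADS unset: the runtime starts one thread per available
        # logical CPU, so there is nothing to get wrong.
        return None
    if omp_num_threads != available:
        return f"{omp_num_threads} threads on {available} logical CPUs"
    return None

if __name__ == "__main__":
    for subset, n in (("64c,1t", 64), ("64c,2t", 128), ("64c,3t", 256)):
        print(subset, n, check(subset, n) or "ok")
```

&lt;P&gt;Applied to John's three original settings, only the third one is flagged, as 256 threads on 192 logical CPUs.&lt;/P&gt;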

&lt;P&gt;So I'd just use&lt;/P&gt;

&lt;UL style="color: rgb(96, 96, 96); font-size: 13.008px;"&gt;
	&lt;LI&gt;1 thread/core:&amp;nbsp; &amp;nbsp;KMP_HW_SUBSET=64c,1t&amp;nbsp; &amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
	&lt;LI&gt;2 threads/core: KMP_HW_SUBSET=64c,2t&amp;nbsp; &amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
	&lt;LI&gt;4 threads/core: KMP_HW_SUBSET=64c,4t&amp;nbsp; &amp;nbsp; KMP_AFFINITY=compact&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;and then also try KMP_AFFINITY=scatter.&lt;/P&gt;
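
&lt;P&gt;A sketch of how these six runs (three KMP_HW_SUBSET settings, each with compact and scatter) might be scripted in Python, assuming a hypothetical ./my_app binary. It leaves OMP_NUM_THREADS unset, as recommended above, and adds the "verbose" modifier to KMP_AFFINITY so the runtime reports the placement it chose. Passing more values in the (hypothetical) core_counts parameter turns it into a scaling sweep.&lt;/P&gt;

```python
import itertools
import os

def experiment_envs(core_counts=(64,), threads_per_core=(1, 2, 4),
                    affinities=("compact", "scatter")):
    """Yield one environment dict per (cores, threads/core, affinity) run.

    OMP_NUM_THREADS is removed so the runtime falls back to one thread per
    logical CPU selected by KMP_HW_SUBSET.
    """
    for cores, tpc, aff in itertools.product(core_counts, threads_per_core, affinities):
        env = dict(os.environ)                 # start from the current environment
        env.pop("OMP_NUM_THREADS", None)       # avoid a thread-count mismatch
        env["KMP_HW_SUBSET"] = f"{cores}c,{tpc}t"
        env["KMP_AFFINITY"] = f"verbose,{aff}" # verbose: runtime prints its mapping
        yield env

if __name__ == "__main__":
    for env in experiment_envs():
        print(env["KMP_HW_SUBSET"], env["KMP_AFFINITY"])
        # To actually launch a run (./my_app is a placeholder):
        # import subprocess; subprocess.run(["./my_app"], env=env, check=True)
```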

&lt;P&gt;This then makes it easier to experiment with scaling, simply by changing the number of cores you ask for (as described in "&lt;A href="https://software.intel.com/en-us/blogs/2016/12/02/how-to-plot-openmp-scaling-results"&gt;How to Plot OpenMP Scaling Results&lt;/A&gt;").&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2018 08:27:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135701#M78018</guid>
      <dc:creator>James_C_Intel2</dc:creator>
      <dc:date>2018-02-08T08:27:56Z</dc:date>
    </item>
    <item>
      <title>Hurray for sharp eyes!  I</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135702#M78019</link>
      <description>&lt;P&gt;Hurray for sharp eyes!&amp;nbsp; I knew I was going to make a mistake with those numbers, and I was right!&lt;/P&gt;

&lt;P&gt;I think one of the OMP placement directives typically gives the effect of "balanced", but the standard allows named distributions to have implementation-defined behavior, so I don't use them.&amp;nbsp;&amp;nbsp; Numbers work -- at least when you get them right.&amp;nbsp; ;-)&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2018 19:59:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135702#M78019</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-02-08T19:59:31Z</dc:date>
    </item>
    <item>
      <title>Hi John,</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135703#M78020</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;

&lt;P&gt;I am getting a bit confused by the environment variables. As I understand it, to run 128 threads with Scatter, Compact and Balanced affinity, I need the following environment variables:&lt;/P&gt;

&lt;P&gt;1) Scatter:&amp;nbsp; &amp;nbsp; &amp;nbsp;export OMP_NUM_THREADS=128 export KMP_AFFINITY=&lt;STRONG&gt;scatter&lt;/STRONG&gt;,granularity=fine&lt;BR /&gt;
	2) Compact:&amp;nbsp; export OMP_NUM_THREADS=128 export KMP_AFFINITY=&lt;STRONG&gt;compact&lt;/STRONG&gt;,granularity=fine&lt;BR /&gt;
	3) Balanced:&amp;nbsp; export OMP_NUM_THREADS=128 export KMP_AFFINITY=&lt;STRONG&gt;balanced&lt;/STRONG&gt;,granularity=fine&lt;/P&gt;

&lt;P&gt;After reading the last paragraph of the documentation here&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;A href="https://software.intel.com/en-us/node/522518" target="_blank"&gt;https://software.intel.com/en-us/node/522518&lt;/A&gt;, I am confused about whether (3) is correct. As I understand it, this documentation refers to Intel Xeon Phi KNC, not Intel Xeon Phi KNL?&lt;/SPAN&gt;&lt;/P&gt;

&lt;DIV style="box-sizing: border-box; color: rgb(102, 102, 102); font-family: Arial, Tahoma, Helvetica, sans-serif; font-size: 13px;"&gt;
	&lt;BLOCKQUOTE&gt;
		&lt;P style="box-sizing: border-box; word-wrap: break-word; margin-bottom: 1em; line-height: 1.4; max-width: 100%; width: auto;"&gt;To set the&amp;nbsp;&lt;CODE style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;balanced&lt;/CODE&gt;&amp;nbsp;affinity type for only the Intel® MIC Architecture environment, assign a specific prefix using the&amp;nbsp;&lt;CODE style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;MIC_ENV_PREFIX=&lt;VAR class="varname" style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;prefix&lt;/VAR&gt;&lt;/CODE&gt;&amp;nbsp;and then set&amp;nbsp;&lt;CODE style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;&lt;VAR class="varname" style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;prefix&lt;/VAR&gt;_KMP_AFFINITY&lt;/CODE&gt;&amp;nbsp;with&amp;nbsp;&lt;CODE style="box-sizing: border-box; font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace; line-height: 1.6em;"&gt;balanced&lt;/CODE&gt;.&lt;/P&gt;
	&lt;/BLOCKQUOTE&gt;

	&lt;P style="box-sizing: border-box; word-wrap: break-word; margin-bottom: 1em; line-height: 1.4; max-width: 100%; width: auto;"&gt;Thanks.&lt;/P&gt;

&lt;/DIV&gt;</description>
      <pubDate>Fri, 16 Feb 2018 07:25:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135703#M78020</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-02-16T07:25:32Z</dc:date>
    </item>
    <item>
      <title>As above, there is no reason</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135704#M78021</link>
      <description>&lt;P&gt;I am explicitly not going to tell you how to use KMP_AFFINITY=balanced, because&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;there is no reason to use it; as you are discovering it is hard to use and confusing.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;All of the interesting options are covered by the use of KNP_HW_SUBSET and KMP_AFFINITY={scatter,compact} in a way which is comprehensible and easier to get right.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2018 09:22:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135704#M78021</guid>
      <dc:creator>James_C_Intel2</dc:creator>
      <dc:date>2018-02-16T09:22:49Z</dc:date>
    </item>
    <item>
      <title>Aha!  I am not the only one</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135705#M78022</link>
      <description>&lt;P&gt;Aha!&amp;nbsp; I am not the only one who can make mistakes!&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;s/KNP_HW_SUBSET/KMP_HW_SUBSET/&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

</description>
      <pubDate>Fri, 16 Feb 2018 15:28:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135705#M78022</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-02-16T15:28:45Z</dc:date>
    </item>
    <item>
      <title>Hi James, John,</title>
      <link>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135706#M78023</link>
      <description>&lt;P&gt;Hi James, John,&lt;/P&gt;

&lt;P&gt;It is finally getting clear to me. I got confused because I thought I had to use KMP_AFFINITY=balanced, and that a balanced layout couldn't be achieved without it.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;1) Scatter 128 threads:&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;2 threads/core: KMP_HW_SUBSET=64c,2t&amp;nbsp; &amp;nbsp; KMP_AFFINITY=scatter&lt;/SPAN&gt;&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;2) Balanced 128 threads:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;2 threads/core: KMP_HW_SUBSET=64c,2t&amp;nbsp; &amp;nbsp; KMP_AFFINITY=compact&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Why is there even KMP_AFFINITY=balanced option, it can be really confusing for new user. I expect it to do what (2) would do above, but it doesn't seem to be the case.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Thank you for &lt;/SPAN&gt;clarifying&lt;SPAN style="font-size: 1em;"&gt; this.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2018 18:10:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Xeon-Phi-Balanced-vs-Scatter-64-or-Less-Threads/m-p/1135706#M78023</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-02-16T18:10:03Z</dc:date>
    </item>
  </channel>
</rss>

