<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Typically, setting the  in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101705#M5196</link>
    <description>&lt;P&gt;Typically, setting the "highest" target ISA that works on all your machines is a satisfactory tactic.&amp;nbsp; If you have both Sandy Bridge and Ivy Bridge machines (you didn't give full identification), there is unlikely to be any advantage in targeting the Ivy Bridge ISA.&amp;nbsp; The gain from AVX2 code (if you have a v3 machine) is unlikely to exceed 5%, but you could test two builds on those machines to make your decision.&amp;nbsp; The gain of AVX2 over AVX may be wiped out if you use the multiple target path option (which would also produce a larger executable).&amp;nbsp; As you have both AVX-capable and non-AVX machines, compiling for the Nehalem box may give up some performance on the AVX machines, possibly enough that dual-path options such as -axAVX -msse4.2 could prove advantageous.&lt;/P&gt;

&lt;P&gt;Compiler optimization reports, combined with run-time profiling in Intel Parallel Advisor, ought to give a clearer picture.&lt;/P&gt;</description>
    <pubDate>Thu, 23 Feb 2017 13:49:50 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2017-02-23T13:49:50Z</dc:date>
    <item>
      <title>optimized executables for specific processors</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101704#M5195</link>
      <description>&lt;P&gt;We have four clusters composed of nodes with Intel Xeon processors of different vintages:&lt;/P&gt;

&lt;P&gt;Intel(R) Xeon(R) CPU E5-2697&lt;/P&gt;

&lt;P&gt;Intel(R) Xeon(R) E5-2690&lt;/P&gt;

&lt;P&gt;Intel(R) Xeon(R) X5675&lt;/P&gt;

&lt;P&gt;Intel(R) Xeon(R) E5530&lt;/P&gt;

&lt;P&gt;We are using 16U3 versions of the Intel ifort compiler.&lt;/P&gt;

&lt;P&gt;Looking for ultimate performance, are there compiler optimization options I should use to produce the best executable for each machine?&lt;/P&gt;

&lt;P&gt;Or is there one set of optimization parameters that would produce just as good an executable, which I could compile on any machine and run on another (or all) of the machines?&amp;nbsp; Again, ultimate performance, rather than portability, is our prime concern.&lt;/P&gt;

</description>
      <pubDate>Wed, 22 Feb 2017 19:17:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101704#M5195</guid>
      <dc:creator>Dave_K_</dc:creator>
      <dc:date>2017-02-22T19:17:04Z</dc:date>
    </item>
    <item>
      <title>Typically, setting the</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101705#M5196</link>
      <description>&lt;P&gt;Typically, setting the "highest" target ISA that works on all your machines is a satisfactory tactic.&amp;nbsp; If you have both Sandy Bridge and Ivy Bridge machines (you didn't give full identification), there is unlikely to be any advantage in targeting the Ivy Bridge ISA.&amp;nbsp; The gain from AVX2 code (if you have a v3 machine) is unlikely to exceed 5%, but you could test two builds on those machines to make your decision.&amp;nbsp; The gain of AVX2 over AVX may be wiped out if you use the multiple target path option (which would also produce a larger executable).&amp;nbsp; As you have both AVX-capable and non-AVX machines, compiling for the Nehalem box may give up some performance on the AVX machines, possibly enough that dual-path options such as -axAVX -msse4.2 could prove advantageous.&lt;/P&gt;
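
&lt;P&gt;Since the processor list in the question leaves out the v2/v3/v4 suffixes, one way to settle which ISA each node supports is to read the CPU feature flags directly.&amp;nbsp; The following is a minimal shell sketch, assuming Linux nodes that expose a "flags" line in /proc/cpuinfo; the helper name isa_level is hypothetical and not part of any Intel tool.&lt;/P&gt;

```shell
# Hypothetical helper: report the highest of SSE4.2/AVX/AVX2 found in a
# space-separated CPU feature list (such as the "flags" line of /proc/cpuinfo).
isa_level() {
  case " $1 " in
    *" avx2 "*)   echo "AVX2" ;;
    *" avx "*)    echo "AVX" ;;
    *" sse4_2 "*) echo "SSE4.2" ;;
    *)            echo "none of SSE4.2/AVX/AVX2" ;;
  esac
}

# Run on each node (Linux only): classify the first "flags" line.
if [ -r /proc/cpuinfo ]; then
  isa_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

&lt;P&gt;A Sandy Bridge or Ivy Bridge node would be expected to report AVX, and a Nehalem or Westmere node SSE4.2, which tells you whether an AVX2 build is worth testing at all.&lt;/P&gt;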

&lt;P&gt;Compiler optimization reports, combined with run-time profiling in Intel Parallel Advisor, ought to give a clearer picture.&lt;/P&gt;</description>
      <pubDate>Thu, 23 Feb 2017 13:49:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101705#M5196</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-02-23T13:49:50Z</dc:date>
    </item>
    <item>
      <title>We use the run-time dispatch</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101706#M5197</link>
      <description>&lt;P&gt;We use the run-time dispatch functionality and have never seen performance degradation relative to the native versions.&amp;nbsp; The executable is larger, of course, because it contains multiple versions of any function that the compiler expects to benefit from the "higher" ISA.&lt;/P&gt;

&lt;P&gt;Run-time dispatch is by function, so if you have code that spends a lot of time passing pointers to short functions as arguments, the overhead could be a problem.&amp;nbsp; I am not aware of any such codes in my shop (but some might be hiding inside interpreters or JITs).&lt;/P&gt;

&lt;P&gt;For the four processors above, the options "-xsse4.2 -axAVX" should generate the best code.&amp;nbsp; (The "-msse4.2" option generates code that runs on both Intel and non-Intel processors, but it may not be optimized as well as code built with "-xsse4.2", which runs only on Intel processors.)&lt;/P&gt;
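
&lt;P&gt;To check this on your own code, the single-path and dispatch builds can be compared side by side.&amp;nbsp; The sketch below only prints the two ifort command lines rather than running them; the source file name app.f90, the output names, and the helper build_cmd are placeholders, and the option spelling follows this post.&lt;/P&gt;

```shell
# Sketch only: print, but do not run, two candidate builds for timing.
# SRC and the output names are hypothetical placeholders.
SRC=app.f90

build_cmd() {
  # $1 = output name; remaining arguments = target-ISA options
  out=$1
  shift
  echo "ifort $* -o $out $SRC"
}

build_cmd app_base     -xsse4.2           # single code path
build_cmd app_dispatch -xsse4.2 -axAVX    # SSE4.2 baseline plus AVX path
```

&lt;P&gt;Running each printed line on a node with ifort installed, then timing both executables on an AVX node and on a non-AVX node, should show whether the dispatch build pays off for your code.&lt;/P&gt;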

&lt;P&gt;If either of the Xeon E5 systems is v2, v3, or v4, then additional options might be helpful.&amp;nbsp; There is seldom a benefit to specialization for Xeon E5 v2, but Xeon E5 v3/v4 will want a third flag: "-axCORE-AVX2".&lt;/P&gt;</description>
      <pubDate>Thu, 23 Feb 2017 19:45:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/optimized-executables-for-specific-processors/m-p/1101706#M5197</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-02-23T19:45:53Z</dc:date>
    </item>
  </channel>
</rss>

