<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic try playing with the env var in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Regarding-sgemm-benchmarks-for-MIC-devices/m-p/1058927#M53211</link>
    <description>&lt;P&gt;Try experimenting with the KMP_AFFINITY environment variable. If I set&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;export KMP_AFFINITY=balanced&lt;/PRE&gt;

&lt;P&gt;then I achieve 1730 SP GFLOPS and 840 DP GFLOPS on my 5110P (using the sample dgemm.c code from Intel's website).&lt;/P&gt;

&lt;P&gt;With any other setting of KMP_AFFINITY, performance drops to 360 DP GFLOPS or less.&lt;/P&gt;

</description>
    <pubDate>Mon, 24 Aug 2015 22:17:00 GMT</pubDate>
    <dc:creator>JJK</dc:creator>
    <dc:date>2015-08-24T22:17:00Z</dc:date>
    <item>
      <title>Regarding sgemm benchmarks for MIC devices</title>
      <link>https://community.intel.com/t5/Software-Archive/Regarding-sgemm-benchmarks-for-MIC-devices/m-p/1058926#M53210</link>
      <description>&lt;P&gt;Hi Intel forums,&lt;/P&gt;

&lt;P&gt;I've had difficulty reproducing the performance reported on the following page:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html"&gt;https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Using the MKL sgemm routine on my 3120-series Xeon Phi, I haven't even approached the 1.7 TFLOPS level claimed above. The best performance I achieve is ~0.7 TFLOPS. Presumably, this is because I don't fully understand the threading and vectorization APIs and am not using them optimally. Does anyone know where to find the source and environment details used for Intel's official benchmark? I could then compare "correct" usage with my code to better understand the tools.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Fri, 21 Aug 2015 20:04:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Regarding-sgemm-benchmarks-for-MIC-devices/m-p/1058926#M53210</guid>
      <dc:creator>Christopher_M_5</dc:creator>
      <dc:date>2015-08-21T20:04:36Z</dc:date>
    </item>
    <item>
      <title>try playing with the env var</title>
      <link>https://community.intel.com/t5/Software-Archive/Regarding-sgemm-benchmarks-for-MIC-devices/m-p/1058927#M53211</link>
      <description>&lt;P&gt;Try experimenting with the KMP_AFFINITY environment variable. If I set&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;export KMP_AFFINITY=balanced&lt;/PRE&gt;

&lt;P&gt;then I achieve 1730 SP GFLOPS and 840 DP GFLOPS on my 5110P (using the sample dgemm.c code from Intel's website).&lt;/P&gt;

&lt;P&gt;With any other setting of KMP_AFFINITY, performance drops to 360 DP GFLOPS or less.&lt;/P&gt;
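
&lt;P&gt;To compare the settings side by side, a small shell helper can rerun the same command once per affinity type. This is only a sketch under assumptions: sweep_affinity and ./sgemm_bench are hypothetical names (substitute your own benchmark binary), and the list is limited to the balanced, compact, and scatter types.&lt;/P&gt;

```shell
# Run the given command once per KMP_AFFINITY type so the results can be compared.
# sweep_affinity is a hypothetical helper name, not part of any Intel tool.
# Example call: sweep_affinity ./sgemm_bench   (./sgemm_bench is a placeholder binary)
sweep_affinity() {
    for affinity in balanced compact scatter; do
        # Label each run so the output can be matched to its setting.
        echo "KMP_AFFINITY=$affinity"
        # Set the variable only for this one invocation of the command.
        KMP_AFFINITY=$affinity "$@"
    done
}
```

&lt;P&gt;Setting the variable per invocation, rather than with export, keeps each run independent of the previous one.&lt;/P&gt;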

</description>
      <pubDate>Mon, 24 Aug 2015 22:17:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Regarding-sgemm-benchmarks-for-MIC-devices/m-p/1058927#M53211</guid>
      <dc:creator>JJK</dc:creator>
      <dc:date>2015-08-24T22:17:00Z</dc:date>
    </item>
  </channel>
</rss>

