<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quote:John D. McCalpin wrote: in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028415#M41047</link>
    <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;John D. McCalpin wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;A more appropriate forum might be:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-opti...&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In this case the answer is easy -- the Nehalem target generates SSE2/3/4 code (128-bit SIMD vectors), while your Xeon E5-2670 (Sandy Bridge EP) processor requires AVX code (256-bit SIMD vectors) to achieve full speed.&amp;nbsp; So you are getting 150 GFLOPS out of a peak of 165 GFLOPS (using SSE code), which is about 91% of peak.&lt;/P&gt;

&lt;P&gt;The author of GotoBLAS worked at TACC (&lt;A href="http://www.tacc.utexas.edu/"&gt;http://www.tacc.utexas.edu/&lt;/A&gt;) when I started working at TACC in 1999.&amp;nbsp; He left for industry well before we received our first Xeon E5 (Sandy Bridge EP) processors, so it was never optimized for that target.&amp;nbsp; The OpenBLAS project (&lt;A href="http://www.openblas.net/"&gt;http://www.openblas.net/&lt;/A&gt;) added Sandy Bridge support, and is continuing to add support for Haswell and other newer processors.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thank you for your explanation. Now I finally know the reason for the low efficiency. (My teacher will forgive me for such a low efficiency :) ) Thank you very much!&lt;/P&gt;</description>
    <pubDate>Fri, 17 Apr 2015 12:17:21 GMT</pubDate>
    <dc:creator>Rancho_L_</dc:creator>
    <dc:date>2015-04-17T12:17:21Z</dc:date>
    <item>
      <title>Why does GotoBLAS have such low efficiency (what is wrong with my steps)?</title>
      <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028411#M41043</link>
      <description>&lt;P&gt;I use GotoBLAS and MPICH to run HPL on a cluster (the CPU is Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz). I have compiled GotoBLAS in two ways: (1) make (2) make USE_THREAD=0 TARGET=NEHALEM. The library used in the HPL makefile is libgoto.a. However, both builds give a low HPL result: only 150 GFLOPS (the theoretical peak is 330 GFLOPS). Did I make a mistake in compiling GotoBLAS? Thanks for your answer.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2015 09:04:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028411#M41043</guid>
      <dc:creator>Rancho_L_</dc:creator>
      <dc:date>2015-04-16T09:04:23Z</dc:date>
    </item>
    <item>
      <title>As you don't use Intel xeon</title>
      <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028412#M41044</link>
      <description>&lt;P&gt;As you aren't using an Intel Xeon Phi, the subject is off topic here.&lt;/P&gt;

&lt;P&gt;Did you see the advice to set the target to Core2 rather than Nehalem if you can't upgrade to OpenBLAS? The OpenBLAS project would seem a better source for advice.&lt;/P&gt;

</description>
      <pubDate>Thu, 16 Apr 2015 09:20:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028412#M41044</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-04-16T09:20:06Z</dc:date>
    </item>
    <item>
      <title>Quote:Tim Prince wrote:</title>
      <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028413#M41045</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Tim Prince wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;As you aren't using an Intel Xeon Phi, the subject is off topic here.&lt;/P&gt;

&lt;P&gt;Did you see the advice to set the target to Core2 rather than Nehalem if you can't upgrade to OpenBLAS? The OpenBLAS project would seem a better source for advice.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thanks for your answer. I know I asked the question in the wrong place, but I don't know where to find experts... I will try your method, thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2015 09:24:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028413#M41045</guid>
      <dc:creator>Rancho_L_</dc:creator>
      <dc:date>2015-04-16T09:24:50Z</dc:date>
    </item>
    <item>
      <title>A more appropriate forum</title>
      <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028414#M41046</link>
      <description>&lt;P&gt;A more appropriate forum might be:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In this case the answer is easy -- the Nehalem target generates SSE2/3/4 code (128-bit SIMD vectors), while your Xeon E5-2670 (Sandy Bridge EP) processor requires AVX code (256-bit SIMD vectors) to achieve full speed.&amp;nbsp; So you are getting 150 GFLOPS out of a peak of 165 GFLOPS (using SSE code), which is about 91% of peak.&lt;/P&gt;
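
&lt;P&gt;A rough check of the peak figures quoted above, as a minimal Python sketch. It assumes a dual-socket node (16 cores in total) at the nominal 2.6 GHz, and that each core can issue one SIMD add and one SIMD multiply per cycle; none of these assumptions is stated in the thread, and the function name peak_gflops is purely illustrative.&lt;/P&gt;

```python
# Rough peak-FLOPS arithmetic for the node discussed in this thread.
# Assumptions (not stated in the thread): 2 sockets x 8 cores, 2.6 GHz nominal
# clock, one SIMD add plus one SIMD multiply issued per cycle per core.

def peak_gflops(cores, ghz, simd_bits, dp_bits=64, ops_per_cycle=2):
    """Peak double-precision GFLOPS for the given SIMD register width."""
    lanes = simd_bits // dp_bits             # doubles per SIMD register
    flops_per_cycle = lanes * ops_per_cycle  # per core, per cycle
    return cores * ghz * flops_per_cycle

CORES = 16        # assumed: dual-socket Xeon E5-2670, 8 cores per socket
GHZ = 2.6         # nominal frequency from the original question
MEASURED = 150.0  # HPL result reported in the question, in GFLOPS

sse_peak = peak_gflops(CORES, GHZ, simd_bits=128)  # NEHALEM target (SSE)
avx_peak = peak_gflops(CORES, GHZ, simd_bits=256)  # Sandy Bridge (AVX)

print(f"SSE peak: {sse_peak:.1f} GFLOPS, efficiency {MEASURED / sse_peak:.0%}")
print(f"AVX peak: {avx_peak:.1f} GFLOPS, efficiency {MEASURED / avx_peak:.0%}")
```

&lt;P&gt;Under these assumptions the SSE peak comes out near 166 GFLOPS and the AVX peak near 333 GFLOPS, in line with the rounded 165 and 330 GFLOPS figures in this thread.&lt;/P&gt;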

&lt;P&gt;The author of GotoBLAS worked at TACC (http://www.tacc.utexas.edu/) when I started working at TACC in 1999.&amp;nbsp; He left for industry well before we received our first Xeon E5 (Sandy Bridge EP) processors, so it was never optimized for that target.&amp;nbsp; The OpenBLAS project (http://www.openblas.net/) added Sandy Bridge support, and is continuing to add support for Haswell and other newer processors.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2015 12:31:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028414#M41046</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-04-16T12:31:38Z</dc:date>
    </item>
    <item>
      <title>Quote:John D. McCalpin wrote:</title>
      <link>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028415#M41047</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;John D. McCalpin wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;A more appropriate forum might be:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-opti...&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In this case the answer is easy -- the Nehalem target generates SSE2/3/4 code (128-bit SIMD vectors), while your Xeon E5-2670 (Sandy Bridge EP) processor requires AVX code (256-bit SIMD vectors) to achieve full speed.&amp;nbsp; So you are getting 150 GFLOPS out of a peak of 165 GFLOPS (using SSE code), which is about 91% of peak.&lt;/P&gt;

&lt;P&gt;The author of GotoBLAS worked at TACC (&lt;A href="http://www.tacc.utexas.edu/"&gt;http://www.tacc.utexas.edu/&lt;/A&gt;) when I started working at TACC in 1999.&amp;nbsp; He left for industry well before we received our first Xeon E5 (Sandy Bridge EP) processors, so it was never optimized for that target.&amp;nbsp; The OpenBLAS project (&lt;A href="http://www.openblas.net/"&gt;http://www.openblas.net/&lt;/A&gt;) added Sandy Bridge support, and is continuing to add support for Haswell and other newer processors.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thank you for your explanation. Now I finally know the reason for the low efficiency. (My teacher will forgive me for such a low efficiency :) ) Thank you very much!&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2015 12:17:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Why-GotoBlas-has-so-low-efficiency-where-is-wrong-for-my-steps/m-p/1028415#M41047</guid>
      <dc:creator>Rancho_L_</dc:creator>
      <dc:date>2015-04-17T12:17:21Z</dc:date>
    </item>
  </channel>
</rss>

