<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic If you haven't looked into in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/No-speedup-with-SSE/m-p/1163728#M7106</link>
    <description>If you haven't looked into any details, we can offer only guesses.
1) If you use icc to target each specific CPU, its optimization of the C code for the wide AVX registers may be as good as the SSE code, which uses only half of each register on the newer CPU.
2) The newer (but now also obsolete) CPU could be running up against the 128-bit-wide bandwidth limitation between the L1 and L2 caches.

I remember tearing down and replacing motherboards and CPUs on those Westmere E7 boxes.  Their prices were hardly justified even with the upgraded hardware.</description>
    <pubDate>Wed, 04 Apr 2018 12:22:26 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2018-04-04T12:22:26Z</dc:date>
    <item>
      <title>No speedup with SSE</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/No-speedup-with-SSE/m-p/1163727#M7105</link>
      <description>&lt;P&gt;I have optimized my finite difference code with SSE. On my workstation, the speed is almost doubled. But when I run the same code on a cluster node, the SSE version performs almost the same as the non-optimized code. So what is happening on the cluster node?&lt;/P&gt;

&lt;P&gt;I can think of a few possible reasons. (1) I may have missed something during compilation, so SSE does not work as expected. I tried compiling the code on my workstation and then running it on the cluster node, but that did not help. (2) On the cluster node, the non-optimized code is being optimized automatically in some way, so it is already fast enough. (3) SSE is not supported on that cluster node.&lt;/P&gt;

&lt;P&gt;So, how can I figure out why?&lt;/P&gt;

&lt;P&gt;PS: Compiler: icc. The CPU on my workstation: Intel(R) Xeon(R) CPU E7- 4830&amp;nbsp; @ 2.13GHz (SSE improves speed). The CPU on the cluster node: Intel(R) Xeon(R) CPU E5- 2670&amp;nbsp; @ 2.60GHz (SSE makes no difference).&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 00:53:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/No-speedup-with-SSE/m-p/1163727#M7105</guid>
      <dc:creator>Gong__Xufei</dc:creator>
      <dc:date>2018-04-03T00:53:13Z</dc:date>
    </item>
    <item>
      <title>If you haven't looked into</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/No-speedup-with-SSE/m-p/1163728#M7106</link>
      <description>If you haven't looked into any details, we can offer only guesses.
1) If you use icc to target each specific CPU, its optimization of the C code for the wide AVX registers may be as good as the SSE code, which uses only half of each register on the newer CPU.
2) The newer (but now also obsolete) CPU could be running up against the 128-bit-wide bandwidth limitation between the L1 and L2 caches.

I remember tearing down and replacing motherboards and CPUs on those Westmere E7 boxes.  Their prices were hardly justified even with the upgraded hardware.</description>
      <pubDate>Wed, 04 Apr 2018 12:22:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/No-speedup-with-SSE/m-p/1163728#M7106</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2018-04-04T12:22:26Z</dc:date>
    </item>
  </channel>
</rss>

