<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Open MP slows down performance in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955707#M5252</link>
    <description>&lt;P&gt;Try placing the results of the MATMUL into a thread-private, explicitly declared storage area. I think what is happening is that the compiler is allocating a common static temporary array to hold the results of the MATMUL, and this temporary array is being used by both threads (resulting in cache problems).&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Thu, 29 Dec 2005 07:20:16 GMT</pubDate>
    <dc:creator>jim_dempsey</dc:creator>
    <dc:date>2005-12-29T07:20:16Z</dc:date>
    <item>
      <title>Open MP slows down performance</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955704#M5249</link>
      <description>&lt;DIV&gt;I am just starting to learn OpenMP and have encountered the following problem.&lt;/DIV&gt;
&lt;DIV&gt;The following program (which does nothing useful) runs about 6 times slower on a dual-core Pentium D with /Qopenmp than without it.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt; program Cpu&lt;/DIV&gt;
&lt;DIV&gt; implicit none&lt;BR /&gt; integer, parameter :: n=200, nt=20,ni=64&lt;BR /&gt; real(8) :: a(n,n),c(ni),d&lt;BR /&gt; integer :: i,j&lt;BR /&gt; real t0,t1,ti(nt),ta,sd&lt;BR /&gt; &lt;BR /&gt; !$omp parallel num_threads(2),shared(c,ti)&lt;BR /&gt; !$omp do schedule(static),private(a,i,j,t1,t0) &lt;BR /&gt; do i=1,nt&lt;BR /&gt; call cpu_time(t0)&lt;BR /&gt; do j=1,ni&lt;BR /&gt; call random_number(a)&lt;BR /&gt; c(j)=sum(matmul(a,a))&lt;BR /&gt; end do&lt;BR /&gt; call cpu_time(t1)&lt;BR /&gt; ti(i)=t1-t0&lt;BR /&gt; end do&lt;BR /&gt; !$omp end do&lt;BR /&gt; !$omp end parallel&lt;BR /&gt; ta=sum(ti)/nt ! average run time&lt;BR /&gt; sd=dot_product(ti-ta,ti-ta)/(nt-1)&lt;BR /&gt; sd=sqrt(sd) ! standard deviation&lt;BR /&gt; print *,"ave=",ta," sdev=",sd," max=",maxval(ti)," min=",minval(ti)&lt;BR /&gt; end program Cpu&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Both processors are fully loaded, and yet instead of the expected 2x speedup I get a substantial slowdown. It looks like I am missing some crucial point, but what?&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Yuri.&lt;/DIV&gt;</description>
      <pubDate>Thu, 22 Dec 2005 09:38:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955704#M5249</guid>
      <dc:creator>izryu</dc:creator>
      <dc:date>2005-12-22T09:38:29Z</dc:date>
    </item>
    <item>
      <title>Re: Open MP slows down performance</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955705#M5250</link>
      <description>&lt;P&gt;Hello Yuri,&lt;/P&gt;
&lt;P&gt;There's a storage conflict on array c inside the parallel region. Each thread has a private copy of the loop index j, but the threads can still access the same elements of c. This could be causing cache conflicts.&lt;/P&gt;
&lt;P&gt;I suggest analyzing this code with the &lt;A href="http://www.intel.com/cd/software/products/asmo-na/eng/threading/index.htm" target="_blank"&gt;Intel Threading Tools&lt;/A&gt;. Thread Checker will help you find race conditions. Thread Profiler will help you tune threaded performance.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;
&lt;P&gt;Henry&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2005 23:25:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955705#M5250</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2005-12-22T23:25:13Z</dc:date>
    </item>
    <item>
      <title>Re: Open MP slows down performance</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955706#M5251</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Hello hagabb,&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Thanks for your suggestion. I have moved "c" from the shared clause to the private clause; the values of this array are thrown away in any case. Yet it didn't help: with /Qopenmp this program runs for 60 seconds, while without it it takes only 17 seconds. I am totally baffled.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Yuri.&lt;/DIV&gt;</description>
      <pubDate>Fri, 23 Dec 2005 10:36:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955706#M5251</guid>
      <dc:creator>izryu</dc:creator>
      <dc:date>2005-12-23T10:36:12Z</dc:date>
    </item>
    <item>
      <title>Re: Open MP slows down performance</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955707#M5252</link>
      <description>&lt;P&gt;Try placing the results of the MATMUL into a thread-private, explicitly declared storage area. I think what is happening is that the compiler is allocating a common static temporary array to hold the results of the MATMUL, and this temporary array is being used by both threads (resulting in cache problems).&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2005 07:20:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Open-MP-slows-down-performance/m-p/955707#M5252</guid>
      <dc:creator>jim_dempsey</dc:creator>
      <dc:date>2005-12-29T07:20:16Z</dc:date>
    </item>
  </channel>
</rss>

