<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: about OpenMP Critical  ,data race in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868544#M2790</link>
    <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;As an aside: unless your system has at least 10 cores (hardware threads), you shouldn't request more threads than are available.&lt;BR /&gt;&lt;BR /&gt;With the default (typically static) schedule, the parallel loop will divide the range into as many chunks as there are threads, in this case 10. The 1st thread into the loop gets 0:N/10-1, the 2nd gets N/10:(N/10)*2-1, and so on.&lt;BR /&gt;&lt;BR /&gt;ary[i] = N-i, so the largest value is ary[0], the first element the 1st thread examines. The moment the 1st thread stores that value as the max, no thread (including the 1st) will ever find a larger value for ary[i].&lt;BR /&gt;&lt;BR /&gt;arx[i] = i, so the moment the last thread examines the 1st element of its subsection, that element is a new max, and no other thread will find a larger value for arx[i]. From then on, only the last thread will find a new max for arx[i], on each subsequent iteration.&lt;BR /&gt;&lt;BR /&gt;Therefore, you will observe an incorrect result only if one of your threads gets evicted (preempted) after finding a local max but before storing it, and the eviction lasts longer than the run time of the 1st or last thread, as the case may be.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
    <pubDate>Wed, 01 Apr 2009 14:23:13 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2009-04-01T14:23:13Z</dc:date>
    <item>
      <title>about OpenMP Critical  ,data race</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868542#M2788</link>
      <description>Why?&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;code1&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;#include "stdafx.h"&lt;BR /&gt;#include "omp.h"&lt;BR /&gt;#define N 100000 &lt;BR /&gt;int _tmain(int argc, _TCHAR* argv[])&lt;BR /&gt;{&lt;BR /&gt; int arx[N],ary[N];&lt;BR /&gt; int i,max_num_x=-1,max_num_y=-1;&lt;BR /&gt; for(i=0;i&amp;lt;N;i++)&lt;BR /&gt; {&lt;BR /&gt; arx[i]=i;&lt;BR /&gt; ary[i]=N-i;&lt;BR /&gt; }&lt;BR /&gt; omp_set_num_threads(10);&lt;BR /&gt; #pragma omp parallel for&lt;BR /&gt; for(i=0;i&amp;lt;N;i++)&lt;BR /&gt; {&lt;BR /&gt; //#pragma omp critical(max_arx)&lt;BR /&gt; if(arx[i]&amp;gt;max_num_x)&lt;BR /&gt; max_num_x=arx[i];&lt;BR /&gt; //#pragma omp critical(max_ary)&lt;BR /&gt; if(ary[i]&amp;gt;max_num_y)&lt;BR /&gt; max_num_y=ary[i];&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; printf("max_num_x=%d max_num_y=%d\n",max_num_x,max_num_y);&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;and&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;code2&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;#include "stdafx.h"&lt;BR /&gt;#include "omp.h"&lt;BR /&gt;#define N 100000 &lt;BR /&gt;int _tmain(int argc, _TCHAR* argv[])&lt;BR /&gt;{&lt;BR /&gt; int arx[N],ary[N];&lt;BR /&gt; int i,max_num_x=-1,max_num_y=-1;&lt;BR /&gt; for(i=0;i&amp;lt;N;i++)&lt;BR /&gt; {&lt;BR /&gt; arx[i]=i;&lt;BR /&gt; ary[i]=N-i;&lt;BR /&gt; }&lt;BR /&gt; omp_set_num_threads(10);&lt;BR /&gt; #pragma omp parallel for&lt;BR /&gt; for(i=0;i&amp;lt;N;i++)&lt;BR /&gt; {&lt;BR /&gt; #pragma omp critical(max_arx)&lt;BR /&gt; if(arx[i]&amp;gt;max_num_x)&lt;BR /&gt; max_num_x=arx[i];&lt;BR /&gt; #pragma omp critical(max_ary)&lt;BR /&gt; if(ary[i]&amp;gt;max_num_y)&lt;BR /&gt; max_num_y=ary[i];&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; printf("max_num_x=%d max_num_y=%d\n",max_num_x,max_num_y);&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Please tell me why the results of the two programs are identical.
I don't understand why code1 has no data race either, even without #pragma omp critical.</description>
      <pubDate>Wed, 01 Apr 2009 09:00:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868542#M2788</guid>
      <dc:creator>zhangzhe65</dc:creator>
      <dc:date>2009-04-01T09:00:35Z</dc:date>
    </item>
    <item>
      <title>Re: about OpenMP Critical  ,data race</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868543#M2789</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
It is possible that your compiler may choose atomic operations, even though you don't specify them, as ICL would do when you allow vectorization, or may optimize the loops away, as gcc would do. I am assuming there is no special implication to the use of a Microsoft C-like language, other than that you exclude the use of a standard compiler.&lt;BR /&gt;</description>
      <pubDate>Wed, 01 Apr 2009 13:12:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868543#M2789</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-04-01T13:12:51Z</dc:date>
    </item>
    <item>
      <title>Re: about OpenMP Critical  ,data race</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868544#M2790</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;As an aside: unless your system has at least 10 cores (hardware threads), you shouldn't request more threads than are available.&lt;BR /&gt;&lt;BR /&gt;With the default (typically static) schedule, the parallel loop will divide the range into as many chunks as there are threads, in this case 10. The 1st thread into the loop gets 0:N/10-1, the 2nd gets N/10:(N/10)*2-1, and so on.&lt;BR /&gt;&lt;BR /&gt;ary[i] = N-i, so the largest value is ary[0], the first element the 1st thread examines. The moment the 1st thread stores that value as the max, no thread (including the 1st) will ever find a larger value for ary[i].&lt;BR /&gt;&lt;BR /&gt;arx[i] = i, so the moment the last thread examines the 1st element of its subsection, that element is a new max, and no other thread will find a larger value for arx[i]. From then on, only the last thread will find a new max for arx[i], on each subsequent iteration.&lt;BR /&gt;&lt;BR /&gt;Therefore, you will observe an incorrect result only if one of your threads gets evicted (preempted) after finding a local max but before storing it, and the eviction lasts longer than the run time of the 1st or last thread, as the case may be.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Wed, 01 Apr 2009 14:23:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868544#M2790</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2009-04-01T14:23:13Z</dc:date>
    </item>
    <item>
      <title>Re: about OpenMP Critical  ,data race</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868545#M2791</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;EM&gt;Dear Mr. Jim Dempsey:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Thank you very much for your reply.&lt;/EM&gt;</description>
      <pubDate>Wed, 01 Apr 2009 15:32:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868545#M2791</guid>
      <dc:creator>zhangzhe65</dc:creator>
      <dc:date>2009-04-01T15:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: about OpenMP Critical  ,data race</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868546#M2792</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Thanks for your reply.&lt;BR /&gt;&lt;BR /&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; It is possible that your compiler may choose atomic operations, even though you don't specify them, as ICL would do when you allow vectorization, or may optimize the loops away, as gcc would do. I am assuming there is no special implication to the use of a Microsoft C-like language, other than that you exclude the use of a standard compiler.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;</description>
      <pubDate>Wed, 01 Apr 2009 15:41:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/about-OpenMP-Critical-data-race/m-p/868546#M2792</guid>
      <dc:creator>zhangzhe65</dc:creator>
      <dc:date>2009-04-01T15:41:31Z</dc:date>
    </item>
  </channel>
</rss>

