topic Re: about OpenMP Critical ,data race in Intel® Moderncode for Parallel Architectures

about OpenMP Critical ,data race

zhangzhe65 — Wed, 01 Apr 2009 09:00:35 GMT

why?
code1
#include "stdafx.h"
#include <stdio.h>
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
    int arx[N], ary[N];
    int i, max_num_x = -1, max_num_y = -1;
    /* arx ascends 0..N-1, ary descends N..1 */
    for (i = 0; i < N; i++) {
        arx[i] = i;
        ary[i] = N - i;
    }
    omp_set_num_threads(10);
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        /* no synchronization: compare and store on the shared maxima can race */
        //#pragma omp critical(max_arx)
        if (arx[i] > max_num_x)
            max_num_x = arx[i];
        //#pragma omp critical(max_ary)
        if (ary[i] > max_num_y)
            max_num_y = ary[i];
    }

    printf("max_num_x=%d max_num_y=%d\n", max_num_x, max_num_y);
    return 0;
}

and
code2
#include "stdafx.h"
#include <stdio.h>
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
    int arx[N], ary[N];
    int i, max_num_x = -1, max_num_y = -1;
    /* arx ascends 0..N-1, ary descends N..1 */
    for (i = 0; i < N; i++) {
        arx[i] = i;
        ary[i] = N - i;
    }
    omp_set_num_threads(10);
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        /* each named critical section guards the if statement that follows it */
        #pragma omp critical(max_arx)
        if (arx[i] > max_num_x)
            max_num_x = arx[i];
        #pragma omp critical(max_ary)
        if (ary[i] > max_num_y)
            max_num_y = ary[i];
    }

    printf("max_num_x=%d max_num_y=%d\n", max_num_x, max_num_y);
    return 0;
}

Please tell me why the results of the two programs are identical. I don't understand why code1 shows no data race even though it does not use #pragma omp critical.

Re: about OpenMP Critical ,data race

TimP — Wed, 01 Apr 2009 13:12:51 GMT

It is possible that your compiler may choose atomic operations, even though you don't specify them, as ICL would do when you allow vectorization, or may optimize the loops away, as gcc would do. I am assuming there is no special implication to the use of a Microsoft C-like language, other than that you exclude the use of a standard compiler.

Re: about OpenMP Critical ,data race

jimdempseyatthecove — Wed, 01 Apr 2009 14:23:13 GMT

As an aside: unless your system has at least 10 cores (hardware threads), you shouldn't request more threads than are available.

The parallel loop divides the iteration range into one chunk per thread, in this case 10 (this assumes a static schedule, which is the usual default but is implementation-defined). The 1st thread gets 0:N/10-1, the 2nd N/10:2*(N/10)-1, and so on.
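To make the chunking concrete, here is a minimal sketch of how a static schedule might split the range. The helper name static_chunk is hypothetical (it is not an OpenMP API), and the way the remainder is spread over the lowest-numbered threads is an assumption; real runtimes may distribute a remainder differently.

```c
#include <stdio.h>

/* Hypothetical helper, not part of OpenMP: computes the half-open range
   [*lo, *hi) that thread `tid` would get if n iterations were split into
   nthreads contiguous blocks. Assumption: any remainder goes, one extra
   iteration each, to the lowest-numbered threads. */
void static_chunk(int n, int nthreads, int tid, int *lo, int *hi)
{
    int base = n / nthreads;   /* iterations every thread gets */
    int rem  = n % nthreads;   /* leftover iterations to spread out */
    *lo = tid * base + (tid < rem ? tid : rem);
    *hi = *lo + base + (tid < rem ? 1 : 0);
}

/* Prints one line per thread with the block it would receive. */
void print_chunks(int n, int nthreads)
{
    int tid, lo, hi;
    for (tid = 0; tid < nthreads; tid++) {
        static_chunk(n, nthreads, tid, &lo, &hi);
        printf("thread %d: [%d, %d)\n", tid, lo, hi);
    }
}
```

Calling print_chunks(100000, 10) lists ten blocks of 10000 iterations each, so under these assumptions thread 0 starts at index 0 and thread 9 starts at index 90000.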

ary is filled in descending order, so its largest value is the very first element, ary[0] = N. The moment the 1st thread reads that element and stores it as the max, no thread (including the 1st) will ever find a larger value for ary.

arx is filled in ascending order, so the first element of the last thread's chunk is already larger than anything in the other threads' chunks. The moment the last thread stores it as the max, the other threads will never find a new max for arx. From then on, only the last thread will find a new max for arx, once on each subsequent iteration.

Therefore, you will observe an incorrect result only if one of your threads gets evicted (preempted) after its comparison finds a new max but before it stores that value, and the eviction lasts longer than the run time of the 1st or last thread, as the case may be.

Jim Dempsey
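The reasoning above explains why the race is unlikely to show up with this particular data; it does not make the unsynchronized version correct for arbitrary input. A minimal sketch of one common way to remove the race without entering a critical section on every iteration: each thread keeps private maxima and merges them once at the end. The function name max_pair and the critical-section name merge_max are hypothetical, chosen only for illustration. The sketch uses only constructs that should be available in OpenMP 2.0, and it still gives the right answer if the pragmas are ignored and the code runs serially.

```c
#include <limits.h>

/* Hypothetical helper: stores the maxima of x[0..n) and y[0..n) in *max_x
   and *max_y. With n == 0 both results stay at INT_MIN. */
void max_pair(const int *x, const int *y, int n, int *max_x, int *max_y)
{
    int gx = INT_MIN, gy = INT_MIN;        /* shared results */

    #pragma omp parallel
    {
        int lx = INT_MIN, ly = INT_MIN;    /* private to each thread */
        int i;                             /* declared in the region, so private */

        #pragma omp for nowait
        for (i = 0; i < n; i++) {
            if (x[i] > lx) lx = x[i];      /* touches only private data */
            if (y[i] > ly) ly = y[i];
        }

        /* one merge per thread instead of one lock per iteration */
        #pragma omp critical(merge_max)
        {
            if (lx > gx) gx = lx;
            if (ly > gy) gy = ly;
        }
    }

    *max_x = gx;
    *max_y = gy;
}
```

With the arrays from the original post (arx[i] = i, ary[i] = N - i), max_pair(arx, ary, N, &mx, &my) should yield mx = N-1 and my = N. Compilers that support OpenMP 3.1 or later can express the same idea more briefly with a reduction(max: ...) clause on the parallel for.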

Re: about OpenMP Critical ,data race

zhangzhe65 — Wed, 01 Apr 2009 15:32:27 GMT

Dear Mr. Jim Dempsey:

Thank you very much for your reply.

Re: about OpenMP Critical ,data race

zhangzhe65 — Wed, 01 Apr 2009 15:41:31 GMT

Thanks for your reply.
