About OpenMP critical, data race

zhangzhe65 · 04-01-2009

why?
code1
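#include "stdafx.h"
#include <stdio.h>
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
    static int arx[N], ary[N];   // static: keeps the 800 KB of arrays off the stack
    int i, max_num_x = -1, max_num_y = -1;
    for (i = 0; i < N; i++) {
        arx[i] = i;       // increasing values
        ary[i] = N - i;   // decreasing values
    }
    omp_set_num_threads(10);
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        // No critical section: every thread reads and writes the shared
        // max_num_x / max_num_y without synchronization (a data race).
        //#pragma omp critical(max_arx)
        if (arx[i] > max_num_x)
            max_num_x = arx[i];
        //#pragma omp critical(max_ary)
        if (ary[i] > max_num_y)
            max_num_y = ary[i];
    }

    printf("max_num_x=%d max_num_y=%d\n", max_num_x, max_num_y);
    return 0;
}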

and
code2
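#include "stdafx.h"
#include <stdio.h>
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
    static int arx[N], ary[N];   // static: keeps the 800 KB of arrays off the stack
    int i, max_num_x = -1, max_num_y = -1;
    for (i = 0; i < N; i++) {
        arx[i] = i;       // increasing values
        ary[i] = N - i;   // decreasing values
    }
    omp_set_num_threads(10);
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        // Each critical applies to the whole if statement that follows it,
        // so the compare-and-update is done by one thread at a time.
        #pragma omp critical(max_arx)
        if (arx[i] > max_num_x)
            max_num_x = arx[i];
        #pragma omp critical(max_ary)
        if (ary[i] > max_num_y)
            max_num_y = ary[i];
    }

    printf("max_num_x=%d max_num_y=%d\n", max_num_x, max_num_y);
    return 0;
}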

Please tell me why the two programs give identical results. Code1 has no #pragma omp critical, yet I don't see any data race in it. Why?

TimP · 04-01-2009

Your compiler may choose atomic operations even though you don't specify them, as ICL would do when you allow vectorization, or it may optimize the loops away, as gcc would do. I am assuming there is no special implication to the use of a Microsoft C-like language, other than that you exclude the use of a standard compiler.

jimdempseyatthecove · 04-01-2009

As an aside: you request 10 threads, and unless your system has at least 10 cores (hardware threads), you shouldn't request more threads than are available.

With the default schedule (static in most implementations), the parallel loop divides the iteration range into one contiguous chunk per thread, in this case 10. Thread 0 gets iterations 0 to N/10-1, thread 1 gets N/10 to 2*N/10-1, and so on.

Because ary[i] = N-i decreases as i grows, the overall maximum of ary is ary[0] = N, the very first element of the first thread's chunk. Once that thread has stored it in max_num_y, no thread (itself included) will ever find a larger value for ary.

Conversely, arx[i] = i increases with i, so the first element of the last thread's chunk is larger than every element in the other chunks. Once the last thread stores it, no other thread will ever find a new maximum for arx. From then on, only the last thread finds a new max for arx, on every subsequent iteration.

Therefore you will observe an incorrect result only if a thread is preempted after finding a new local max but before storing it, and the preemption lasts longer than the run time of the first thread (for ary) or the last thread (for arx). The data race is still present in code1; this particular data pattern just makes it very unlikely to show up.

Jim Dempsey

zhangzhe65 · 04-01-2009

Dear Mr. Jim Dempsey:

Thank you very much for your reply.

zhangzhe65 · 04-01-2009

Thanks for your reply, tim18.
