Intel® C++ Compiler

race condition masked by icc 11.1?

This one surprised us, during a workshop underway at PKU in Beijing. The pi program is a standard teaching intro to OpenMP; you get to use private variables and a reduction (code included below).

Here's the puzzle: when "private(x)" is omitted, no race condition is observed (!). We tried several resolutions (varying both num_steps and the number of threads), but the executable stubbornly kept returning the same correct answer every time.
Recompile the bad code (no private variable) with gcc, and the race condition shows up as expected.

Context is compiler l_cproc_p_11.1.072, RHEL 5.4, 32-core NHM.
Is it possible the compiler is doing some auto-magic here, masking the race? Forty-eight students (and their instructor) remain intrigued.

#include <stdio.h>
#include <omp.h> //required for the omp_get_wtime() api
long long num_steps = 100000000;
double step;
int main(int argc, char* argv[])
{
    double start, stop;
    double x, pi, sum=0.0;
    int i;
    step = 1./(double)num_steps;
    start = omp_get_wtime(); //uses openmp's timer api
    #pragma omp parallel for private(x) reduction(+:sum)
    for (i=0; i<num_steps; i++)
    {
        x = (i + .5)*step; //midpoint of the i-th interval
        sum = sum + 4.0/(1.+ x*x);
    }
    pi = sum*step;
    stop = omp_get_wtime(); //uses openmp's timer api
    printf("The value of PI is %15.12f\n",pi);
    printf("The time to calculate PI was %15.3f seconds\n",stop - start);
    return 0;
}
3 Replies
Honored Contributor III
It's likely that normal optimization will result in each thread getting one or more private register copies of x. Note that such optimization is the default for icc (where -g isn't set), but happens only when specified by a -O option for gcc.
It should be well known that threading errors may be exposed or masked with optimization.
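
As a rough source-level picture of that effect (a sketch only, not what icc actually emits), keeping x in a per-thread register behaves much like declaring x inside the loop body, where it is an automatic variable and therefore private to each thread without any private(x) clause. The function name pi_block_scoped_x and its parameter n are made up for this illustration.

```c
/* Midpoint-rule estimate of pi, as in the original program.
   Assumption for illustration: x is declared inside the loop body, so every
   thread works on its own copy and no private(x) clause is needed. */
double pi_block_scoped_x(long n)
{
    double step = 1.0 / (double)n;
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        double x = (i + 0.5) * step;   /* automatic variable: one per thread */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

Calling pi_block_scoped_x(100000000) and printing the result with %15.12f matches the output format of the original program. The pragma is simply ignored when the file is built without OpenMP support, so the function also runs serially.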
Honored Contributor III
If you are interested in observing the race condition try

volatile double x, pi, sum=0.0;


I suggest you show your students the assembly listing of the serial compilation with and without volatile. This shows that the code without volatile keeps x in a register, while the code with volatile keeps x in memory. Then show the parallel version with volatile and private(x), where each thread uses its own memory location for x.

The reason for this exercise is you do not want your students to falsely learn "volatile breaks code".
Your real purpose was to illustrate proper usage of private (and reduction).
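
A minimal sketch of the last step in that exercise, assuming the same midpoint-rule loop as the original program: x is volatile, so every access goes through memory, and private(x) gives each thread its own memory location. The function name pi_volatile_private_x and its parameter n are hypothetical.

```c
/* volatile forces x to live in memory; private(x) gives each thread its own
   copy, so the result stays correct. Removing private(x) from the pragma
   leaves a single shared, memory-resident x, which is the configuration in
   which the race is expected to become observable. */
double pi_volatile_private_x(long n)
{
    volatile double x = 0.0;
    double step = 1.0 / (double)n;
    double sum = 0.0;
    long i;

    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) * step;          /* store to this thread's copy of x */
        sum += 4.0 / (1.0 + x * x);    /* two loads from that same copy */
    }
    return sum * step;
}
```

The point for a class is that volatile only changes where x lives; the private clause is what makes the code correct.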

A method without using disassembly can be achieved by inserting into the top of the loop

if(x == 0.0)
cout << &x << endl;

This would illustrate

same location for x when NOT using private(x)
different location for x when using private(x)

Note that in a non-volatile compilation, adding the cout code (which takes the address of x) might move x out of a register and back to memory.
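
The address comparison can also be sketched in C with printf, to match the original program. This is an adaptation rather than the exact trick above: it uses a plain parallel region instead of the pi loop so that each thread prints once, and the two helper names are made up for the example. Each helper returns how many threads saw an address for x that differs from the x declared outside the region.

```c
#include <stdio.h>

/* x is shared: every thread should print the same address. */
int threads_with_own_x_shared(void)
{
    double x = 0.0;
    const void *outer = (const void *)&x;
    int own = 0;

    #pragma omp parallel reduction(+:own)
    {
        #pragma omp critical
        printf("shared:  &x = %p\n", (void *)&x);
        if ((const void *)&x != outer)
            own += 1;
    }
    return own;
}

/* x is private: each thread should print a different address. */
int threads_with_own_x_private(void)
{
    double x = 0.0;
    const void *outer = (const void *)&x;
    int own = 0;

    #pragma omp parallel private(x) reduction(+:own)
    {
        #pragma omp critical
        printf("private: &x = %p\n", (void *)&x);
        if ((const void *)&x != outer)   /* only the address is used; the private x is never read */
            own += 1;
    }
    return own;
}
```

When built with OpenMP enabled, the shared version is expected to return 0 and the private version a count equal to the number of threads. Built without OpenMP, the pragmas are ignored and both return 0.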

Jim Dempsey

Jim -- excellent, thank you! We revisited the pi program in class today, covering the matter as you suggested; the cout trick was good for a quick "aha" (printf debugging lives on...).