This one surprised us, during a workshop underway at PKU in Beijing. The pi program is a standard teaching intro to OpenMP; you get to use private variables, and a reduction (code included below, with the OpenMP pragma in bold).
Here's the puzzle: when "private(x)" is omitted, no race condition is observed (!). We tried several configurations (varying both num_steps and the number of threads), but the executable stubbornly kept returning the same correct answer every time.
Recompile the bad code (no private variable) with gcc, and the race condition shows up as expected.
Context is compiler l_cproc_p_11.1.072, RHEL 5.4, 32-core NHM.
Is it possible the compiler is doing some auto-magic here, masking the race? Forty-eight students (and their instructor) remain intrigued.
#include <stdio.h>
#include <omp.h> //required for the omp_get_wtime() api
long long num_steps = 100000000;
int main(int argc, char* argv[])
{
    long long i;
    double step;
    double start, stop;
    double x, pi, sum=0.0;
    step = 1./(double)num_steps;
    start = omp_get_wtime(); //uses openmp's timer api
    #pragma omp parallel for private(x) reduction(+:sum)
    for (i=0; i<num_steps; i++)
    {
        x = (i + .5)*step;
        sum = sum + 4.0/(1.+ x*x);
    }
    pi = sum*step;
    stop = omp_get_wtime(); //uses openmp's timer api
    printf("The value of PI is %15.12f\n",pi);
    printf("The time to calculate PI was %15.3f seconds\n",stop - start);
    return 0;
}
It's likely that normal optimization will result in each thread getting one or more private register copies of x. Note that such optimization is the default for icc (where -g isn't set), but happens only when specified by a -O option for gcc. It should be well known that threading errors may be exposed or masked with optimization.
If you are interested in observing the race condition try
volatile double x, pi, sum=0.0;
I suggest you show your students the assembly generated for the serial build with and without volatile. This shows that the code without volatile keeps x in a register, while the code with volatile keeps x in memory. Then show the parallel build with volatile and private(x), which uses a separate memory location for x in each thread.
The reason for this exercise is you do not want your students to falsely learn "volatile breaks code". Your real purpose was to illustrate proper usage of private (and reduction).
A method without using disassembly can be achieved by inserting into the top of the loop
if(x == 0.0) cout << &x << endl;
This would illustrate:
same location for x when NOT using private(x)
different location for x when using private(x)
Note that in a non-volatile build, adding the cout code might itself move x out of a register and back into memory, since the code now takes the address of x.