I wrote a simple program that repeatedly computes greek pi by simulation, so that I could compare the performance of the serial and parallelized versions of the code. To my great surprise, the parallel code was slower! Since I am a beginner, I suspect I am missing some key aspect of parallel programming. The whole code is reported below. I am working with a version of an Intel 6700 processor with 4 cores.
I don't know whether this forum is the right place for this kind of question, but thanks in advance for any help you can give me.
! This program computes the value of greek pi "n" times using simulation
! Each time the computation is performed using "m" draws
! The computation is carried out by the subroutine "montec"
! In the end the average of the n simulations is computed and printed on screen
double precision greekpi(n),outp,avpi,den
double precision start_time,end_time
!$omp parallel private(i)
nthreads = omp_get_num_threads()
print*, 'number of threads',nthreads
!$omp do schedule(dynamic,chunk)
do i = 1,n
   call montec(m,outp)
   greekpi(i) = outp
   outp = 0.0d0
end do
!$omp end do
!$omp end parallel
print*, 'average value of greek pi'
den = n
avpi = sum(greekpi)/den
print*, 'running time'
print*, end_time - start_time
double precision sol
double precision xr1,xr2,yv(ndr),sumsq,totins,tot
totins = 0.0d0
do i = 1,ndr
   call random_number(xr1)
   call random_number(xr2)
   sumsq = xr1**2.0d0 + xr2**2.0d0
   if (sumsq.le.1.0d0) then
      totins = totins + 1.0d0
   end if
end do
tot = ndr
sol = totins/tot
sol = 4.0d0*sol
I have tried both making "outp" private and changing "dynamic" to "static"; in the latter case I tried both letting the compiler choose the size of the chunk passed to each thread and setting it myself. Neither worked: the code still runs much slower than the serial version.
Are you sure that the RANDOM_NUMBER function is thread-safe?
Most random number generators update some internal state after computing a new number. If this is protected by a lock, then the threads will have to process this function one at a time, and the overhead of handling the lock may be larger than the savings in any other parallel work.
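A quick way to check whether this is what is happening on a given compiler is to time the same scalar RANDOM_NUMBER calls once serially and once inside a parallel loop. The following is only a minimal sketch: the program name and the call count `ncalls` are made up for illustration, and whether the parallel loop turns out slower depends on the compiler and its runtime library. Compile with OpenMP enabled (for example `-fopenmp` with gfortran); without it the directives are ignored and both loops run serially.

```fortran
program rng_contention
   use iso_fortran_env, only: int64
   implicit none
   integer, parameter :: ncalls = 2000000   ! illustrative count, not from the post
   integer(int64) :: t0, t1, rate
   double precision :: x, acc
   integer :: i

   call system_clock(count_rate=rate)

   ! Serial baseline: one scalar draw per iteration
   acc = 0.0d0
   call system_clock(t0)
   do i = 1, ncalls
      call random_number(x)
      acc = acc + x
   end do
   call system_clock(t1)
   print *, 'serial   seconds:', dble(t1 - t0) / dble(rate), ' mean:', acc / dble(ncalls)

   ! Same draws issued from inside a parallel loop; if the generator
   ! protects shared state with a lock, the threads queue up here
   acc = 0.0d0
   call system_clock(t0)
   !$omp parallel do private(x) reduction(+:acc)
   do i = 1, ncalls
      call random_number(x)
      acc = acc + x
   end do
   !$omp end parallel do
   call system_clock(t1)
   print *, 'parallel seconds:', dble(t1 - t0) / dble(rate), ' mean:', acc / dble(ncalls)
end program rng_contention
```

If the second timing is no better than the first, the random number calls are the likely bottleneck rather than the loop schedule.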
RANDOM_NUMBER is thread-safe, but it achieves that by using a critical section (a serializing section). In cases like this, what you do is call RANDOM_NUMBER outside the parallel region with an argument that is an array (not a scalar). The size of the array would typically be the iteration count of the parallel loop that follows. Then, within the parallel loop, you get each random number by indexing the array with the loop index. With the array (harvest) form of RANDOM_NUMBER, your program crosses the critical section once instead of on every iteration.
Note, in your case you would include:
double precision harvest(ndr*2)
...
call RANDOM_NUMBER(harvest)
...
!$omp parallel
...
!$omp do
...
call montec(m,outp,harvest) ! add harvested array of random numbers
...
subroutine montec(ndr,sol,harvest)
...
double precision harvest(ndr*2)
...
do i = 1,ndr
xr1 = harvest((i-1)*2+1)
xr2 = harvest((i-1)*2+2)
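For reference, here is one way the outline above might look as a complete program. It is a sketch under assumptions, not the original poster's code: the values of `n` and `m` are illustrative, timing uses `system_clock`, and `montec` is written as an internal procedure. It also departs from the outline in one respect: if a single harvest array is filled once and passed to every call, each simulation would reuse the same points, so this version assumes a two-dimensional array with one column of draws per simulation.

```fortran
program pi_harvest
   use iso_fortran_env, only: int64
   implicit none
   ! n and m are illustrative values, not the ones from the original post
   integer, parameter :: n = 64        ! number of independent simulations
   integer, parameter :: m = 50000     ! draws per simulation
   double precision, allocatable :: harvest(:,:)
   double precision :: greekpi(n), avpi
   integer(int64) :: t0, t1, rate
   integer :: i

   allocate(harvest(2*m, n))
   call system_clock(count_rate=rate)
   call system_clock(t0)

   ! One serial call fills every coordinate needed by all n simulations,
   ! so the generator is only used outside the parallel region
   call random_number(harvest)

   ! Each iteration reads its own column and writes its own element,
   ! so no variable is shared for writing between threads
   !$omp parallel do schedule(static)
   do i = 1, n
      call montec(m, greekpi(i), harvest(:, i))
   end do
   !$omp end parallel do

   call system_clock(t1)
   avpi = sum(greekpi) / dble(n)
   print *, 'average value of greek pi', avpi
   print *, 'running time', dble(t1 - t0) / dble(rate)

   ! Sanity check on the estimate (very loose tolerance)
   if (abs(avpi - 3.14159d0) > 0.05d0) error stop 'estimate far from pi'

contains

   subroutine montec(ndr, sol, h)
      integer, intent(in) :: ndr
      double precision, intent(out) :: sol
      double precision, intent(in) :: h(2*ndr)   ! pre-drawn (x, y) pairs
      integer :: j, inside
      inside = 0
      do j = 1, ndr
         ! Count points that fall inside the unit quarter circle
         if (h(2*j-1)**2 + h(2*j)**2 <= 1.0d0) inside = inside + 1
      end do
      sol = 4.0d0 * dble(inside) / dble(ndr)
   end subroutine montec

end program pi_harvest
```

Writing each result directly into `greekpi(i)` also avoids the shared `outp` variable. The trade-off is memory: the harvest array holds 2*m*n values, so for large runs it may be preferable to fill and process it in batches.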