I wrote a simple program that repeatedly computes pi by simulation, and then compared the performance of the serial and parallelized versions of the code. To my great surprise, the parallel code was slower! Since I am a beginner, I suspect I am not grasping some key aspect of parallel programming. The whole code is reported below. I am working with a variant of the Intel 6700 processor with 4 cores.
I don't know whether this forum is the right place for this kind of question, but thanks in advance for any help you can give me.
PROGRAM:
program pigreco
! This program computes the value of greek pi "n" times using simulation
! Each time the computation is performed using "m" draws
! The computation is carried out by the subroutine "montec"
! In the end the average of the n simulations is computed and printed on screen
implicit none
integer i,n,m
parameter(n=3200,m=250000)
double precision greekpi(n),outp,avpi,den
double precision start_time,end_time
integer chunk,nthreads,omp_get_num_threads
parameter (chunk=400)
call CPU_TIME(start_time)
!$omp parallel private(i)
nthreads = omp_get_num_threads()
print*, 'number of threads',nthreads
!$omp do schedule(dynamic,chunk)
do i = 1,n
call montec(m,outp)
greekpi(i) = outp
outp = 0.0d0
!print*, i,greekpi(i)
end do
!$omp end do
!$omp end parallel
call CPU_TIME(end_time)
print*, 'average value of greek pi'
den = n
avpi = sum(greekpi)/den
print*, avpi
print*, 'running time'
print*, end_time - start_time
end program
subroutine montec(ndr,sol)
implicit none
integer ndr
double precision sol
integer i
double precision xr1,xr2,yv(ndr),sumsq,totins,tot
totins = 0.0d0
do i = 1,ndr
call RANDOM_NUMBER(xr1)
call RANDOM_NUMBER(xr2)
sumsq = xr1**2.0d0 + xr2**2.0d0
if (sumsq.le.1.0d0) then
totins = totins + 1.0d0
end if
end do
tot = ndr
sol = totins/tot
sol = 4.0d0*sol
return
end subroutine
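The variable "outp" should be private; as written, all threads store into the same shared outp, which is a data race.
I'm also not sure dynamic,400 is a good idea. Start with the default "static" schedule.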
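A minimal sketch of the loop with the two changes suggested above (outp private, default static schedule), written as a self-contained module. One assumption worth flagging: it times the loop with SYSTEM_CLOCK (wall-clock time) instead of CPU_TIME, because with many compilers CPU_TIME reports processor time accumulated over all threads of the process, which can make a parallel run look slower even when it finishes sooner; omp_get_wtime() would serve the same purpose. The module and routine names are hypothetical, and the sketch still calls RANDOM_NUMBER with scalar arguments, so any serialization inside the generator is not addressed here.

```fortran
module pi_parallel_loop
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Monte Carlo estimate of pi from ndr draws (same logic as montec above).
  subroutine montec(ndr, sol)
    integer, intent(in) :: ndr
    double precision, intent(out) :: sol
    integer :: i
    double precision :: xr1, xr2, totins
    totins = 0.0d0
    do i = 1, ndr
      call random_number(xr1)
      call random_number(xr2)
      if (xr1*xr1 + xr2*xr2 <= 1.0d0) totins = totins + 1.0d0
    end do
    sol = 4.0d0*totins/dble(ndr)
  end subroutine montec

  ! Runs montec n times in parallel; returns the average estimate and
  ! the elapsed wall-clock time in seconds.
  subroutine run_pi(n, m, avpi, elapsed)
    integer, intent(in) :: n, m
    double precision, intent(out) :: avpi, elapsed
    double precision :: greekpi(n), outp
    integer :: i
    integer(int64) :: t0, t1, rate
    ! Wall-clock timing (assumption: preferred over CPU_TIME, which may
    ! add up the processor time of every thread).
    call system_clock(t0, rate)
    ! outp is private, so each thread has its own copy; greekpi is shared,
    ! but every iteration writes a different element.
    !$omp parallel do private(outp) schedule(static)
    do i = 1, n
      call montec(m, outp)
      greekpi(i) = outp
    end do
    !$omp end parallel do
    call system_clock(t1)
    avpi = sum(greekpi)/dble(n)
    elapsed = dble(t1 - t0)/dble(rate)
  end subroutine run_pi
end module pi_parallel_loop
```

For example, `call run_pi(3200, 250000, avpi, elapsed)` reproduces the original problem size. Compile with OpenMP enabled (for example `-fopenmp` with gfortran or `-qopenmp` with ifort); without it, the `!$omp` lines are treated as comments and the loop runs serially.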
I have tried both making "outp" private and changing "dynamic" to "static", in the latter case both letting the compiler set the size of each chunk passed to a thread and setting it myself. Neither worked: the code still runs much slower than the serial version.
Are you sure that the RANDOM_NUMBER subroutine is thread-safe?
Most random number generators update some internal state after computing a new number. If this is protected by a lock, then the threads will have to process this function one at a time, and the overhead of handling the lock may be larger than the savings in any other parallel work.
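To illustrate the point about shared internal state, here is a sketch in which each loop iteration owns its own generator state, so no lock is needed at all. It swaps RANDOM_NUMBER for a hypothetical hand-written xorshift64 generator (Marsaglia's 13/7/17 variant) with an ad hoc seeding scheme; it is meant to show the structure, not to be a statistically vetted replacement. Names such as xorshift_uniform, montec_private and run_pi_private are made up for this example.

```fortran
module private_rng_pi
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Hypothetical xorshift64 step: advances the caller-owned state s and
  ! returns a double in [0,1). The state lives in the caller, so threads
  ! never touch shared generator state.
  function xorshift_uniform(s) result(u)
    integer(int64), intent(inout) :: s
    double precision :: u
    s = ieor(s, ishft(s, 13))
    s = ieor(s, ishft(s, -7))   ! ISHFT with a negative count is a logical shift
    s = ieor(s, ishft(s, 17))
    ! Keep the top 53 bits, which gives a non-negative integer below 2**53.
    u = dble(ishft(s, -11)) * 2.0d0**(-53)
  end function xorshift_uniform

  ! Same estimate as montec, but with a local generator seeded from iter.
  subroutine montec_private(ndr, iter, sol)
    integer, intent(in) :: ndr, iter
    double precision, intent(out) :: sol
    integer(int64) :: s
    integer :: i
    double precision :: xr1, xr2, totins
    ! Ad hoc nonzero seed per iteration (hypothetical seeding scheme).
    s = ieor(88172645463325252_int64, int(iter, int64)*2654435761_int64)
    do i = 1, 20                ! discard a few outputs so nearby seeds diverge
      xr1 = xorshift_uniform(s)
    end do
    totins = 0.0d0
    do i = 1, ndr
      xr1 = xorshift_uniform(s)
      xr2 = xorshift_uniform(s)
      if (xr1*xr1 + xr2*xr2 <= 1.0d0) totins = totins + 1.0d0
    end do
    sol = 4.0d0*totins/dble(ndr)
  end subroutine montec_private

  subroutine run_pi_private(n, m, avpi)
    integer, intent(in) :: n, m
    double precision, intent(out) :: avpi
    double precision :: greekpi(n)
    integer :: i
    ! Each iteration writes its own element of greekpi; no shared RNG state.
    !$omp parallel do schedule(static)
    do i = 1, n
      call montec_private(m, i, greekpi(i))
    end do
    !$omp end parallel do
    avpi = sum(greekpi)/dble(n)
  end subroutine run_pi_private
end module private_rng_pi
```

Because each seed depends only on the iteration index, `call run_pi_private(3200, 250000, avpi)` should give the same average regardless of how many threads run the loop.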
RANDOM_NUMBER is thread-safe; however, to achieve that it uses a critical section (a serializing section). In cases like this, call RANDOM_NUMBER outside the parallel region with an argument that is an array (not a scalar). The size of the array would typically be the iteration count of the parallel loop that follows. Then, within the parallel loop, get each random number by indexing the array with the loop index. Using the array (harvest) form of RANDOM_NUMBER, your program crosses the critical region once instead of on each iteration.
In your case, you would include:
double precision harvest(m*2)
...
call RANDOM_NUMBER(harvest)
...
!$omp parallel
...
!$omp do
...
call montec(m,outp,harvest) ! add harvested array of random numbers
...
subroutine montec(ndr,sol,harvest)
...
double precision harvest(ndr*2)
...
do i = 1,ndr
  xr1 = harvest((i-1)*2+1)
  xr2 = harvest((i-1)*2+2)
Jim Dempsey
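A sketch of a variant of the harvest idea: instead of one array drawn before the parallel region, each iteration fills its own (allocatable, hence private) array with a single RANDOM_NUMBER call, so the n estimates use different numbers while the generator is entered once per iteration rather than 2*m times. This assumes, as described above, that the cost comes from entering the critical section; depending on the compiler, the generation itself may still be serialized inside RANDOM_NUMBER. The module and routine names are hypothetical. At m = 250000 each array is about 4 MB, which is why it is allocated on the heap rather than declared as an automatic array that would live on a thread's stack.

```fortran
module harvest_pi
  implicit none
contains
  ! Estimates pi from ndr points using one array call to RANDOM_NUMBER,
  ! so the (possibly lock-protected) generator is entered once per call
  ! instead of 2*ndr times.
  subroutine montec_harvest(ndr, sol)
    integer, intent(in) :: ndr
    double precision, intent(out) :: sol
    double precision, allocatable :: harvest(:)
    integer :: i
    double precision :: xr1, xr2, totins
    allocate(harvest(2*ndr))      ! heap storage, private to this call
    call random_number(harvest)   ! single entry into the generator
    totins = 0.0d0
    do i = 1, ndr
      xr1 = harvest(2*i - 1)      ! same as harvest((i-1)*2+1)
      xr2 = harvest(2*i)          ! same as harvest((i-1)*2+2)
      if (xr1*xr1 + xr2*xr2 <= 1.0d0) totins = totins + 1.0d0
    end do
    sol = 4.0d0*totins/dble(ndr)
    deallocate(harvest)
  end subroutine montec_harvest

  subroutine run_pi_harvest(n, m, avpi)
    integer, intent(in) :: n, m
    double precision, intent(out) :: avpi
    double precision :: greekpi(n)
    integer :: i
    ! Each iteration draws its own harvest array and writes its own element.
    !$omp parallel do schedule(static)
    do i = 1, n
      call montec_harvest(m, greekpi(i))
    end do
    !$omp end parallel do
    avpi = sum(greekpi)/dble(n)
  end subroutine run_pi_harvest
end module harvest_pi
```

With the original sizes this would be called as `call run_pi_harvest(3200, 250000, avpi)`.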