Hi! I wrote a coin flipping program in Fortran with OpenMP, but it's not working properly. I'm using the Parallel Studio XE 2018 compiler with Visual Studio as the IDE. With the number of flips set to 10E9, a single thread takes ~200 seconds on my i7-7700, while two threads take ~800 seconds. I know the overhead of running two threads can be large enough to explain a slowdown, but doesn't that usually only happen when the program doesn't have enough computational intensity? Curiously, gfortran -O3 -fopenmp with two threads takes ~530 seconds. This is my first real Fortran program, but I'm very familiar with programming. I also wrote a parallel coin flipping function in Julia with the number of flips at 10E9; it takes ~2 seconds with one thread, ~1.15 seconds with two threads, and ~0.84 seconds with four threads. I'm wondering why my Julia function is so much faster than my Fortran program. Is there anything I can do to optimize my code? Thanks!
Below is the code:
! coin_flip_omp.f90
!
! FUNCTIONS:
! coin_flip - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: coin_flip
!
! PURPOSE: A simple parallel coin flipping program
!
!****************************************************************************
program coin_flip_omp
    ! Import libraries
    use omp_lib
    implicit none
    ! Variables
    integer, parameter :: MyLongIntType = selected_int_kind(12)
    integer (kind=MyLongIntType) :: num_of_flips = 10E9
    integer (kind=MyLongIntType) :: count = 0, i = 1, j
    integer :: proc_num, thread_num
    real :: x
    real :: seconds
    ! Begin timing program
    seconds = omp_get_wtime()
    ! Program statement
    print *, 'This program will flip a coin', num_of_flips, 'times and report on the number of heads'
    ! How many processors are available?
    proc_num = omp_get_num_procs()
    thread_num = 2
    call omp_set_num_threads(thread_num)
    print *, 'Number of processors is ', proc_num
    print *, 'Number of threads requested is ', thread_num
    ! Start parallel loop
    !$OMP PARALLEL DO
    DO j = 1, num_of_flips
        ! Flip the coin
        ! RANDOM_NUMBER returns a pseudo-random number between 0 and 1
        call RANDOM_NUMBER(x)
        IF (x < 0.5) THEN
            count = count + 1
        END IF
        i = i + 1 ! Increment counter by 1
    END DO
    !$OMP END PARALLEL DO
    ! End timing
    seconds = omp_get_wtime() - seconds
    ! Print the number of heads
    print *, 'The number of heads is ', count
    print *, 'Time taken to run:', seconds, 'seconds'
end program coin_flip_omp
One issue I see is that calling RANDOM_NUMBER inside a parallel region will introduce lock contention as there is a single seed per program. One way around this is to generate num_of_flips random numbers before the parallel loop, and then just reference the jth value. You can allocate an array of the correct size and call RANDOM_NUMBER once to fill it.
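A minimal sketch of that approach (the program and variable names here are just placeholders, not from your code):

program prefill_sketch
    implicit none
    integer, parameter :: ik = selected_int_kind(12)
    integer(kind=ik), parameter :: n = 1000000000_ik
    real, allocatable :: r(:)
    allocate(r(n))
    call RANDOM_NUMBER(r)   ! one call fills the whole array, so there is no per-iteration locking
    ! ... the OpenMP loop then only reads r(j), with a REDUCTION(+:count) on the tally
end program prefill_sketch

Keep in mind the array costs 4 bytes per flip, so 10E9 flips would need tens of gigabytes of memory; you may have to generate and process it in chunks.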
Thanks! My code is running much faster now. Another thing that wasn't helping was that my project was set to "Debug" rather than "Release", so optimizations weren't being applied. Now the code takes about 4 seconds with one thread and num_of_flips set to 10E8.
Here's the revised code:
! coin_flip_omp.f90
!
! FUNCTIONS:
! coin_flip - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: coin_flip
!
! PURPOSE: A simple parallel coin flipping program
!
!****************************************************************************
program coin_flip_omp
    ! Import libraries
    use omp_lib
    implicit none
    ! Variables
    integer, parameter :: MyLongIntType = selected_int_kind(10)
    integer (kind=MyLongIntType) :: num_of_flips = 10E8
    integer (kind=MyLongIntType) :: count = 0, j
    integer :: proc_num, thread_num
    real, allocatable :: rand_num_array(:)
    real :: seconds
    ! Begin timing program
    seconds = omp_get_wtime()
    ! Allocate rand_num_array
    allocate(rand_num_array(num_of_flips))
    ! Program statement
    print *, 'This program will flip a coin', num_of_flips, 'times and report on the number of heads'
    ! Generate an array num_of_flips long of random numbers
    call RANDOM_NUMBER(rand_num_array)
    print *, 'Time to generate random array: ', omp_get_wtime() - seconds, 'seconds'
    ! How many processors are available?
    proc_num = omp_get_num_procs()
    thread_num = 4
    ! Set number of threads to use
    call omp_set_num_threads(thread_num)
    print *, 'Number of processors is ', proc_num
    print *, 'Number of threads requested is ', thread_num
    ! Start parallel loop
    !$OMP PARALLEL DO REDUCTION(+:count)
    DO j = 1, num_of_flips
        ! If the jth value is less than 0.5, then call it heads
        IF (rand_num_array(j) < 0.5) THEN
            count = count + 1
        END IF
    END DO
    !$OMP END PARALLEL DO
    ! End timing
    seconds = omp_get_wtime() - seconds
    ! Print the number of heads
    print *, 'The number of heads is ', count
    print *, 'The percentage of heads is ', dble(count)/dble(num_of_flips)*100
    print *, 'Time taken to run:', seconds, 'seconds'
end program coin_flip_omp
One way of getting around the lock that prevents RANDOM_NUMBER from running in parallel under OpenMP is to use coarrays instead. The code below seeds the different images with separate seeds by shifting the seed of image no. 1.
subroutine co_RANDOM_SEED()
    implicit none
    integer :: i
    integer, save :: seed(2)
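(The fragment above got cut off. Below is a minimal sketch of what such a routine could look like; the broadcast-then-offset scheme, the allocatable seed, and the names are my assumptions rather than the original code, which used a fixed seed(2).)

subroutine co_random_seed_sketch()
    ! Sketch: give every image its own RANDOM_NUMBER stream by taking image 1's
    ! seed and offsetting it by the image number (assumed scheme).
    implicit none
    integer :: n
    integer, allocatable :: seed(:)
    call random_seed(size=n)                  ! how many integers the seed needs
    allocate(seed(n))
    if (this_image() == 1) call random_seed(get=seed)
    call co_broadcast(seed, source_image=1)   ! share image 1's seed (Fortran 2018 collective)
    seed = seed + 12345 * this_image()        ! shift so each image's seed differs
    call random_seed(put=seed)
end subroutine co_random_seed_sketch

With ifort, coarray support has to be enabled (/Qcoarray on Windows, -coarray on Linux).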