Hi! I wrote a coin-flipping program in Fortran with OpenMP, but it's not working properly. I'm using the Parallel Studio XE 2018 compiler with Visual Studio as the IDE. The number of flips is 10E9, which takes a single thread ~200 seconds on my i7-7700, while two threads take ~800 seconds. I know the overhead of running two threads can make a program slower, but doesn't that usually happen only when the program doesn't have enough computational intensity? Curiously, gfortran -O3 -fopenmp with two threads takes ~530 seconds. This is my first real Fortran program, but I'm very familiar with programming. For comparison, I wrote a parallel coin-flipping function in Julia with the number of flips set to 10E9, and it takes ~2 seconds with one thread, ~1.15 seconds with two threads, and ~0.84 seconds with four threads. I'm wondering why my Julia function is so much faster than my Fortran program. Is there anything I can do to optimize my code? Thanks!
Below is the code:
! coin_flip_omp.f90
!
! FUNCTIONS:
! coin_flip - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: coin_flip
!
! PURPOSE: A simple parallel coin flipping program
!
!****************************************************************************

program coin_flip_omp

    ! Import libraries
    use omp_lib

    implicit none

    ! Variables
    integer, parameter :: MyLongIntType = selected_int_kind(12)
    integer(kind=MyLongIntType) :: num_of_flips = 10E9
    integer(kind=MyLongIntType) :: count = 0, i = 1, j
    integer :: proc_num, thread_num
    real :: x
    real :: seconds

    ! Begin timing program
    seconds = omp_get_wtime()

    ! Program statement
    print *, 'This program will flip a coin', num_of_flips, 'times and report on the number of heads'

    ! How many processors are available?
    proc_num = omp_get_num_procs()

    thread_num = 2
    call omp_set_num_threads(thread_num)

    print *, 'Number of processors is ', proc_num
    print *, 'Number of threads requested is ', thread_num

    ! Start while loop
    !$OMP PARALLEL DO
    DO j = 1, num_of_flips
        ! Flip the coin
        ! RANDOM_NUMBER returns a pseudo-random number between 0 and 1
        call RANDOM_NUMBER(x)
        IF (x < 0.5) THEN
            count = count + 1
        END IF
        i = i + 1 ! Increment counter by 1
    END DO
    !$OMP END PARALLEL DO

    ! End timing
    seconds = omp_get_wtime() - seconds

    ! Print the number of heads
    print *, 'The number of heads is ', count
    print *, 'Time taken to run:', seconds, 'seconds'

end program coin_flip_omp
One issue I see is that calling RANDOM_NUMBER inside a parallel region will introduce lock contention as there is a single seed per program. One way around this is to generate num_of_flips random numbers before the parallel loop, and then just reference the jth value. You can allocate an array of the correct size and call RANDOM_NUMBER once to fill it.
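Something like this minimal sketch shows the idea (the names are illustrative, and the flip count is scaled down to 10**8 so the array stays around 400 MB; the full revised program in the next reply does the same thing). Note the REDUCTION(+:count) clause, which is also needed so the threads don't race on the shared counter.

program flip_sketch
    implicit none
    integer, parameter :: ik = selected_int_kind(12)
    integer(kind=ik), parameter :: n = 100000000_ik   ! 10**8 flips, ~400 MB of reals
    integer(kind=ik) :: count = 0, j
    real, allocatable :: r(:)

    allocate(r(n))
    call RANDOM_NUMBER(r)        ! one call fills the whole array, no per-call locking

    !$OMP PARALLEL DO REDUCTION(+:count)
    do j = 1, n
        if (r(j) < 0.5) count = count + 1
    end do
    !$OMP END PARALLEL DO

    print *, 'heads:', count
end program flip_sketch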
Thanks! My code is running much faster now. Another thing that wasn't helping was that my project was set to the "Debug" configuration rather than "Release", so optimizations weren't being applied. The code now takes about 4 seconds with one thread and num_of_flips set to 10E8.
Here's the revised code:
! coin_flip_omp.f90
!
! FUNCTIONS:
! coin_flip - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: coin_flip
!
! PURPOSE: A simple parallel coin flipping program
!
!****************************************************************************

program coin_flip_omp

    ! Import libraries
    use omp_lib

    implicit none

    ! Variables
    integer, parameter :: MyLongIntType = selected_int_kind(10)
    integer(kind=MyLongIntType) :: num_of_flips = 10E8
    integer(kind=MyLongIntType) :: count = 0, j, i
    integer :: proc_num, thread_num
    real, allocatable :: rand_num_array(:)
    real :: seconds

    ! Begin timing program
    seconds = omp_get_wtime()

    ! Allocate rand_num_array
    allocate(rand_num_array(num_of_flips))

    ! Program statement
    print *, 'This program will flip a coin', num_of_flips, 'times and report on the number of heads'

    ! Generate an array num_of_flips long of random numbers
    call RANDOM_NUMBER(rand_num_array)
    print *, 'Time to generate random array: ', omp_get_wtime() - seconds, 'seconds'

    ! How many processors are available?
    proc_num = omp_get_num_procs()

    thread_num = 4 ! Set number of threads to use
    call omp_set_num_threads(thread_num)

    print *, 'Number of processors is ', proc_num
    print *, 'Number of threads requested is ', thread_num

    ! Start while loop
    !$OMP PARALLEL DO REDUCTION(+:count)
    DO j = 1, num_of_flips
        ! if the jth value is less than 0.5, then call it heads
        IF (rand_num_array(j) < 0.5) THEN
            count = count + 1
        END IF
    END DO
    !$OMP END PARALLEL DO

    ! End timing
    seconds = omp_get_wtime() - seconds

    ! Print the number of heads
    print *, 'The number of heads is ', count
    print *, 'The percentage of heads is ', dble(count)/dble(num_of_flips)*100
    print *, 'Time taken to run:', seconds, 'seconds'

end program coin_flip_omp
One way of getting around the lock that prevents RANDOM_NUMBER from running in parallel under OpenMP is to use coarrays instead. The code below seeds the different images with separate seeds by shifting around the seed of image 1.
subroutine co_RANDOM_SEED()
    implicit none
    integer :: i
    integer, save :: seed(2)
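The posted routine is cut off at this point. For reference, here is a minimal sketch of my own (not the original poster's code) of how such a per-image seeding routine might look: it queries the seed size instead of hard-coding 2, broadcasts image 1's seed with the Fortran 2018 CO_BROADCAST collective, and offsets the seed by the image index so every image gets a distinct random stream. Coarray support has to be enabled at compile time (with Intel Fortran on Windows, something like /Qcoarray).

subroutine co_random_seed_sketch()
    implicit none
    integer :: n
    integer, allocatable :: seed(:)

    call random_seed(size=n)                  ! how many integers the seed needs
    allocate(seed(n))

    if (this_image() == 1) call random_seed(get=seed)
    call co_broadcast(seed, source_image=1)   ! every image now has image 1's seed

    seed = seed + 37 * (this_image() - 1)     ! give each image a distinct seed
    call random_seed(put=seed)
end subroutine co_random_seed_sketch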
