Intel® Fortran Compiler

Problem with OpenMP in Fortran

G__H_
Beginner

Dear all,

I recently encountered a problem with a parallel do loop using OpenMP in Fortran.  The following code reproduces what I saw.  I would appreciate any suggestions.  I use Visual Studio 2017 + Intel Parallel Studio XE 2018 Update 1.

program Rang_Test
    !$ use omp_lib
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    integer :: i, j
    real, allocatable :: rtmp
    

    allocate(rtmp)
    rtmp = 1.
    
    !$omp parallel default(none) private(i, rnd) firstprivate(rtmp) shared(fnl)
    !$omp master
    write(*, *) 'Initial values'
    !$omp end master
    write(*, '(*(g0))') 'Core=', omp_get_thread_num(), '    rTmp=', rtmp
    !$omp barrier
    !$omp master
    write(*, *) 'Inside the loop'
    !$omp end master
    !$omp do 
    do i = 1, 10
        call RANDOM_NUMBER(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        !$omp critical
        write(*, '(*(g0))') 'i=',i,'    rnd(1)=', rnd(1), '    rTmp=', rtmp
        !$omp end critical
    end do
    !$omp end do
    !$omp end parallel
    write(*, *) 'After omp region'
    do i = 1, 10
        write(*, '(*(g0))') 'i=', i, '    fnl(i)=', fnl(i)
    end do
    
end program Rang_Test

In the example above, each iteration first does some calculations (call random_number(rnd)), then assigns the result to a firstprivate variable (rtmp), and then does more calculations in the same iteration (fnl(i) = rTmp).  In this sample code, the value of rTmp inside the loop should always equal rnd(1).  However, the values written to the screen were not what I expected.  The following are the execution results on my computer:

 Initial values
Core=2    rTmp=1.000000
Core=0    rTmp=1.000000
Core=3    rTmp=1.000000
Core=5    rTmp=1.000000
Core=4    rTmp=1.000000
Core=7    rTmp=1.000000
Core=6    rTmp=1.000000
Core=1    rTmp=1.000000
 Inside the loop
i=6    rnd(1)=.3001758    rTmp=.7522959
i=8    rnd(1)=.7958636    rTmp=.7522959
i=5    rnd(1)=.1966502E-01    rTmp=.7522959
i=3    rnd(1)=.3920868E-06    rTmp=.7522959
i=9    rnd(1)=.8392264    rTmp=.1941571
i=7    rnd(1)=.4387013    rTmp=.1941571
i=1    rnd(1)=.7522959    rTmp=.1941571
i=10    rnd(1)=.7564077E-01    rTmp=.2656559
i=4    rnd(1)=.1941571    rTmp=.2656559
i=2    rnd(1)=.2656559    rTmp=.2656559
 After omp region
i=1    fnl(i)=.7522959
i=2    fnl(i)=.2656559
i=3    fnl(i)=.3920868E-06
i=4    fnl(i)=.1941571
i=5    fnl(i)=.1966502E-01
i=6    fnl(i)=.3001758
i=7    fnl(i)=.4387013
i=8    fnl(i)=.7958636
i=9    fnl(i)=.8392264
i=10    fnl(i)=.7564077E-01

As you can see, the final values (fnl) are fine, but the values of rTmp are problematic.  I have tried a few other things and found:

  1. If rTmp is a regular nonallocatable variable, then the program works fine.
  2. Or, if rTmp is allocated to be an array, and only the first value is used (that is, use rTmp(1) = rnd(1)), then it works fine.
  3. I tried to compile the same code with gfortran, and it worked fine.
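For comparison, here is a minimal sketch of observation 1: the same loop with rtmp declared as an ordinary (non-allocatable) scalar, for which firstprivate should give every thread its own copy. The program name and the nbad counter are illustrative additions, not part of the original code, and the equality test is only a sanity check; a data race would not necessarily be caught by it.

```fortran
! Sketch of observation 1: rtmp is a plain scalar, so firstprivate
! should give each thread its own independent copy.
program firstprivate_scalar
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    real :: rtmp              ! plain scalar instead of "real, allocatable :: rtmp"
    integer :: i, nbad        ! nbad: hypothetical counter of mismatches

    rtmp = 1.
    nbad = 0

    !$omp parallel do default(none) private(i, rnd) firstprivate(rtmp) &
    !$omp     shared(fnl) reduction(+:nbad)
    do i = 1, ni
        call random_number(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        ! If rtmp were accidentally shared, another thread could overwrite it
        ! between the assignment and this test (not guaranteed to be caught).
        if (rtmp /= rnd(1)) nbad = nbad + 1
    end do
    !$omp end parallel do

    if (nbad /= 0) error stop 'rtmp was overwritten by another thread'
    print '(a)', 'OK: rtmp matched rnd(1) in every iteration'
end program firstprivate_scalar
```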

I would appreciate it if anyone could give any suggestions.  Thanks.

jimdempseyatthecove
Honored Contributor III

The behavior looks as if (this is an assumption) each thread is getting a firstprivate copy of the pointer to the allocatable scalar, not a private copy of the scalar itself.

To correct for this:

!**    allocate(rtmp)
!**    rtmp = 1.
    
    !$omp parallel default(none) private(i, rnd) firstprivate(rtmp) shared(fnl)
    allocate(rtmp) !** allocate inside the parallel region (each thread allocates through its own private pointer)
    rtmp = 1.      !** assign the initial value to this thread's private scalar
    !$omp master

Jim Dempsey
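A complete, self-checking sketch of the reproducer with Jim's change applied is given below. It relies only on the standard OpenMP rule that a firstprivate copy of an unallocated allocatable starts out unallocated; the explicit deallocate, the tid variable, and the nbad check are illustrative additions, and whether this avoids the problem in Intel Parallel Studio XE 2018 Update 1 specifically has not been verified here.

```fortran
! Sketch (untested with the 2018 Update 1 compiler): the reproducer with the
! allocation moved inside the parallel region, as suggested above.
program rang_test_fixed
    !$ use omp_lib
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    integer :: i, tid, nbad        ! tid, nbad: illustrative additions
    real, allocatable :: rtmp      ! deliberately NOT allocated before the region

    nbad = 0
    !$omp parallel default(none) private(i, rnd, tid) firstprivate(rtmp) &
    !$omp     shared(fnl) reduction(+:nbad)
    allocate(rtmp)                 ! each thread allocates its own private scalar
    rtmp = 1.
    tid = 0
    !$ tid = omp_get_thread_num()
    !$omp critical
    write(*, '(*(g0))') 'Thread=', tid, '    rTmp=', rtmp
    !$omp end critical
    !$omp barrier

    !$omp do
    do i = 1, ni
        call random_number(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        if (rtmp /= rnd(1)) nbad = nbad + 1   ! sanity check only; may miss a race
        !$omp critical
        write(*, '(*(g0))') 'i=', i, '    rnd(1)=', rnd(1), '    rTmp=', rtmp
        !$omp end critical
    end do
    !$omp end do

    deallocate(rtmp)               ! release this thread's copy before the region ends
    !$omp end parallel

    if (nbad /= 0) error stop 'rtmp was overwritten by another thread'
    do i = 1, ni
        write(*, '(*(g0))') 'i=', i, '    fnl(i)=', fnl(i)
    end do
end program rang_test_fixed
```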

G__H_
Beginner

Thanks, Jim. I will try this in my actual code.
