Intel® Fortran Compiler

Problem with OpenMP in Fortran


Dear all,

I recently encountered a problem with a parallel do loop using OpenMP in Fortran.  The following code reproduces what I saw.  I would appreciate any suggestions.  I use Visual Studio 2017 + Intel Parallel Studio XE 2018 Update 1.

program Rang_Test
    !$ use omp_lib
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    integer :: i, j
    real, allocatable :: rtmp

    rtmp = 1.
    !$omp parallel default(none) private(i, rnd) firstprivate(rtmp) shared(fnl)
    !$omp master
    write(*, *) 'Initial values'
    !$omp end master
    write(*, '(*(g0))') 'Core=', omp_get_thread_num(), '    rTmp=', rtmp
    !$omp barrier
    !$omp master
    write(*, *) 'Inside the loop'
    !$omp end master
    !$omp do 
    do i = 1, 10
        call RANDOM_NUMBER(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        !$omp critical
        write(*, '(*(g0))') 'i=',i,'    rnd(1)=', rnd(1), '    rTmp=', rtmp
        !$omp end critical
    end do
    !$omp end do
    !$omp end parallel
    write(*, *) 'After omp region'
    do i = 1, 10
        write(*, '(*(g0))') 'i=', i, '    fnl(i)=', fnl(i)
    end do
end program Rang_Test

In the example above, each iteration first does some calculation (call random_number(rnd)), then assigns the result to a firstprivate variable (rtmp), and then does further calculation with it (fnl(i) = rTmp).  Inside the loop, the value of rTmp should therefore always equal rnd(1).  However, the values written to the screen were not what I expected.  The following are the execution results on my computer:

 Initial values
Core=2    rTmp=1.000000
Core=0    rTmp=1.000000
Core=3    rTmp=1.000000
Core=5    rTmp=1.000000
Core=4    rTmp=1.000000
Core=7    rTmp=1.000000
Core=6    rTmp=1.000000
Core=1    rTmp=1.000000
 Inside the loop
i=6    rnd(1)=.3001758    rTmp=.7522959
i=8    rnd(1)=.7958636    rTmp=.7522959
i=5    rnd(1)=.1966502E-01    rTmp=.7522959
i=3    rnd(1)=.3920868E-06    rTmp=.7522959
i=9    rnd(1)=.8392264    rTmp=.1941571
i=7    rnd(1)=.4387013    rTmp=.1941571
i=1    rnd(1)=.7522959    rTmp=.1941571
i=10    rnd(1)=.7564077E-01    rTmp=.2656559
i=4    rnd(1)=.1941571    rTmp=.2656559
i=2    rnd(1)=.2656559    rTmp=.2656559
 After omp region
i=1    fnl(i)=.7522959
i=2    fnl(i)=.2656559
i=3    fnl(i)=.3920868E-06
i=4    fnl(i)=.1941571
i=5    fnl(i)=.1966502E-01
i=6    fnl(i)=.3001758
i=7    fnl(i)=.4387013
i=8    fnl(i)=.7958636
i=9    fnl(i)=.8392264
i=10    fnl(i)=.7564077E-01

As you can see, the final values (fnl) are fine, but the printed values of rTmp do not match rnd(1) in most iterations.  I have tried a few other things and found:

  1. If rTmp is a regular nonallocatable variable, then the program works fine.
  2. If rTmp is declared as an allocatable array and only its first element is used (that is, rTmp(1) = rnd(1)), the program also works fine.
  3. I tried to compile the same code with gfortran, and it worked fine.

I would appreciate it if anyone could offer suggestions.  Thanks.
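
For reference, the first workaround in the list above could look roughly like the sketch below.  It is a minimal variant of the reproducer, not the original code: rtmp is a plain scalar, the parallel and do constructs are merged into one, and the mismatch counter nbad is a hypothetical addition used only to check that rtmp equals rnd(1) in every iteration.

```fortran
program rand_test_scalar
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    integer :: i, nbad          ! nbad is a hypothetical mismatch counter
    real :: rtmp                ! plain scalar, no allocatable attribute

    rtmp = 1.
    nbad = 0
    !$omp parallel do default(none) private(i, rnd) firstprivate(rtmp) shared(fnl) reduction(+:nbad)
    do i = 1, ni
        call random_number(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        ! rtmp was just assigned from rnd(1), so any difference means
        ! the thread is not working on its own copy of rtmp
        if (rtmp /= rnd(1)) nbad = nbad + 1
    end do
    !$omp end parallel do
    write(*, '(*(g0))') 'mismatches=', nbad
end program rand_test_scalar
```

With a correctly privatized rtmp, the count should be zero whatever the number of threads.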

2 Replies

The behavior looks as if (this is an assumption) each thread gets a firstprivate copy of the pointer to the allocatable scalar, not a private copy of the scalar itself.

To work around this, allocate and assign rtmp inside the parallel region:

!**    allocate(rtmp)
!**    rtmp = 1.
    !$omp parallel default(none) private(i, rnd) firstprivate(rtmp) shared(fnl)
    allocate(rtmp) !** allocate inside parallel region (to private pointer to scalar)
    rtmp = 1.      !** assign here to private pointer to scalar
    !$omp master

Jim Dempsey
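
Applied to the whole reproducer, the suggestion above might look like the following sketch.  It has not been tested against the compiler version in the question.  The mismatch counter nbad and the explicit deallocate are assumptions added for illustration; the rest follows the original program with the output statements removed.

```fortran
program rand_test_alloc_inside
    implicit none
    integer, parameter :: ni = 10
    real :: rnd(8), fnl(ni)
    integer :: i, nbad          ! nbad is a hypothetical mismatch counter
    real, allocatable :: rtmp   ! left unallocated outside the parallel region

    nbad = 0
    !$omp parallel default(none) private(i, rnd) firstprivate(rtmp) shared(fnl) reduction(+:nbad)
    allocate(rtmp)              ! each thread allocates its own private scalar
    rtmp = 1.                   ! and sets the initial value itself
    !$omp do
    do i = 1, ni
        call random_number(rnd)
        rtmp = rnd(1)
        fnl(i) = rtmp
        ! count iterations where rtmp no longer matches what was just assigned
        if (rtmp /= rnd(1)) nbad = nbad + 1
    end do
    !$omp end do
    deallocate(rtmp)            ! release the per-thread copy before leaving the region
    !$omp end parallel
    write(*, '(*(g0))') 'mismatches=', nbad
end program rand_test_alloc_inside
```

If the workaround takes effect, the reported mismatch count should be zero.  Because nothing is copied in from outside the region any more, private(rtmp) would presumably serve as well as firstprivate(rtmp) here.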


Thanks, Jim.  I will try this in my actual code.