Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28632 Discussions

openmp Reduction differences with reduction flag

Mike3
Beginner
727 Views

Hello,

     I have a code where I have parallelized one of the intensive subroutines using openmp.  I have a reduction variable which, which I I use the optimization flags -O2 or -O3 (but now -O1), the value of the reduction variable that I am outputing is about 1000x to large.  This does not happen when I run the code in serial with or without the -O3 flag, nor does it happen when I compile the code using gfortran and the -O3 flag.  Is there something about the optimization flags that are incompatbile with OMP REDUCTION?

Here is the portion of the code related to the reduction variable:

[fortran]

!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k,
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH)

Thread = OMP_GET_THREAD_NUM()
TH = Thread + 1

!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
jsr = seed(Thread)

XX = DABS(RX(Num) - RX(ID))
YY = DABS(RY(Num) - RY(ID))
ZZ = DABS(RZ(Num) - RZ(ID))
RSQ = XX * XX + YY * YY + ZZ * ZZ

IF (RSQ < RcutB2 .AND. BindOn) 
if (Bond(Num,ID) == 0) Then
Btest = UNIF(jsr)
if (Btest < Bon) Then
Bond(Num,ID) = 1
Nbond = Nbond + 1
End if
End if
End if

If (Bond(Num,ID) == 1) Then
Btest = UNIF(jsr)
if (Btest < Boff) Then
Bond(Num,ID) = 0
Nbond = Nbond - 1

End Do
!$OMP END Do

!$OMP END PARALLEL

[/fortran]

This is a code snippet with some of the other computations taken out.  All the variables and whatnot are defines properly, as the code runs fine in serial and in parallel without O2 and O3 optimizations.

0 Kudos
11 Replies
jimdempseyatthecove
Honored Contributor III
727 Views

ID is declared as private, but is not initilaized before used (defined).

Jim Dempsey

0 Kudos
Mike3
Beginner
727 Views

All the variables have been defined as evidenced by the program working with the -openmp and -O1 flags but now with the -openmp -O2 or -openmp -O3.  I think that's my part for only posting what I considered the relavent parts of the subroutine.  Here is the complete subroutine in which ID is defined.

[fortran]

SUBROUTINE FORCE (N, Gridon, Cell, seed, jsr, kn, fn, wn,
& Thread_Num)
Implicit None

Include 'omp_lib.h'
Include 'maxarray.inc'
Include 'system.inc'

C *******************************************************************

INTEGER I, J, K, NUM, II, JJ, KK
INTEGER ID, CX,CY,CZ, jsr, Thread, Thread_Num, TH
INTEGER kn(128)
INTEGER seed(0:Thread_Max-1)
LOGICAL Gridon, Can_Bind
REAL*8 RXI, RYI, RZI, FXIJ, FYIJ, FZIJ, FIJ
REAL*8 FXI, FYI, FZI, RIJ, FFx, FFy, FFz
REAL*8 XX, YY, ZZ, BX, BY, BZ
REAL*8 RSQ, RR, BondR, Btest
REAL*8 FX_Priv(N,Thread_Num)
REAL*8 FY_Priv(N,Thread_Num)
REAL*8 FZ_Priv(N,Thread_Num)
REAL fn(128), wn(128)
REAL*8 Attract, A, B, C, wtime

C *******************************************************************

If (Gridon) Then

FX(1:N) = 0.0
FY(1:N) = 0.0
FZ(1:N) = 0.0
FX_Priv(1:N,1:Thread_Num) = 0.0
FY_Priv(1:N,1:Thread_Num) = 0.0
FZ_Priv(1:N,1:Thread_Num) = 0.0


!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k,
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH)

Thread = OMP_GET_THREAD_NUM()
! Thread = 0
TH = Thread + 1


!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
jsr = seed(Thread)
CX = INT(RX(Num)/RXcut)+1
CY = INT(RY(Num)/RYcut)+1
CZ = INT(RZ(Num)/RZcut)+1
Do I=CX-1,CX+1
Do J=CY-1,CY+1
Do K=CZ-1,CZ+1
II = i
JJ = j
KK = k
C FOR X
If (II.LT.1) II=II+NCellx
If (II.GT.NCellx) II=II-NCellx
C FOR Y
If (JJ.LT.1) JJ=JJ+NCelly
If (JJ.GT.NCelly) JJ=JJ-NCelly
C FOR Z
If (KK.LT.1) KK=KK+NCellz
If (KK.GT.NCellz) KK=KK-NCellz

ID = Cell(II,JJ,KK)
Do While (ID.NE.0)
If (ID <= Num) Then
ID = Linker(ID) !Don't count same/twice
Cycle
End If
XX = DABS(RX(Num) - RX(ID))
YY = DABS(RY(Num) - RY(ID))
ZZ = DABS(RZ(Num) - RZ(ID))
if (XX.GT.HLx) XX = Lx - XX !P
if (YY.GT.HLy) YY = Ly - YY !B
if (ZZ.GT.HLz) ZZ = Lz - ZZ !C
RSQ = XX * XX + YY * YY + ZZ * ZZ

IF (RSQ < RcutB2 .AND. BindOn) Then
if (Bond(Num,ID) == 0) Then
Btest = UNIF(jsr)
if (Btest < Bon) Then
Bond(Num,ID) = 1
Nbond = Nbond + 1

End if
End if
End if

If (Bond(Num,ID) == 1) Then
Btest = UNIF(jsr)
if (Btest < Boff) Then
Bond(Num,ID) = 0
Nbond = Nbond - 1

End if
End if

If ((Bond(Num,ID) == 1) .OR. (RSQ < BondEQ)) Then
BX = RX(Num) - RX(ID)
BY = RY(Num) - RY(ID)
BZ = RZ(Num) - RZ(ID)

if (BX.GE.HLx) Then
BX = BX - Lx
Elseif (BX.LT.-HLx) Then
BX = BX + Lx
End If
if (BY.GE.HLy) Then
BY = BY - Ly
Elseif (BY.LT.-HLy) Then
BY = BY + Ly
End If
if (BZ.GE.HLz) Then
BZ = BZ - Lz
Elseif (BZ.LT.-HLz) Then
BZ = BZ + Lz
End If

RR = DSQRT(RSQ)
BondR = RR - BondEQ
FFx = -Spring*BondR*(BX/RR)
FFy = -Spring*BondR*(BY/RR)
if (Zon) FFz = -Spring*BondR*(BZ/RR)

FX_Priv(Num,TH) = FX_Priv(Num,TH) + FFx
FY_Priv(Num,TH) = FY_Priv(Num,TH) + FFy
FX_Priv(ID,TH) = FX_Priv(ID,TH) - FFx
FY_Priv(ID,TH) = FY_Priv(ID,TH) - FFy
If (Zon) Then
FZ_Priv(Num,TH) = FZ_Priv(Num,TH)+FFz
FZ_Priv(ID,TH) = FZ_Priv(ID,TH) - FFz
End If

End If
ID = Linker(ID)
seed(Thread) = jsr
End Do
End Do
End Do
End Do

FX_Priv(Num,TH) = FX_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
FY_Priv(Num,TH) = FY_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
IF(Zon) Then
FZ_Priv(Num,TH)=FZ_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
End If

End Do
!$OMP END Do

Do j=1,Thread_Num
!$OMP DO
Do i=1,N
FX(i) = FX(i) + FX_Priv(i,j)
FY(i) = FY(i) + FY_Priv(i,j)
if (Zon) FZ(i) = FZ(i) + FZ_Priv(i,j)
End Do
!$OMP END DO
End Do
!$OMP END PARALLEL

End If

RETURN
END

[/fortran]

Here, UNIF() and GAUSS2() are calls to a random number generators, and some of variables are defined in common statements that are included in the 'include system.inc' statement.

0 Kudos
Steven_L_Intel1
Employee
727 Views

Mike wrote:

All the variables have been defined as evidenced by the program working with the -openmp and -O1 flags

Sorry, but that's not evidence of anything.

0 Kudos
jimdempseyatthecove
Honored Contributor III
727 Views

I do not see Nbond declared. Is it in one of your .inc files?

In any event, the REDUCTION( + : Nbond), I believe would belong on the !$OMP PARALLEL, not on the enclosed !$OMP DO

Jim Dempsey

0 Kudos
Mike3
Beginner
727 Views

Hello,

Yes, Nbond is defined in my system.inc file.  I tired moving the REDUCTION(+:Nbond) into the !$OMP PARALLEL section, but it still produced the same results.  Here is a sample output of my program showing the difference.

Run in serial with -O3:

Time (us)  Nbonds     N

100.0000  13328        5041
200.0000  13474        5041
300.0000  13568        5041
400.0000  13667        5041
500.0000  13707        5041
600.0000  13751        5041
700.0000  13800        5041
800.0000  13844        5041
900.0000  13852        5041
1000.0000 13878       5041

Run using 8 threads and -O1 -openmp

Time (us)  Nbonds    N

100.0000  13348       5041
200.0000  13532       5041
300.0000  13589       5041
400.0000  13701       5041
500.0000  13775       5041
600.0000  13784       5041
700.0000  13814       5041
800.0000  13883       5041
900.0000  13869       5041
1000.0000 13906      5041

Run with 8 Threads and -O3 -openmp

Time (us)  Nbonds     N

100.0000  7499843     5041
200.0000  9781784     5041
300.0000  10867080   5041
400.0000  11468020   5041
500.0000  11824365   5041
600.0000  12042995   5041
700.0000  12178306   5041
800.0000  12265422   5041
900.0000  12319681   5041
1000.0000 12354566  5041

The middle number of the output corresponds to the Nbond variable that I use in the reduction clause.  It is a stochastic variable, so the numbers between the runs should not be the exact same, but they should be close.  When I use the -O3 compiler flag, the output Nbond variable icreases by about 1000?

0 Kudos
jimdempseyatthecove
Honored Contributor III
727 Views

Save a copy of your code to reference later. Make these edits and give it a try:

[fortran]

integer :: NbondLocal
...
!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k, 
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH, NbondLocal) ! the local copy
!$OMP& REDUCTION( + : Nbond) ! the .inc copy
nBondLocal = 0
...
!$OMP DO SCHEDUAL(Static) ! *** no reduction here
...
NbondLocal = NbondLocal + 1 ! **** replace all nBond with NbondLocal inside the !$OMP DO
!$OMP END DO
Nbond = Nbond + NbondLocal ! accumulate into the reduction variable here
...
!$OMP END PARALLEL ! performs the reduction from inner Nbond to outer Nbond here
 [/fortran]

Jim Dempsey

0 Kudos
Mike3
Beginner
727 Views

Thank you for the response.  I tried switching to using Nbondlocal within the inner loop.  I am still experiencing the same problems, but this does seem to be a better way of doing things.  I think I have tracked things down to the calls to the random number generator (Btest = UNIF(jsr) in the code).  It seems that when I use the -O1 compiler flag the value of jsr is being changed (as it is changed within the UNIF function) but when I use the -O3 compiler flag, the value of jsr is the same throughout.  Therefore, I am generally not getting into the section of the code where Nbondlocal (or Nbond) is decreased.  Is there a difference in how ifort handles function calls when the -O1 optimization flag is used versus the -O3 optimization flag?

The random number generator function looks like this:

[fortran]

Function UNIF(jsr)
Implicit None

C ** Fuction to Generate Uniformly Distributed *********************
C ** Random Numbers on [0,1] ****************************************

INTEGER jsr, jsr_input
REAL UNIF

jsr_input = jsr

jsr = ieor (jsr, ishft(jsr, 13))
jsr = ieor (jsr, ishft(jsr, -17))
jsr = ieor (jsr, ishft(jsr, 5))

UNIF = 0.5E+00 + 0.2328306E-9 * REAL( jsr_input + jsr )

Return
End

[/fortran]

When I output the value of jsr within the function, it does change from the beginning to the end of the function.  However, when I output the value of jsr before and after the call to the function, the result is jsr is different with the -O1 flag set and jsr is the same when the -O3 flag is set.  I tried declaring jsr within the function with intent(inout), but that did not solve the problem.  I am not sure if this is the exact cause for the value of Nbond to be so greatly increased, but it couldn't help.  Also, this problem (neither the Nbond or the random number generator problem) occur when compiling with gfortran and the -O3 flag.

Thanks,

Mike

0 Kudos
jimdempseyatthecove
Honored Contributor III
727 Views

You should have disclosed that UNIF were a random number generator.

Try something like this:

[fortran]
!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
  jsr = seed(0)
  DO I=1,N
    Best = UNIF(jsr)
  END DO
[/fortran]

This way, each iteration of the Do Num loop has a unique random seed, that is the same, regardless of which thead or how many threads (including 1 thread)

Jim Dempsey



0 Kudos
jimdempseyatthecove
Honored Contributor III
727 Views

You want reproducability, use a reproducable method.

For performance, you could build (once) an array of N seed values, then re-init using a lookup: jsr=jsr_seed(Num)

Jim Dempsey

0 Kudos
Mike3
Beginner
727 Views

Thanks for your input, but I am a little confused.  In the code listed above, the array seed is previously initialized with random integers corresponding to the maximum number of threads and then passed to the subroutine.

[fortran]

Thread = OMP_GET_THREAD_NUM()

!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)

Do Num = 1, N

jsr = seed(Thread)

...

if (Bond(Num,ID) == 0) Then

Btest = UNIF(jsr)

...

End If

...

End Do

[/fortran]

Is this not giving me a unique random seed for each thread?  What seems to be the difference is that with the -O1 flag, when I call UNIF(jsr) with a particular value of jsr, the value of jsr is changed during the call to UNIF(jsr).  However, when I use the -O3 flag, the value of jsr is the same before and after the call to UNIF(jsr) for a given thread (the values of jsr are different among the different threads, but the value of jst for the same thread does not change between successive calls to UNIF(jsr))

0 Kudos
jimdempseyatthecove
Honored Contributor III
727 Views

The code I provided gives a unique random seed for: Num=1, Num=2, Num=3, ...
and the same random seed for the given Num
no matter which order the Num's are encountered

With 2 threads, the !$OMP DO might have in order of time

Thread(0), Num=1
Thread(1), Num=N/2
Thread(0), Num=2
Thread(0), Num=3 (thread 0 beat thread 1 this time)
Thread(1), Num=N/2+1
...

Additionaly, depending on how the code flows, a differing number of UNIF's may be called by each thread.

The code I provided, gives random sequences, that are consistent with respect of Num, regardless of the order in which the Num is processed.

In -O3, the value of jsr may reside in a register, watching the memory location reserved for jsr will not show you what is in the register (register gets copied only when a save is required). Also, the debugger gets at times gets goofed up with respect to finding variables across a shared/private boundary.

Jim Dempsey

0 Kudos
Reply