- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a code where I have parallelized one of the intensive subroutines using openmp. I have a reduction variable which, which I I use the optimization flags -O2 or -O3 (but now -O1), the value of the reduction variable that I am outputing is about 1000x to large. This does not happen when I run the code in serial with or without the -O3 flag, nor does it happen when I compile the code using gfortran and the -O3 flag. Is there something about the optimization flags that are incompatbile with OMP REDUCTION?
Here is the portion of the code related to the reduction variable:
[fortran]
!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k,
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH)
Thread = OMP_GET_THREAD_NUM()
TH = Thread + 1
!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
jsr = seed(Thread)
XX = DABS(RX(Num) - RX(ID))
YY = DABS(RY(Num) - RY(ID))
ZZ = DABS(RZ(Num) - RZ(ID))
RSQ = XX * XX + YY * YY + ZZ * ZZ
IF (RSQ < RcutB2 .AND. BindOn)
if (Bond(Num,ID) == 0) Then
Btest = UNIF(jsr)
if (Btest < Bon) Then
Bond(Num,ID) = 1
Nbond = Nbond + 1
End if
End if
End if
If (Bond(Num,ID) == 1) Then
Btest = UNIF(jsr)
if (Btest < Boff) Then
Bond(Num,ID) = 0
Nbond = Nbond - 1
End Do
!$OMP END Do
!$OMP END PARALLEL
[/fortran]
This is a code snippet with some of the other computations taken out. All the variables and whatnot are defines properly, as the code runs fine in serial and in parallel without O2 and O3 optimizations.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ID is declared as private, but is not initilaized before used (defined).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
All the variables have been defined as evidenced by the program working with the -openmp and -O1 flags but now with the -openmp -O2 or -openmp -O3. I think that's my part for only posting what I considered the relavent parts of the subroutine. Here is the complete subroutine in which ID is defined.
[fortran]
SUBROUTINE FORCE (N, Gridon, Cell, seed, jsr, kn, fn, wn,
& Thread_Num)
Implicit None
Include 'omp_lib.h'
Include 'maxarray.inc'
Include 'system.inc'
C *******************************************************************
INTEGER I, J, K, NUM, II, JJ, KK
INTEGER ID, CX,CY,CZ, jsr, Thread, Thread_Num, TH
INTEGER kn(128)
INTEGER seed(0:Thread_Max-1)
LOGICAL Gridon, Can_Bind
REAL*8 RXI, RYI, RZI, FXIJ, FYIJ, FZIJ, FIJ
REAL*8 FXI, FYI, FZI, RIJ, FFx, FFy, FFz
REAL*8 XX, YY, ZZ, BX, BY, BZ
REAL*8 RSQ, RR, BondR, Btest
REAL*8 FX_Priv(N,Thread_Num)
REAL*8 FY_Priv(N,Thread_Num)
REAL*8 FZ_Priv(N,Thread_Num)
REAL fn(128), wn(128)
REAL*8 Attract, A, B, C, wtime
C *******************************************************************
If (Gridon) Then
FX(1:N) = 0.0
FY(1:N) = 0.0
FZ(1:N) = 0.0
FX_Priv(1:N,1:Thread_Num) = 0.0
FY_Priv(1:N,1:Thread_Num) = 0.0
FZ_Priv(1:N,1:Thread_Num) = 0.0
!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k,
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH)
Thread = OMP_GET_THREAD_NUM()
! Thread = 0
TH = Thread + 1
!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
jsr = seed(Thread)
CX = INT(RX(Num)/RXcut)+1
CY = INT(RY(Num)/RYcut)+1
CZ = INT(RZ(Num)/RZcut)+1
Do I=CX-1,CX+1
Do J=CY-1,CY+1
Do K=CZ-1,CZ+1
II = i
JJ = j
KK = k
C FOR X
If (II.LT.1) II=II+NCellx
If (II.GT.NCellx) II=II-NCellx
C FOR Y
If (JJ.LT.1) JJ=JJ+NCelly
If (JJ.GT.NCelly) JJ=JJ-NCelly
C FOR Z
If (KK.LT.1) KK=KK+NCellz
If (KK.GT.NCellz) KK=KK-NCellz
ID = Cell(II,JJ,KK)
Do While (ID.NE.0)
If (ID <= Num) Then
ID = Linker(ID) !Don't count same/twice
Cycle
End If
XX = DABS(RX(Num) - RX(ID))
YY = DABS(RY(Num) - RY(ID))
ZZ = DABS(RZ(Num) - RZ(ID))
if (XX.GT.HLx) XX = Lx - XX !P
if (YY.GT.HLy) YY = Ly - YY !B
if (ZZ.GT.HLz) ZZ = Lz - ZZ !C
RSQ = XX * XX + YY * YY + ZZ * ZZ
IF (RSQ < RcutB2 .AND. BindOn) Then
if (Bond(Num,ID) == 0) Then
Btest = UNIF(jsr)
if (Btest < Bon) Then
Bond(Num,ID) = 1
Nbond = Nbond + 1
End if
End if
End if
If (Bond(Num,ID) == 1) Then
Btest = UNIF(jsr)
if (Btest < Boff) Then
Bond(Num,ID) = 0
Nbond = Nbond - 1
End if
End if
If ((Bond(Num,ID) == 1) .OR. (RSQ < BondEQ)) Then
BX = RX(Num) - RX(ID)
BY = RY(Num) - RY(ID)
BZ = RZ(Num) - RZ(ID)
if (BX.GE.HLx) Then
BX = BX - Lx
Elseif (BX.LT.-HLx) Then
BX = BX + Lx
End If
if (BY.GE.HLy) Then
BY = BY - Ly
Elseif (BY.LT.-HLy) Then
BY = BY + Ly
End If
if (BZ.GE.HLz) Then
BZ = BZ - Lz
Elseif (BZ.LT.-HLz) Then
BZ = BZ + Lz
End If
RR = DSQRT(RSQ)
BondR = RR - BondEQ
FFx = -Spring*BondR*(BX/RR)
FFy = -Spring*BondR*(BY/RR)
if (Zon) FFz = -Spring*BondR*(BZ/RR)
FX_Priv(Num,TH) = FX_Priv(Num,TH) + FFx
FY_Priv(Num,TH) = FY_Priv(Num,TH) + FFy
FX_Priv(ID,TH) = FX_Priv(ID,TH) - FFx
FY_Priv(ID,TH) = FY_Priv(ID,TH) - FFy
If (Zon) Then
FZ_Priv(Num,TH) = FZ_Priv(Num,TH)+FFz
FZ_Priv(ID,TH) = FZ_Priv(ID,TH) - FFz
End If
End If
ID = Linker(ID)
seed(Thread) = jsr
End Do
End Do
End Do
End Do
FX_Priv(Num,TH) = FX_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
FY_Priv(Num,TH) = FY_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
IF(Zon) Then
FZ_Priv(Num,TH)=FZ_Priv(Num,TH)+Scale*GAUSS2(jsr,kn,fn,wn)
End If
End Do
!$OMP END Do
Do j=1,Thread_Num
!$OMP DO
Do i=1,N
FX(i) = FX(i) + FX_Priv(i,j)
FY(i) = FY(i) + FY_Priv(i,j)
if (Zon) FZ(i) = FZ(i) + FZ_Priv(i,j)
End Do
!$OMP END DO
End Do
!$OMP END PARALLEL
End If
RETURN
END
[/fortran]
Here, UNIF() and GAUSS2() are calls to a random number generators, and some of variables are defined in common statements that are included in the 'include system.inc' statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Mike wrote:
All the variables have been defined as evidenced by the program working with the -openmp and -O1 flags
Sorry, but that's not evidence of anything.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I do not see Nbond declared. Is it in one of your .inc files?
In any event, the REDUCTION( + : Nbond), I believe would belong on the !$OMP PARALLEL, not on the enclosed !$OMP DO
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Yes, Nbond is defined in my system.inc file. I tired moving the REDUCTION(+:Nbond) into the !$OMP PARALLEL section, but it still produced the same results. Here is a sample output of my program showing the difference.
Run in serial with -O3:
Time (us) Nbonds N
100.0000 13328 5041
200.0000 13474 5041
300.0000 13568 5041
400.0000 13667 5041
500.0000 13707 5041
600.0000 13751 5041
700.0000 13800 5041
800.0000 13844 5041
900.0000 13852 5041
1000.0000 13878 5041
Run using 8 threads and -O1 -openmp
Time (us) Nbonds N
100.0000 13348 5041
200.0000 13532 5041
300.0000 13589 5041
400.0000 13701 5041
500.0000 13775 5041
600.0000 13784 5041
700.0000 13814 5041
800.0000 13883 5041
900.0000 13869 5041
1000.0000 13906 5041
Run with 8 Threads and -O3 -openmp
Time (us) Nbonds N
100.0000 7499843 5041
200.0000 9781784 5041
300.0000 10867080 5041
400.0000 11468020 5041
500.0000 11824365 5041
600.0000 12042995 5041
700.0000 12178306 5041
800.0000 12265422 5041
900.0000 12319681 5041
1000.0000 12354566 5041
The middle number of the output corresponds to the Nbond variable that I use in the reduction clause. It is a stochastic variable, so the numbers between the runs should not be the exact same, but they should be close. When I use the -O3 compiler flag, the output Nbond variable icreases by about 1000?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Save a copy of your code to reference later. Make these edits and give it a try:
[fortran]
integer :: NbondLocal
...
!$OMP PARALLEL DEFAULT(Shared) PRIVATE(Num,ID,jsr,Thread,CX,CY,CZ,i,j,k,
!$OMP& II,JJ,KK,XX,YY,ZZ,RSQ,Btest,BX,BY,BZ,RR,BondR,FFx,FFy,FFz,TH, NbondLocal) ! the local copy
!$OMP& REDUCTION( + : Nbond) ! the .inc copy
nBondLocal = 0
...
!$OMP DO SCHEDUAL(Static) ! *** no reduction here
...
NbondLocal = NbondLocal + 1 ! **** replace all nBond with NbondLocal inside the !$OMP DO
!$OMP END DO
Nbond = Nbond + NbondLocal ! accumulate into the reduction variable here
...
!$OMP END PARALLEL ! performs the reduction from inner Nbond to outer Nbond here
[/fortran]
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the response. I tried switching to using Nbondlocal within the inner loop. I am still experiencing the same problems, but this does seem to be a better way of doing things. I think I have tracked things down to the calls to the random number generator (Btest = UNIF(jsr) in the code). It seems that when I use the -O1 compiler flag the value of jsr is being changed (as it is changed within the UNIF function) but when I use the -O3 compiler flag, the value of jsr is the same throughout. Therefore, I am generally not getting into the section of the code where Nbondlocal (or Nbond) is decreased. Is there a difference in how ifort handles function calls when the -O1 optimization flag is used versus the -O3 optimization flag?
The random number generator function looks like this:
[fortran]
Function UNIF(jsr)
Implicit None
C ** Fuction to Generate Uniformly Distributed *********************
C ** Random Numbers on [0,1] ****************************************
INTEGER jsr, jsr_input
REAL UNIF
jsr_input = jsr
jsr = ieor (jsr, ishft(jsr, 13))
jsr = ieor (jsr, ishft(jsr, -17))
jsr = ieor (jsr, ishft(jsr, 5))
UNIF = 0.5E+00 + 0.2328306E-9 * REAL( jsr_input + jsr )
Return
End
[/fortran]
When I output the value of jsr within the function, it does change from the beginning to the end of the function. However, when I output the value of jsr before and after the call to the function, the result is jsr is different with the -O1 flag set and jsr is the same when the -O3 flag is set. I tried declaring jsr within the function with intent(inout), but that did not solve the problem. I am not sure if this is the exact cause for the value of Nbond to be so greatly increased, but it couldn't help. Also, this problem (neither the Nbond or the random number generator problem) occur when compiling with gfortran and the -O3 flag.
Thanks,
Mike
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You should have disclosed that UNIF were a random number generator.
Try something like this:
[fortran]
!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1,N
jsr = seed(0)
DO I=1,N
Best = UNIF(jsr)
END DO
[/fortran]
This way, each iteration of the Do Num loop has a unique random seed, that is the same, regardless of which thead or how many threads (including 1 thread)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You want reproducability, use a reproducable method.
For performance, you could build (once) an array of N seed values, then re-init using a lookup: jsr=jsr_seed(Num)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your input, but I am a little confused. In the code listed above, the array seed is previously initialized with random integers corresponding to the maximum number of threads and then passed to the subroutine.
[fortran]
Thread = OMP_GET_THREAD_NUM()
!$OMP Do SCHEDULE(Static) REDUCTION( + : Nbond)
Do Num = 1, N
jsr = seed(Thread)
...
if (Bond(Num,ID) == 0) Then
Btest = UNIF(jsr)
...
End If
...
End Do
[/fortran]
Is this not giving me a unique random seed for each thread? What seems to be the difference is that with the -O1 flag, when I call UNIF(jsr) with a particular value of jsr, the value of jsr is changed during the call to UNIF(jsr). However, when I use the -O3 flag, the value of jsr is the same before and after the call to UNIF(jsr) for a given thread (the values of jsr are different among the different threads, but the value of jst for the same thread does not change between successive calls to UNIF(jsr))
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The code I provided gives a unique random seed for: Num=1, Num=2, Num=3, ...
and the same random seed for the given Num
no matter which order the Num's are encountered
With 2 threads, the !$OMP DO might have in order of time
Thread(0), Num=1
Thread(1), Num=N/2
Thread(0), Num=2
Thread(0), Num=3 (thread 0 beat thread 1 this time)
Thread(1), Num=N/2+1
...
Additionaly, depending on how the code flows, a differing number of UNIF's may be called by each thread.
The code I provided, gives random sequences, that are consistent with respect of Num, regardless of the order in which the Num is processed.
In -O3, the value of jsr may reside in a register, watching the memory location reserved for jsr will not show you what is in the register (register gets copied only when a save is required). Also, the debugger gets at times gets goofed up with respect to finding variables across a shared/private boundary.
Jim Dempsey
![](/skins/images/3CECF0550DB8BF54496C114A1FF06FE9/responsive_peak/images/icon_anonymous_message.png)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page