Solved: OpenMP Do Loop syncronization problem

f_y_ · ‎11-20-2016

I'm new to OMP. I'm trying to code a simple matrix problem in Fortran by OpenMP which is used for just a calling process. I've three subroutines, one for taking initial value and the others are matrix calculation. I think, I've a syncronization problem in matrix calculation because in some case (for ex. for some thread number) matrix value is not same with the desired last loop step. Generally its some values belong the previous step. Any suggetion?

program Matrix_E
Real :: Ematrix(10,10)
Real :: Gmatrix(10,10)
INTEGER  numel,numproc, i, j, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
Write (6,*) 'input number of elements:'
Read (5,*) numel
Write (6,*) 'input number of processers:'
Read (5,*) numproc     !Number of thread
Call OMP_SET_NUM_THREADS(numproc)   

!---------- parallel region--------------
!$OMP PARALLEL SHARED(numel,  Gmatrix) PRIVATE(i, EMatrix)
Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.

!$OMP DO SCHEDULE(STATIC)
Do i = 1,numel  !Each thread calculates EMatrix for an element.
    Call Ematrix_Loop (Ematrix, Gmatrix)    
End Do 
!$OMP END PARALLEL
!----------End of parallel region--------
Write (6,*) GMatrix   
End program Matrix_E  !End of main program
!-------------------------  
    
Subroutine Ematrix_Loop (d_ematrix, d_Gmatrix)  !In this subroutine all members of EMatrix are equiled to 1 and sleep function is used to time wasting.
implicit none
Integer :: K,J
Real, INTENT(OUT):: d_EMatrix(10,10)
Real :: d_GMatrix(10,10)
    Do K = 1,10 ! Outer loop - right index     
      Do J=1,10 ! Inner loop - left index
        d_EMatrix(J,K) = 1
        Call sleepqq (1)
      End Do    
    End Do   
Call Gmatrix_Loop (d_EMatrix, d_GMatrix)        
End subroutine Ematrix_Loop
    
Subroutine Gmatrix_Loop (dd_EMatrix, dd_GMatrix)   !This subroutine calculates Gmatrix which is matrix summation of EMatrix.
implicit none
Integer :: ii, jj
Real, INTENT(IN):: dd_EMatrix(10,10)
Real, INTENT(OUT):: dd_GMatrix(10,10) 
        DO 120 ii=1, 10 ! Outer loop - right index 
        DO 130 jj=1, 10 ! Inner loop - left index
            dd_GMatrix(jj,ii)= dd_GMatrix(jj,ii) + dd_EMatrix(jj,ii)           
130     Continue    
120     Continue    
End subroutine Gmatrix_Loop   
        
Subroutine Zero (z_Gmatrix)  !This subroutine makes zero all members for initial value of Matrices. 
implicit none
Integer :: K,J
Real :: z_GMatrix(10,10)
Real :: z_EMatrix(10,10)
    Do K = 1,10 ! Outer loop - right index     
      Do J=1,10 ! Inner loop - left index
        z_GMatrix(J,K) = 0
        z_Ematrix(j,k) = 0
      End Do    
    End Do         
End subroutine Zero

John_Campbell · ‎11-21-2016

You need to indicate Gmatrix is being modified by all threads, so a private version must be accumulated. I also "initialised" Gmatrix outside the !$OMP region.

the following should work

  program Matrix_E
    Real :: Ematrix(10,10)
    Real :: Gmatrix(10,10)
    INTEGER  numel,numproc, i, j  
!$  INTEGER  OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
!
    numel = 30
    numproc = 4
    Write (6,*) 'input number of elements:', numel
!    Read  (5,*) numel
    Write (6,*) 'input number of processers:', numproc
!    Read  (5,*) numproc     !Number of thread
!
!$    Call OMP_SET_NUM_THREADS(numproc)   
!
     gmatrix = 0
!---------- parallel region--------------
!$OMP PARALLEL DO SHARED(numel) PRIVATE(i, EMatrix) REDUCTION(+ : Gmatrix) SCHEDULE(STATIC)                         
!
!    Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.
!
!     !!OMP DO SCHEDULE(STATIC)
    Do i = 1,numel  !Each thread calculates EMatrix for an element.
!$      write (*,*) i, OMP_GET_THREAD_NUM ()
        Call Ematrix_Loop (Ematrix, Gmatrix)    
    End Do 
!$OMP END PARALLEL DO
!----------End of parallel region--------
   do i = 1,10
    Write (6,*) i, GMatrix(:,i)
   end do   
  End program Matrix_E  !End of main program

View solution in original post

IanH · ‎11-20-2016

!---------- parallel region--------------
!$OMP PARALLEL SHARED(numel,  Gmatrix) PRIVATE(i, EMatrix)
Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.

The data-sharing attribute for GMatrix is shared, but all threads in the team are modifying its entirety here (and in a later procedure call). This isn't permitted. Perhaps GMatrix should be PRIVATE?

In subroutine Zero, you define the value of a local variable called z_Ematrix to zero. This is pointless, because that local variable isn't used. Perhaps you meant that local variable to be a dummy argument?

Fortran 90's array operations would simplify your example code.

John_Campbell · ‎11-21-2016

You need to indicate Gmatrix is being modified by all threads, so a private version must be accumulated. I also "initialised" Gmatrix outside the !$OMP region.

the following should work

  program Matrix_E
    Real :: Ematrix(10,10)
    Real :: Gmatrix(10,10)
    INTEGER  numel,numproc, i, j  
!$  INTEGER  OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
!
    numel = 30
    numproc = 4
    Write (6,*) 'input number of elements:', numel
!    Read  (5,*) numel
    Write (6,*) 'input number of processers:', numproc
!    Read  (5,*) numproc     !Number of thread
!
!$    Call OMP_SET_NUM_THREADS(numproc)   
!
     gmatrix = 0
!---------- parallel region--------------
!$OMP PARALLEL DO SHARED(numel) PRIVATE(i, EMatrix) REDUCTION(+ : Gmatrix) SCHEDULE(STATIC)                         
!
!    Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.
!
!     !!OMP DO SCHEDULE(STATIC)
    Do i = 1,numel  !Each thread calculates EMatrix for an element.
!$      write (*,*) i, OMP_GET_THREAD_NUM ()
        Call Ematrix_Loop (Ematrix, Gmatrix)    
    End Do 
!$OMP END PARALLEL DO
!----------End of parallel region--------
   do i = 1,10
    Write (6,*) i, GMatrix(:,i)
   end do   
  End program Matrix_E  !End of main program

jimdempseyatthecove · ‎11-21-2016

f y,

A little expansion on IanH's comments.

In your original post, the primary reason the Call Zero(GMatrix) was misplaced is .NOT. necessarily that you were performing redundant work (each thread zeroing the array), rather that each thread may perform that call at (slightly) different times, and that it is possible for one or more threads to enter the subsequent !$OMP DO, perform the call to Ematrix_loop, and reach into the point where it is modifying GMatrix. This may have occurred prior to the slowest starting thread to reach Call Zero(GMatrix). IOW a zerowing occurring after other threads modified GMatrix.

John's comment on use of the Reduction clause is also pertinent, as the results indicate, but I caution you that use of reduction should be made with full knowledge of what will be required. In the above case, each thread will have a hidden copy of Gmatrix(10,10). This matrix is rather small, so it is somewhat inconsequential to have nThread number of copies of the Gmatrix. Consider the situation where you may be performing a calculation using arrays that (single threadedly) push the memory limit of your system. In those cases you do not want to have nThread copies of the very large array(s). Instead, a typical solution to these problems is to tile, and sub-tile the arrays for cache efficiency. In performing the tile, sub-tile, you may be able to eliminate the requirement for reduction .OR. at least make the reduction portions of the array to the smaller size of the sub-tile.

Jim Dempsey

John_Campbell · ‎11-21-2016

Jim,

One thing that always puzzles me is line 17 of my post #3 : Gmatrix = 0.

If I leave it out, I get an incorrect result; If I place it in the !$OMP region, I again get an incorrect result.

My expectation was that each thread's REDUCTION variable should be zeroed automatically and each thread's private variable should have the value of the copied variable prior to entering the OMP region. However there appears to be a lack of certainty of the automatically initialised values. In my testing, if I remove line 17 or place it at call zero of the original post, then the results appear invalid.

Is this my lack of understanding or is there a problem with the definition of initial values for local/private variables.

John

Steven_L_Intel1 · ‎11-22-2016

"Zerored automatically"? No such thing. The point of the REDUCTION clause is to improve the parallel code for the reduction - it doesn't exempt you from writing correct code.

jimdempseyatthecove · ‎11-22-2016

RE: line 17 of my post #3 : Gmatrix = 0.

>>If I leave it out, I get an incorrect result;

Gmatrix then is not initialized to 0 as Call Zero is commented out.

>>My expectation was that each thread's REDUCTION variable should be zeroed automatically

The thread copy of the reduction variable is initialized, in the case of + to zero, however, the shared copy is left as-was and the operator of the reduction is applied to what it was upon exit of each thread from the region.l

>> and each thread's private variable should have the value of the copied variable prior to entering the OMP region.

Not so, private variables are uninitialized. Use firstprivate or copyin to initialize private variables.

Considering the above...

>>If I place it in the !$OMP region, I again get an incorrect result

Then this is redundant, and it does not initialize the out of region Gmatrix.

Jim Dempsey

jimdempseyatthecove · ‎11-22-2016

>>Use firstprivate or copyin to initialize private variables.

... to the same value as that of outside the parallel region.

*** Note, your above code is not to use firstprivate. Rather initialize Gmatrix prior to the parallel region.

Jim Dempsey