Community
cancel
Showing results for
Did you mean:
Highlighted
Beginner
40 Views

## OpenMP Do Loop syncronization problem

I'm new to OMP. I'm trying to code a simple matrix problem in Fortran by OpenMP which is used for just a calling process. I've three subroutines, one for taking initial value and the others are matrix calculation. I think, I've a syncronization problem in matrix calculation because in some case (for ex. for some thread number) matrix value is not same with the desired last loop step. Generally its some values belong the previous step. Any suggetion?

```program Matrix_E
Real :: Ematrix(10,10)
Real :: Gmatrix(10,10)
Write (6,*) 'input number of elements:'
Write (6,*) 'input number of processers:'

!---------- parallel region--------------
!\$OMP PARALLEL SHARED(numel,  Gmatrix) PRIVATE(i, EMatrix)
Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.

!\$OMP DO SCHEDULE(STATIC)
Do i = 1,numel  !Each thread calculates EMatrix for an element.
Call Ematrix_Loop (Ematrix, Gmatrix)
End Do
!\$OMP END PARALLEL
!----------End of parallel region--------
Write (6,*) GMatrix
End program Matrix_E  !End of main program
!-------------------------

Subroutine Ematrix_Loop (d_ematrix, d_Gmatrix)  !In this subroutine all members of EMatrix are equiled to 1 and sleep function is used to time wasting.
implicit none
Integer :: K,J
Real, INTENT(OUT):: d_EMatrix(10,10)
Real :: d_GMatrix(10,10)
Do K = 1,10 ! Outer loop - right index
Do J=1,10 ! Inner loop - left index
d_EMatrix(J,K) = 1
Call sleepqq (1)
End Do
End Do
Call Gmatrix_Loop (d_EMatrix, d_GMatrix)
End subroutine Ematrix_Loop

Subroutine Gmatrix_Loop (dd_EMatrix, dd_GMatrix)   !This subroutine calculates Gmatrix which is matrix summation of EMatrix.
implicit none
Integer :: ii, jj
Real, INTENT(IN):: dd_EMatrix(10,10)
Real, INTENT(OUT):: dd_GMatrix(10,10)
DO 120 ii=1, 10 ! Outer loop - right index
DO 130 jj=1, 10 ! Inner loop - left index
dd_GMatrix(jj,ii)= dd_GMatrix(jj,ii) + dd_EMatrix(jj,ii)
130     Continue
120     Continue
End subroutine Gmatrix_Loop

Subroutine Zero (z_Gmatrix)  !This subroutine makes zero all members for initial value of Matrices.
implicit none
Integer :: K,J
Real :: z_GMatrix(10,10)
Real :: z_EMatrix(10,10)
Do K = 1,10 ! Outer loop - right index
Do J=1,10 ! Inner loop - left index
z_GMatrix(J,K) = 0
z_Ematrix(j,k) = 0
End Do
End Do
End subroutine Zero ```
Tags (1)

Accepted Solutions
Highlighted
New Contributor II
40 Views

You need to indicate Gmatrix is being modified by all threads, so a private version must be accumulated. I also "initialised" Gmatrix outside the !\$OMP region.

the following should work

```  program Matrix_E
Real :: Ematrix(10,10)
Real :: Gmatrix(10,10)
INTEGER  numel,numproc, i, j
!
numel = 30
numproc = 4
Write (6,*) 'input number of elements:', numel
Write (6,*) 'input number of processers:', numproc
!
!
gmatrix = 0
!---------- parallel region--------------
!\$OMP PARALLEL DO SHARED(numel) PRIVATE(i, EMatrix) REDUCTION(+ : Gmatrix) SCHEDULE(STATIC)
!
!    Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.
!
!     !!OMP DO SCHEDULE(STATIC)
Do i = 1,numel  !Each thread calculates EMatrix for an element.
!\$      write (*,*) i, OMP_GET_THREAD_NUM ()
Call Ematrix_Loop (Ematrix, Gmatrix)
End Do
!\$OMP END PARALLEL DO
!----------End of parallel region--------
do i = 1,10
Write (6,*) i, GMatrix(:,i)
end do
End program Matrix_E  !End of main program
```

7 Replies
Highlighted
Black Belt
40 Views
```!---------- parallel region--------------
!\$OMP PARALLEL SHARED(numel,  Gmatrix) PRIVATE(i, EMatrix)
Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.
```

The data-sharing attribute for GMatrix is shared, but all threads in the team are modifying its entirety here (and in a later procedure call).  This isn't permitted.  Perhaps GMatrix should be PRIVATE?

In subroutine Zero, you define the value of a local variable called z_Ematrix to zero.  This is pointless, because that local variable isn't used.  Perhaps you meant that local variable to be a dummy argument?

Fortran 90's array operations would simplify your example code.

Highlighted
New Contributor II
41 Views

You need to indicate Gmatrix is being modified by all threads, so a private version must be accumulated. I also "initialised" Gmatrix outside the !\$OMP region.

the following should work

```  program Matrix_E
Real :: Ematrix(10,10)
Real :: Gmatrix(10,10)
INTEGER  numel,numproc, i, j
!
numel = 30
numproc = 4
Write (6,*) 'input number of elements:', numel
Write (6,*) 'input number of processers:', numproc
!
!
gmatrix = 0
!---------- parallel region--------------
!\$OMP PARALLEL DO SHARED(numel) PRIVATE(i, EMatrix) REDUCTION(+ : Gmatrix) SCHEDULE(STATIC)
!
!    Call Zero (Gmatrix)  ! For initial, Members of Matrix are equal to zero for all threads.
!
!     !!OMP DO SCHEDULE(STATIC)
Do i = 1,numel  !Each thread calculates EMatrix for an element.
!\$      write (*,*) i, OMP_GET_THREAD_NUM ()
Call Ematrix_Loop (Ematrix, Gmatrix)
End Do
!\$OMP END PARALLEL DO
!----------End of parallel region--------
do i = 1,10
Write (6,*) i, GMatrix(:,i)
end do
End program Matrix_E  !End of main program
```

Highlighted
Black Belt
40 Views

f y,

A little expansion on IanH's comments.

In your original post, the primary reason the Call Zero(GMatrix) was misplaced is .NOT. necessarily that you were performing redundant work (each thread zeroing the array), rather that each thread may perform that call at (slightly) different times, and that it is possible for one or more threads to enter the subsequent !\$OMP DO, perform the call to Ematrix_loop, and reach into the point where it is modifying GMatrix. This may have occurred prior to the slowest starting thread to reach Call Zero(GMatrix). IOW a zerowing occurring after other threads modified GMatrix.

John's comment on use of the Reduction clause is also pertinent, as the results indicate, but I caution you that use of reduction should be made with full knowledge of what will be required. In the above case, each thread will have a hidden copy of Gmatrix(10,10). This matrix is rather small, so it is somewhat inconsequential to have nThread number of copies of the Gmatrix. Consider the situation where you may be performing a calculation using arrays that (single threadedly) push the memory limit of your system. In those cases you do not want to have nThread copies of the very large array(s). Instead, a typical solution to these problems is to tile, and sub-tile the arrays for cache efficiency. In performing the tile, sub-tile, you may be able to eliminate the requirement for reduction .OR. at least make the reduction portions of the array to the smaller size of the sub-tile.

Jim Dempsey

Highlighted
New Contributor II
40 Views

Jim,

One thing that always puzzles me is line 17 of my post #3 : Gmatrix = 0.

If I leave it out, I get an incorrect result; If I place it in the !\$OMP region, I again get an incorrect result.

My expectation was that each thread's REDUCTION variable should be zeroed automatically and each thread's private variable should have the value of the copied variable prior to entering the OMP region. However there appears to be a lack of certainty of the automatically initialised values.  In my testing, if I remove line 17 or place it at call zero of the original post, then the results appear invalid.

Is this my lack of understanding or is there a problem with the definition of initial values for local/private variables.

John

Highlighted
Employee
40 Views

"Zerored automatically"? No such thing. The point of the REDUCTION clause is to improve the parallel code for the reduction - it doesn't exempt you from writing correct code.

Highlighted
Black Belt
40 Views

RE: line 17 of my post #3 : Gmatrix = 0.

>>If I leave it out, I get an incorrect result;

Gmatrix then is not initialized to 0 as Call Zero is commented out.

>>My expectation was that each thread's REDUCTION variable should be zeroed automatically

The thread copy of the reduction variable is initialized, in the case of + to zero, however, the shared copy is left as-was and the operator of the reduction  is applied to what it was upon exit of each thread from the region.l

>> and each thread's private variable should have the value of the copied variable prior to entering the OMP region.

Not so, private variables are uninitialized. Use firstprivate or copyin to initialize private variables.

Considering the above...

>>If I place it in the !\$OMP region, I again get an incorrect result

Then this is redundant, and it does not initialize the out of region Gmatrix.

Jim Dempsey

Highlighted
Black Belt
40 Views

>>Use firstprivate or copyin to initialize private variables.

... to the same value as that of outside the parallel region.

*** Note, your above code is not to use firstprivate. Rather initialize Gmatrix prior to the parallel region.

Jim Dempsey