Intel® Fortran Compiler

Another problem with OpenMP

jirina
New Contributor I
639 Views
I have code which reads:
[cpp]!$omp parallel if ( enableOpenMP ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( i, im1, ip1, istep, i1, i2,
!$omp& j, jm1, jp1, jstep, j1, j2,
!$omp& k, km1, kp1, kstep, k1, k2, order, bm, mm )
!$omp& reduction ( +: have, rhave, objemh, hfave, rhfave, objemhf,
!$omp& tsave, rtsave, surft, qsave, rqsave )
!$omp& reduction ( MAX: hmax, rhmax, hfmax, rhfmax, tsmax, rtsmax, qsmax, rqsmax )
!$omp& reduction ( MIN: hmin, hfmin, tsmin, qsmin )

!$omp do schedule(dynamic,3)
DO j=jbls2,jbl1,-1
jm1 = j-1
jp1 = j+1

call random_dir ( ibl1, ibl2, kbl1, kbl2, i1, i2, istep, k1, k2, kstep, order )

if ( order.ge.0.5 ) then

do k=k1,k2,kstep
km1 = k-1
kp1 = k+1
do i=i1,i2,istep
im1 = i-1
ip1 = i+1

bm = type(i,j,k)
if ( bm.eq.2 .OR. bm.eq.3 ) then
call h_solver ( h, t, tbx, tby, tbz,
+ u, v, w, cp, lam, rho, Source, s,
+ i, j, k, im1, jm1, km1, ip1, jp1, kp1,
+ type, rxbh, ih, jh, kh, b4_hgt,
+ hmax, hmin, rhmax, rhave, have, objemh )
else if ( bm.eq.4 ) then
call f_solver ( h, t, tbx, tby, tbz, cp, lam,
+ i, j, k, im1, jm1, km1, ip1, jp1, kp1,
+ spm, type, ihf, jhf, khf,
+ hfmax, hfmin, rhfmax, rhfave, hfave, objemhf )
else if ( bm.eq.-5 .OR. bm.eq.-4 .OR. bm.eq.-3 ) then
mm = spm(i,j,k)
call b_w ( t, tbx, tby, tbz, h, b4_hgt,
+ lam, cp, spm, type, itype, iplane, rxbsurf, i, j, k, mm,
+ im1, jm1, km1, ip1, jp1, kp1, its, jts, kts,
+ tsmax, tsmin, rtsmax, rtsave, tsave, surft,
+ iqs, jqs, kqs, qsmax, qsmin, rqsmax, rqsave, qsave )
endif

end do
end do

else
...
endif

END DO
!$omp end do
!$omp end parallel[/cpp]
If I run my program in the Release version, values of variables included in REDUCTION are correct (I compared them with values obtained using the same code, but with enableOpenMP = .false.).

I decided that FIRSTPRIVATE does not have to be used here, so I changed it to PRIVATE. Surprisingly, some of the variables from REDUCTION suddenly had either NaN or Infinity values.

I checked all variables by printing them to the screen, together with the thread number, inside the parallel region's DO loop over j, and everything is OK. However, after the parallel region is finished, some values become either NaN or Infinity. This would indicate that something wrong is happening with some of the REDUCTION variables.

Finally, when I added WRITE just after the parallel region, the values were correct again.

Does anybody have any idea what I could try in order to find the cause of this problem? I checked the number and types of the arguments of subroutines h_solver, f_solver and b_w several times (those which are not listed in (FIRST)PRIVATE or REDUCTION are 3D allocatable arrays), and I ran the Debug version with many of the Diagnostics options enabled, but I did not get any indication.
5 Replies
jimdempseyatthecove
Honored Contributor III

Variables whose values from outside the parallel region are to be passed into the parallel region and then used (modified) privately must be either

FIRSTPRIVATE(yourVariablesHere)
or
PRIVATE(yourVariablesHere) COPYIN(yourVariablesHere)

Variables whose values from outside the parallel region are to be passed into the parallel region and then used read-only should be declared SHARED.

Do not use FIRSTPRIVATE or COPYIN on variables that are written first (as you will be incurring additional overhead).
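A minimal free-form sketch of the distinction (hypothetical variable names; with FIRSTPRIVATE the thread-local copy is initialized from the enclosing value, with PRIVATE it is undefined on entry):

[cpp]program fp_vs_p
  use omp_lib
  implicit none
  integer :: x
  x = 42
!$omp parallel firstprivate ( x ) num_threads ( 2 )
  ! each thread's private x starts at 42, copied from outside
  x = x + omp_get_thread_num()
!$omp end parallel
!$omp parallel private ( x ) num_threads ( 2 )
  ! each thread's private x is UNDEFINED here; reading it before
  ! assigning it yields garbage (a typical source of NaN/Infinity)
  x = omp_get_thread_num()
!$omp end parallel
  write(*,*) x   ! the outer x is unchanged by either region: prints 42
end program fp_vs_p[/cpp]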

You must be aware that variables listed in REDUCTION are implicitly initialized: each thread's private copy starts at the identity value of the reduction operation (e.g. zero for +).

When a cell in an array is written by only one thread (i.e. one j index above), there is no requirement for reduction.

Your code may also have a temporal issue with respect to the j+1 and j-1 indices. That is where the j-1 index of an array is assumed to have been written prior to the j index, and the j+1 index is assumed to have been written after the j index. Verify that your code is not sensitive in this respect. If it is, then you will have to correct for this.

Jim Dempsey
jirina
New Contributor I
I use DEFAULT(SHARED) when declaring the parallel region, and I remember from your posts in a thread on a similar topic the explanation of the difference between FIRSTPRIVATE and PRIVATE+COPYIN.

I have already checked all declarations of parallel regions in my code so that the read-only variables are not included in FIRSTPRIVATE (they are shared, see above).

I am using REDUCTION on variables which are statistics - min, max, average. Screen outputs from my tests indicate that their initialization (not only zeroing - for MAX it is the most negative representable number) within the particular threads is OK, and their calculation and update within each thread is OK; I am just having problems with the update of the shared variables at the end of the REDUCTION.

The core of the problem is strange - the update of REDUCTION variables is OK if I use FIRSTPRIVATE in the code shown in my original post. If I use PRIVATE, the update of REDUCTION variables fails and produces NaN and/or Infinity.

I have no idea what's wrong with using PRIVATE - all variables in the list are initialized inside the parallel region. In addition, I don't see any reason why these variables should influence the way the REDUCTION variables are updated.

There should be no problem with the j index. When it is being computed, only the j index of all used arrays is written; j-1 and j+1 are array locations that are only read from. Also, I am not using REDUCTION for any array, just for scalar variables.
jimdempseyatthecove
Honored Contributor III

You will have to add some test code to verify the variables contain what you think they should contain.

Something like
[cpp]subroutine DEBUG_FP_CLASS(V)
  implicit none
  real(8) :: V
  integer :: JFP_CLASS

  JFP_CLASS = FP_CLASS(V)   ! Intel Fortran intrinsic: classifies an IEEE value
  SELECT CASE (JFP_CLASS)
    CASE (0)
      call DOSTOP('FOR_K_FP_SNAN')
    CASE (1)
      call DOSTOP('FOR_K_FP_QNAN')
    CASE (2)
      call DOSTOP('FOR_K_FP_POS_INF')
    CASE (3)
      call DOSTOP('FOR_K_FP_NEG_INF')
    CASE (4)
      ! OK - FOR_K_FP_POS_NORM
    CASE (5)
      ! OK - FOR_K_FP_NEG_NORM
    CASE (6)
      call DOSTOP('FOR_K_FP_POS_DENORM')
    CASE (7)
      call DOSTOP('FOR_K_FP_NEG_DENORM')
    CASE (8)
      ! OK - FOR_K_FP_POS_ZERO
    CASE (9)
      ! OK - FOR_K_FP_NEG_ZERO
    CASE DEFAULT
      call DOSTOP('FOR_K_FP_unknown')
  END SELECT
end subroutine DEBUG_FP_CLASS[/cpp]
The above is for REAL(8), change if you use REAL(4).

You will have to decide whether positive and/or negative denormalized numbers are acceptable.

And add

[cpp]! common routine to perform CALL DOSTOP()
SUBROUTINE DOSTOP(CVAR)
  CHARACTER*(*) CVAR
  INTEGER, PARAMETER :: IOUERR = 0   ! error-log unit; point this at your log file's unit

! place break point here
  WRITE(IOUERR,*) CVAR
  WRITE(*,*) CVAR

  STOP 'DOSTOP'
  RETURN   ! normally unreachable; a target for "set next statement" in the debugger
END SUBROUTINE DOSTOP
[/cpp]


You may need to test the arguments being passed to the subroutines called in your parallel region, as well as the results coming back from the calls. Add a similar test for subscript ranges, and/or other sanity checks.

Use conditional compilation directives (I prefer FPP conditionals over IVF conditional directives, as they are easier to use and give you macro capability too).
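For example, a debug check such as the FP-class test above can be guarded like this (a sketch; it assumes the file is compiled with /fpp and that _DEBUG is defined in the Debug configuration only):

[cpp]#ifdef _DEBUG
      call DEBUG_FP_CLASS(have)    ! check a suspect reduction variable
      call DEBUG_FP_CLASS(rhave)
#endif[/cpp]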

If all error asserts call DOSTOP with a text message, then by placing the break point on the WRITE in DOSTOP you will have a common stopping point. Always compile DOSTOP with debugging enabled and optimizations turned off.
You can change the WRITE to pop-up a message box, your choice.

When you get an error you can set the "next executable statement" at the RETURN of DOSTOP and then step out of DOSTOP back into the routine with the error (or suspicious data). You can examine variables there and even reset the "next executable" to go back some steps and then re-walk through the code producing the error. The re-walk is quite handy and will help to find errors that a simple dump of variables won't find.

When you find the error, 99.44% of the time it will be your fault (at least, that is my experience).

When using the FPP you can define a macro _ASSERT(x) or _ASSERT(x,m)

#define _ASSERT(x,m) if(.not. (x)) call DOSTOP(m)

Where _ASSERT is conditioned to your preference as to whether you want __FILE__ and/or __LINE__ as well as a text message (embellish DOSTOP accordingly too).
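For example, a variant that also reports the failure location might look like this (a sketch; DOSTOP2 is a hypothetical helper, and __FILE__/__LINE__ assume FPP preprocessing):

[cpp]#define _ASSERT(x,m) if (.not. (x)) call DOSTOP2(m, __FILE__, __LINE__)

subroutine DOSTOP2(CVAR, CFILE, ILINE)
  character*(*) CVAR, CFILE
  integer ILINE
! place break point here
  write(*,*) 'ASSERT failed: ', CVAR, ' at ', CFILE, ':', ILINE
  stop 'DOSTOP2'
end subroutine DOSTOP2

! usage:
!     _ASSERT( i .ge. 1 .and. i .le. n, 'i out of range' )[/cpp]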

Garbage-In-Garbage-Out

Jim Dempsey
jirina
New Contributor I
Jim,
Thank you for your intensive support; I appreciate it a lot, as always. I will use what you suggested in your most recent post whenever I need to check variables and their values. I was lucky to find the cause of my problem myself; there was no chance anybody else would have found it, because I did not include the complete code, and the problem was caused by the part I skipped. This is the (very simplified) complete code:
[cpp]!$omp parallel if ( enableOpenMP .AND. omp_energy .AND. omp_section ) num_threads ( threads ) default ( shared )
!$omp& private ( ... )
!$omp& reduction ( ... )

!$omp sections
!$omp section

!$omp parallel if ( enableOpenMP .AND. omp_energy .AND. omp_do ) num_threads ( threads ) default ( shared )
!$omp& private ( ... )
!$omp& reduction ( ... )

!$omp do schedule(dynamic,3)
DO i=i1,i2
... ! reduction variables are calculated here
END DO
!$omp end do
!$omp end parallel

!$omp section

!$omp parallel if ( enableOpenMP .AND. omp_energy .AND. omp_do ) num_threads ( threads ) default ( shared )
!$omp& private ( ... )
!$omp& reduction ( ... )

!$omp do schedule(dynamic,3)
DO i=i2,i1,-1
... ! reduction variables are calculated here
END DO
!$omp end do
!$omp end parallel

!$omp end sections
!$omp end parallel[/cpp]
This code dates from when I was comparing SECTIONS and DO, which is why I had the two logical variables omp_section and omp_do. I now use omp_do = .true. and omp_section = .false., so the code corresponding to the parallel SECTIONS is not needed at all. I was surprised to see my program work correctly after removing the lines defining the SECTIONS - the problems I assumed were connected with PRIVATE vs. FIRSTPRIVATE are gone.
jimdempseyatthecove
Honored Contributor III

Even though the SECTIONS construct was disabled (i.e. it did not spawn a team of threads), the reduction portion of the nested parallel region was likely still executed. If any thread, between the outer parallel region and the inner parallel region (the sections), had built a partial result in the reduction variable(s), those values would have been re-initialized for the next inner nesting level. This may or may not have been related to your problem.

REDUCTION at any nesting level implies initialization (usually 0, but +HUGE(var) for MIN and -HUGE(var) for MAX).
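As a small, hypothetical sketch of that initialization rule (each thread's private reduction copy starts at the operation's identity, and the original shared value is combined in at the end):

[cpp]program reduction_init
  implicit none
  integer :: i
  real(8) :: total, vmax, vmin
  total = 0.0d0          ! identity for +
  vmax  = -huge(vmax)    ! identity for MAX
  vmin  =  huge(vmin)    ! identity for MIN
!$omp parallel do reduction ( +: total ) reduction ( MAX: vmax ) reduction ( MIN: vmin )
  do i = 1, 100
    total = total + dble(i)
    vmax  = max(vmax, dble(i))
    vmin  = min(vmin, dble(i))
  end do
!$omp end parallel do
  write(*,*) total, vmax, vmin   ! 5050.0, 100.0, 1.0
end program reduction_init[/cpp]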

Jim Dempsey