Reduction on zero-trip loop, error in Intel Fortran/OpenMP implementation?

tjahns · 10-11-2011

Hello,

I've already posted this issue on the OpenMP forum, but did not get any comments on the validity of my code. It is therefore possible that my code has a parallelization bug that only ifort exposes, although I think that's unlikely. So here goes:

I'm trying to build a Fortran program that computes some checksums over an array, where the trip count of the loops may be zero, although it is typically rather large. My implementation uses OpenMP reduction clauses on the DO loops, and unfortunately it gives incorrect sums for zero-trip loops.

I built the program below with Intel ifort 12.1.0 20110811, PGI pgf95 11.8-0 (64-bit target on x86-64 Linux, -tp penryn) and gfortran. gfortran and pgf95 yield the expected result for sum_b from the second, zero-trip loop; ifort gives a bogus value. Before I go through our support chain to file a bug report with Intel, I wanted to ask whether the program has a bug I'm not yet aware of.

If n is changed to 0 in the program, the first loop also gives unpredictable results for sum_total.

I'd appreciate any comment.

The compilation command I used was:

$ ifort -O0 -g -openmp -o ompzerotripreduction ompzerotripreduction.f90

Running the binary gives the following output:

$ OMP_NUM_THREADS=1 ./ompzerotripreduction
 sum_total=                     0 sum_b=                     0
 n=          13 m=           0
 sum_total=                    13 sum_b=       140193825868784

That is, only one thread is started, and its sum variables are initialized to 0. However, after an OpenMP DO loop with zero iterations and a reduction clause on sum_b, the value of sum_b is unexpectedly different from 0, which is the expected result.

For pgf95 and gfortran I get the expected sum of 0 for sum_b:

$ pgf95 -mp -g -O0 -o ompzerotripreduction ompzerotripreduction.f90 && ./ompzerotripreduction
 sum_total=            0 sum_b=            0
 n=     13 m=      0
 sum_total=           13 sum_b=            0
$ gfortran -O0 -g -fopenmp -o ompzerotripreduction ompzerotripreduction.f90 && OMP_NUM_THREADS=1 ./ompzerotripreduction
 sum_total=          0 sum_b=          0
 n=     13 m=     0
 sum_total=         13 sum_b=          0

The program source is as follows:

[fortran]PROGRAM zerotripreduction
  IMPLICIT NONE
  INTEGER, PARAMETER :: i8=SELECTED_INT_KIND(14)
  INTEGER(i8) :: sum_total, sum_b
  INTEGER, ALLOCATABLE :: a(:)
  INTEGER :: n, m, i

!$omp parallel shared(a, sum_total, sum_b, n, m)
!$omp master
  n = 13

  ALLOCATE(a(n))

  m = MAX(0, n - 500)
  sum_total = 0_i8
  sum_b = 0_i8
  a = 1
!$omp end master
!$omp barrier
  PRINT *, 'sum_total=', sum_total, 'sum_b=', sum_b
!$omp barrier
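! n = 13 iterations: sum_total should end up as 13
!$omp do reduction(+: sum_total)
  DO i = 1, n
    sum_total = sum_total + a(i)
  END DO
!$omp end do
! m = MAX(0, 13 - 500) = 0 iterations: sum_b should stay 0
!$omp do reduction(+: sum_b)
  DO i = 1, m
    sum_b = sum_b + a(i)
  END DO
!$omp end do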
!$omp master
  PRINT *, 'n=', n, 'm=', m
  PRINT *, 'sum_total=', sum_total, 'sum_b=', sum_b
!$omp end master
!$omp end parallel
END PROGRAM zerotripreduction[/fortran]

jimdempseyatthecove · 10-11-2011

Try placing an !$omp barrier in front of your last !$omp master
(or move the contents of the !$omp master block outside of the parallel region, after its implied barrier).

Jim Dempsey

tjahns · 10-12-2011

The !$omp end do also has an implied barrier, so an explicit barrier should not add anything here.
To make sure, I inserted an !$omp barrier statement after both loops, and it changed nothing.

TimP · 10-12-2011

All that complication is distracting; I find this case can be simplified considerably and still exhibit the bug. All the barriers appear to be irrelevant. The bug doesn't go away until I combine omp parallel and omp do reduction into a single omp parallel do, or add an IF clause such as if(max(m,n)>256) to avoid wasting time on OpenMP when the loop is too short. I suppose OpenMP doesn't get much testing with pure integer cases. I posted issue 647010 on premier.intel.com, but you would do better to post one yourself if you can explain why such a case should get priority.
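For reference, a minimal sketch of the combined form described above, where the reduction clause sits on a single !$omp parallel do instead of on an !$omp do nested inside a longer !$omp parallel region. The module and function names (combined_reduction, sum_combined) are hypothetical, the function assumes 0 <= m <= SIZE(a), and this only illustrates the shape of the construct; it is not a confirmed fix for every ifort version.

```fortran
MODULE combined_reduction
  IMPLICIT NONE
  INTEGER, PARAMETER :: i8 = SELECTED_INT_KIND(14)
CONTAINS
  ! Hypothetical helper: sums a(1:m) with a combined PARALLEL DO construct.
  ! Precondition (assumed): 0 <= m <= SIZE(a).
  FUNCTION sum_combined(a, m) RESULT(s)
    INTEGER, INTENT(IN) :: a(:)
    INTEGER, INTENT(IN) :: m
    INTEGER(i8) :: s
    INTEGER(i8) :: total
    INTEGER :: i

    total = 0_i8
    ! Each thread gets a private copy of total initialized to 0; the copies
    ! are added into the original total when the construct ends.
!$omp parallel do reduction(+: total)
    DO i = 1, m
      total = total + a(i)
    END DO
!$omp end parallel do
    ! For m = 0 no iteration runs, so total must still be 0 here.
    s = total
  END FUNCTION sum_combined
END MODULE combined_reduction
```

Called with the 13-element array of ones from the original program, a conforming implementation returns 0 for m = 0 and 13 for m = 13. Without -openmp/-fopenmp the directives are plain comments and the function runs serially.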

jimdempseyatthecove · 10-12-2011

>>The bug doesn't go away until I combine omp parallel and omp do reduction into a single omp parallel do

How about moving the reduction clause from the !$omp do... to the !$omp parallel...
(provided the code does not use the final result of the reduction before the !$omp parallel... region ends)?
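A minimal sketch of that suggestion, assuming the final sum is only needed after the parallel region: the reduction clause moves to !$omp parallel, and the !$omp do inside it carries no clause of its own. The names parallel_reduction and sum_region_reduction are hypothetical, and the function assumes 0 <= m <= SIZE(a).

```fortran
MODULE parallel_reduction
  IMPLICIT NONE
  INTEGER, PARAMETER :: i8 = SELECTED_INT_KIND(14)
CONTAINS
  ! Hypothetical helper: sums a(1:m) with REDUCTION on the PARALLEL construct.
  ! Precondition (assumed): 0 <= m <= SIZE(a).
  FUNCTION sum_region_reduction(a, m) RESULT(s)
    INTEGER, INTENT(IN) :: a(:)
    INTEGER, INTENT(IN) :: m
    INTEGER(i8) :: s
    INTEGER(i8) :: total
    INTEGER :: i

    total = 0_i8
    ! Every thread gets a private copy of total, initialized to 0, that
    ! lives for the whole parallel region.
!$omp parallel shared(a, m) reduction(+: total)
!$omp do
    DO i = 1, m
      total = total + a(i)   ! updates this thread's private copy only
    END DO
!$omp end do
    ! Inside the region total is still a per-thread partial value.
!$omp end parallel
    ! The private copies are combined at the end of the region, so the
    ! complete sum is only available from here on.
    s = total
  END FUNCTION sum_region_reduction
END MODULE parallel_reduction
```

The limitation Jim mentions is visible in the comments: the combined value exists only after !$omp end parallel, which is why this form does not fit code that needs the sum later in the same region.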

Your example code does use the reduction variable within the parallel region.
For that case, the workaround may be to reduce explicitly with an !$omp atomic update (outside the !$omp do...), followed by a barrier before the result is used.
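A minimal sketch of the explicit reduction, assuming a plain per-thread partial sum is acceptable: each thread sums its share of the iterations into a private variable, adds it to a shared total under !$omp atomic, and a barrier ensures every thread sees the complete total before using it. The names manual_reduction, sum_manual and partial are hypothetical, and the function assumes 0 <= m <= SIZE(a).

```fortran
MODULE manual_reduction
  IMPLICIT NONE
  INTEGER, PARAMETER :: i8 = SELECTED_INT_KIND(14)
CONTAINS
  ! Hypothetical helper: sums a(1:m) inside a parallel region without any
  ! REDUCTION clause. Precondition (assumed): 0 <= m <= SIZE(a).
  FUNCTION sum_manual(a, m) RESULT(s)
    INTEGER, INTENT(IN) :: a(:)
    INTEGER, INTENT(IN) :: m
    INTEGER(i8) :: s
    INTEGER(i8) :: total, partial
    INTEGER :: i

    total = 0_i8
!$omp parallel shared(a, m, total) private(i, partial)
    partial = 0_i8            ! set explicitly, even for threads with no iterations
!$omp do
    DO i = 1, m
      partial = partial + a(i)
    END DO
!$omp end do nowait
!$omp atomic
    total = total + partial   ! threads add their partial sums one at a time
!$omp barrier
    ! After the barrier every thread sees the complete total and could use
    ! it later in the same parallel region.
!$omp end parallel
    s = total
  END FUNCTION sum_manual
END MODULE manual_reduction
```

Because every thread initializes its own partial sum, a zero-trip loop simply contributes 0 to the total, independent of how the compiler implements the REDUCTION clause.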

Jim Dempsey

tjahns · 10-12-2011

In the original program, the parallel region is much longer, so it would be rather inconvenient to stop and restart the threads in the middle of the action.

tjahns · 10-12-2011

Thanks for reporting this as a bug. The priority is entirely in your hands: since I now know a workaround, I'll just add

[fortran]#ifdef __INTEL_COMPILER
   if (m > 0) then
#endif
[...]
#ifdef __INTEL_COMPILER
   endif
#endif[/fortran]

around the code that triggers the bug.

pbkenned1 · 10-12-2011

tjahns, thanks for reporting the issue. Your original code is a correct OpenMP program. And Tim, thanks for exposing the core issue and reporting the bug on Premier.

We'll investigate and follow up.

Patrick Kennedy

TimP · 10-12-2011

Creating two separate parallel regions doesn't necessarily hurt performance. libiomp5 keeps the threads active for the time interval KMP_BLOCKTIME (default 200 ms) after a parallel region ends, so as to speed up the startup of the next parallel region.
Thanks to Pat for taking on this issue.

pbkenned1 · 10-12-2011

Setting KMP_BLOCKTIME=infinite will keep threads spinning 'forever' and minimize the performance impact. GCC does that by default. It's not fair to other processes, but perhaps you don't care about that.

Patrick

pbkenned1 · 10-12-2011

This defect has been reported to compiler engineering as 'ifort 12.1 OpenMP incorrect sum reduction for zero trip-count parallel do in a parallel region'. The bug does exist in 12.1 and various 12.0 versions of ifort on Windows and Linux; Mac OS not tested.

Tracking number: DPD200174677

I'll keep this thread updated on the progress to repair.

The last known good compiler is Version 12.0.5.220 Build 20110719 (Linux update #5).

Regression history: broken in 12.0.4.191 (and earlier) => OK in 12.0.5.220 => broken again in 12.1.0.233

Patrick

pbkenned1 · 05-04-2012

The issue is resolved in Intel Fortran Composer XE 2011 Update 10, e.g. Version 12.1.4.319, Build 20120410 (Linux).

> ifort -V
Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.4.319 Build 20120410
> ifort -openmp zerotripomp.f && ./a.out
sum_total= 0 sum_b= 0
n= 13 m= 0
sum_total= 13 sum_b= 0
>

Patrick
