Intel® Fortran Compiler

Fortran + OpenMP + Thread checker

fah10
New Contributor I
724 Views
According to Intel's thread checker, the attached Fortran+OpenMP code produces a write->write data race. Can anyone say why this is happening? I don't understand why the race condition is reported only for the v variable in the subroutine and not in the other cases. Is this a problem with the code, the compiler, or the thread checker?

Fabian

[cpp]
program iomp_itt
  implicit none

  type tvector
    real(kind=8), dimension(:), pointer :: u,v
  end type tvector

  type tgrid
    type(tvector) :: t
  end type tgrid

  type(tgrid) :: grid
  integer(kind=4) :: i

  allocate(grid%t%u(1:26))
  allocate(grid%t%v(1:26))

  !$omp parallel do shared(grid)
  do i=1,26
    ! no data race
    grid%t%u(i) = 0.0_8
    ! no data race
    grid%t%v(i) = 0.0_8
  end do
  !$omp end parallel do

  call test(grid)

  deallocate(grid%t%u)
  deallocate(grid%t%v)

contains

  subroutine test(g)
    type(tgrid) :: g
    integer(kind=4) :: i

    !$omp parallel do shared(g)
    do i=1,26
      ! no data race
      g%t%u(i) = 0.0_8
      ! the following line produces a write->write data-race
      g%t%v(i) = 0.0_8
    end do
    !$omp end parallel do
  end subroutine test

end program iomp_itt
[/cpp]
3 Replies
jimdempseyatthecove
Honored Contributor III

Aside from the fact that your loop and the work it contains are too small to benefit from parallelization, one would expect both loops to exhibit the same behavior. In the first loop, grid is a local variable; in the second, g is a passed descriptor of grid (which generates a copy of the descriptor of grid). This may be a case of the bad luck of the draw.

As an experiment, change the iteration count (currently 26) so that, when divided by the number of threads, each thread's share is a multiple of 8 (eight real(kind=8) values fit in a 64-byte cache line).

2 threads 32
3 threads 48
4 threads 32
5 threads 40
6 threads 48
7 threads 56
8 threads 64

This gives you the smallest iteration count of at least 26 for which, when the iterations are divided evenly among the threads, each thread's share covers whole cache lines (assuming the arrays start on a cache-line boundary).
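The numbers in the list above can be reproduced with a short calculation. The following is only a sketch of that arithmetic: the program name and the variables n_min, reals_per_line, and step are made up for illustration, and it assumes 64-byte cache lines holding eight real(kind=8) values.

```fortran
program padded_iterations
  implicit none
  integer, parameter :: n_min = 26          ! loop length used in the original test
  integer, parameter :: reals_per_line = 8  ! assumption: 64-byte line, 8-byte reals
  integer :: nthreads, step, n

  do nthreads = 2, 8
    ! the iteration count must be a multiple of 8 * nthreads
    step = reals_per_line * nthreads
    ! round n_min up to the next multiple of step (integer division)
    n = ((n_min + step - 1) / step) * step
    print '(i0,a,i0)', nthreads, ' threads ', n
  end do
end program padded_iterations
```

This only covers the iteration count. Whether each thread's share really lines up with cache-line boundaries also depends on where the allocated arrays start in memory and on the loop schedule in use.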

Your test program might be of interest to Intel.

Jim Dempsey
fah10
New Contributor I
The problem doesn't seem to depend on the length of the array. However, I figured out that Intel's thread checker only complains about the data race when the program was compiled with -tcheck.
Just compiling without -tcheck is not an option either, because then the thread checker complains about data races when allocating a private array inside a parallel region. It's really a mess with OpenMP... :-(
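For readers who want to try the second case, the pattern of allocating a private array inside a parallel region looks roughly like the sketch below. This is an assumed minimal reproduction, not the code that actually triggered the report; the program name and the array name work are made up.

```fortran
program private_alloc
  implicit none
  ! hypothetical scratch array; each thread gets its own private copy
  real(kind=8), dimension(:), allocatable :: work
  integer(kind=4) :: i

  !$omp parallel private(work, i)
  ! allocation happens per thread, on that thread's private copy
  allocate(work(1:26))
  do i = 1, 26
    work(i) = 0.0_8
  end do
  deallocate(work)
  !$omp end parallel
end program private_alloc
```

Each thread allocates and frees only its own copy of work, so no data is shared between threads here.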
TimP
Honored Contributor III
I've seen a few anomalies with -tcheck which have been corrected in recent compilers. Going even further back, in case you are using a very old compiler, at one time -tcheck didn't set the other options it requires, such as debug symbols.
I've also received hints that major enhancements to support Parallel Studio have put some of the work on Fortran tcheck on hold.