Fortran + OpenMP + Thread checker

fah10 · ‎12-08-2009

According to Intel's thread checker, the attached Fortran+OpenMP code produces a write->write data race. Can anyone say why this is happening? I don't understand why the race condition is reported only for the v variable in the subroutine and not in the other cases. Is this a problem with the code, the compiler or the thread checker?

Fabian

[cpp]
program iomp_itt
   implicit none

   type tvector
      real(kind=8), dimension(:), pointer :: u,v
   end type tvector

   type tgrid
      type(tvector) :: t
   end type tgrid

   type(tgrid) :: grid
   integer(kind=4) :: i
   
   allocate(grid%t%u(1:26))
   allocate(grid%t%v(1:26))

   !$omp parallel do shared(grid)
   do i=1,26
      ! no data race 
      grid%t%u(i) = 0.0_8
      ! no data race 
      grid%t%v(i) = 0.0_8
   end do
   !$omp end parallel do
   
   call test(grid)

   deallocate(grid%t%u)
   deallocate(grid%t%v)

contains

   subroutine test(g)
      type(tgrid) :: g
      integer(kind=4) :: i

      !$omp parallel do shared(g)
      do i=1,26
         ! no data race 
         g%t%u(i) = 0.0_8
         ! the following line produces a write->write data-race
         g%t%v(i) = 0.0_8
      end do
      !$omp end parallel do

   end subroutine test

end program iomp_itt
[/cpp]

jimdempseyatthecove · ‎12-08-2009

Aside from the fact that your loop and the work it contains are too small to benefit from parallelization, one would expect that both loops would exhibit the same behavior. In the first loop grid is a local variable; in the second, g is a passed descriptor of grid (which generates a copy of the descriptor of grid). This may be a case of the bad luck of the draw.

As an experiment, change the iteration count (26) so that, when divided by the number of threads, the result is a multiple of 8 (eight real(kind=8) values fit in one 64-byte cache line).

2 threads 32
3 threads 48
4 threads 32
5 threads 40
6 threads 48
7 threads 56
8 threads 64

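This gives you the smallest number of iterations (not less than the original 26) for which, when the iterations are divided up among the threads, each thread's portion covers whole cache lines and is isolated from the other threads' portions.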
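
The table above can be reproduced with a short calculation. The following is only a sketch under stated assumptions: 64-byte cache lines, a static schedule that hands each thread one equal contiguous chunk, and arrays that start on a cache-line boundary. The program name and the names n_min, reals_per_line and chunk are made up for illustration and do not appear in the original test program.

```fortran
program padded_iterations
   implicit none
   integer, parameter :: n_min = 26          ! loop length in the original test program
   integer, parameter :: reals_per_line = 8  ! assumption: 64-byte cache line / 8-byte real
   integer :: nthreads, chunk, n

   do nthreads = 2, 8
      ! Per-thread chunk needed to cover at least n_min iterations (integer ceiling)
      chunk = (n_min + nthreads - 1) / nthreads
      ! Round the chunk up to a whole number of cache lines
      chunk = ((chunk + reals_per_line - 1) / reals_per_line) * reals_per_line
      ! Total iteration count for this thread count
      n = chunk * nthreads
      print '(i0, a, i0)', nthreads, ' threads ', n
   end do
end program padded_iterations
```

If the arrays are not aligned to a cache line, or the runtime picks a different schedule, neighbouring threads can still share a line at the chunk boundaries, so this padding only helps as an experiment and does not guarantee isolation.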

Your test program might be of interest to Intel.

Jim Dempsey

fah10 · ‎12-09-2009

The problem doesn't seem to depend on the length of the array. However, I figured out that Intel's thread checker only complains about the data race when the program was compiled with -tcheck.
Simply compiling without -tcheck is not an option either, because then the thread checker complains about data races when a private array is allocated inside a parallel region. It's really a mess with OpenMP... :-(

TimP · ‎12-09-2009

I've seen a few anomalies with -tcheck which have been corrected in recent compilers. Going even further back, in case you are using a very old compiler, at one time -tcheck didn't set the other options it requires, such as debug symbols.
I've also received hints that major enhancements to support Parallel Studio have put some of the work on Fortran tcheck on hold.