According to Intel's thread checker, the attached Fortran+OpenMP code produces a write->write data race. Can anyone say why this is happening? I don't understand why the race condition is reported only for the v variable in the subroutine and not in the other cases. Is this a problem with the code, the compiler, or the thread checker?
Fabian
[cpp]
program iomp_itt
  implicit none

  type tvector
    real(kind=8), dimension(:), pointer :: u,v
  end type tvector

  type tgrid
    type(tvector) :: t
  end type tgrid

  type(tgrid) :: grid
  integer(kind=4) :: i

  allocate(grid%t%u(1:26))
  allocate(grid%t%v(1:26))

  !$omp parallel do shared(grid)
  do i=1,26
    ! no data race
    grid%t%u(i) = 0.0_8
    ! no data race
    grid%t%v(i) = 0.0_8
  end do
  !$omp end parallel do

  call test(grid)

  deallocate(grid%t%u)
  deallocate(grid%t%v)

contains

  subroutine test(g)
    type(tgrid) :: g
    integer(kind=4) :: i

    !$omp parallel do shared(g)
    do i=1,26
      ! no data race
      g%t%u(i) = 0.0_8
      ! the following line produces a write->write data-race
      g%t%v(i) = 0.0_8
    end do
    !$omp end parallel do
  end subroutine test

end program iomp_itt
[/cpp]
3 Replies
Aside from the fact that your loop and the work it contains are too small to benefit from parallelization, one would expect both loops to exhibit the same behavior. In the first loop grid is a local variable; in the second, g is a passed descriptor of grid (which generates a copy of the descriptor of grid). This may simply be bad luck of the draw.
As an experiment, change the iteration count (26) so that, when divided by the number of threads, it is a multiple of 8 (eight real(kind=8) values fit in a 64-byte cache line):
2 threads 32
3 threads 48
4 threads 32
5 threads 40
6 threads 48
7 threads 56
8 threads 64
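This gives you the smallest number of iterations, no smaller than the original 26, for which each thread's share of the array covers whole cache lines, so that no two threads write to the same line (assuming the arrays start on a cache-line boundary and the iterations are split into equal contiguous chunks).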
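The iteration counts listed above can be reproduced with a small helper. This is only a sketch: the program and variable names (padded_count, n_orig, per_line, chunk, n_padded) are made up for illustration, and it assumes a 64-byte cache line, a static schedule with equal contiguous chunks, and arrays that start on a cache-line boundary.

```fortran
program padded_count
  implicit none
  ! Original loop length from the test program.
  integer, parameter :: n_orig = 26
  ! real(kind=8) values per cache line (assumes 64-byte lines).
  integer, parameter :: per_line = 8
  integer :: nthreads, chunk, n_padded

  do nthreads = 2, 8
    ! Per-thread chunk: the smallest multiple of per_line such that
    ! nthreads*chunk is at least n_orig (integer ceiling division).
    chunk = per_line * ((n_orig + nthreads*per_line - 1) / (nthreads*per_line))
    n_padded = nthreads * chunk
    print '(i2, a, i3)', nthreads, ' threads ', n_padded
  end do
end program padded_count
```

To try the experiment, replace 26 in the allocate statements and loop bounds of the test program with the value printed for your thread count.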
Your test program might be of interest to Intel.
Jim Dempsey
The problem doesn't seem to depend on the length of the array. However, I figured out that Intel's thread checker only complains about the data race when the program was compiled with -tcheck.
Just compiling without -tcheck is also not an option because then the thread checker complains about data races when allocating a private array inside a parallel region. It's really a mess with OpenMP... :-(
I've seen a few anomalies with -tcheck which have been corrected in recent compilers. Going even further back, in case you are using a very old compiler, at one time -tcheck didn't set the other options it requires, such as debug symbols.
I've also received hints that major enhancements to support Parallel Studio have put some of the work on Fortran tcheck on hold.