
Debugging OpenMP

fah10
New Contributor I
Hi folks,

Recently, I downloaded Parallel Studio XE 2011 and ran Inspector XE
on my scientific simulation code in order to find OpenMP data races.
Inspector reported many race conditions at Fortran allocate/deallocate
statements within parallel regions. However, while trying to figure out which
allocate was being flagged as a data race, it turned out that the behavior
doesn't seem to be deterministic.

So I came up with this trivial test problem:

[fortran]program omp_alloc
   implicit none

   integer, dimension(:), allocatable :: a

   !$omp parallel private(a) 
   allocate(a(10))
   deallocate(a)
   !$omp end parallel
end program omp_alloc[/fortran]

> ifort --version
ifort (IFORT) 12.0.2 20110112
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.
> ifort -openmp -g omp_alloc.F90
> inspxe-cl --version
Intel Inspector XE 2011 Update 2 (build 134657) Command Line tool
Copyright (C) 2009-2011 Intel Corporation. All rights reserved.
> export OMP_NUM_THREADS=4
> inspxe-cl -collect ti3 -- ./a.out
Used suppression file(s): []

1 new problem(s) found
1 Data race problem(s) detected

Inspecting the data race with the GUI, it points to the allocate/deallocate statements.
The data race is not reported in a deterministic fashion. However, increasing the
number of threads beyond the number of available cores drastically increases the
probability that the data race is reported.

What is going on here? Is that really a data race or just a false positive?


Fabian


Peter_W_Intel
Employee

The results from Inspector XE make sense. In my view it really is a data race.

The variable "a" is shared between the OpenMP threads, and the threads conflict when doing the "allocate" - these are write-write conflicts.

If you increase OMP_NUM_THREADS, the problem count stays at "1", but there are more code locations in more threads.

Regards, Peter

fah10
New Contributor I
"a" is declared private, so each thread thread has its own copy of the variable and allocates its own memory on the heap.
So there shouln't be a data race...
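
For what it's worth, a quick variant along these lines (untested as posted here; loc() is an Intel extension, and the program name is just for illustration) should print a different address from each thread if every thread really does get its own allocation:

[fortran]program omp_private_check
   use omp_lib
   implicit none

   integer, dimension(:), allocatable :: a

   !$omp parallel private(a)
   allocate(a(10))
   ! each thread should report a different address for its private copy
   print *, 'thread', omp_get_thread_num(), 'a at', loc(a)
   deallocate(a)
   !$omp end parallel
end program omp_private_check[/fortran]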
Peter_W_Intel
Employee
"a"was declared in main programand without "common a", should beNOT a global. Also "a was declared as "private" in omp directive.

Let me talk with engineering team, and get back to you soon.
jimdempseyatthecove
Honored Contributor III
Peter,

Unless a subroutine or function is declared as RECURSIVE, the compiler treats a local allocatable's array descriptor as if the descriptor were SAVE. The allocation/deallocation from the heap is a separate issue. PRIVATE is used in the parallel region for array a(:). This instantiates a thread-team number of additional, private array descriptors, including one for the main thread. When the main thread exits the parallel region, it will copy the contents of its private variables back to the scoped variables outside the parallel region. Even though the private a(:) of the main thread has by then been deallocated, and is presumably identical to the contents of the outer-scoped a(:), the compiler doesn't know/care, and code is generated to write the contents of the array descriptor back out. Should one of the other thread-team members be in the startup section of the parallel region at this time, it is possible that a non-main thread is reading the outer-scoped contents of the array descriptor while the main thread is re-writing it.

At least, this is my postulation as to what is going on.

In this case, this is a benign issue (one thread writing a null array descriptor while other threads read that null array descriptor).

To correct for this (if you want to do so), move the code from within the parallel region into a subroutine attributed with RECURSIVE, then call this subroutine from within the parallel region. The RECURSIVE attribute will cause the array descriptor to be located on the stack (for each thread). As to whether this fits into the poster's code, I cannot say, as it may require passing a large number of references.
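
Applied to your small test case, the workaround would look something like this (just an untested sketch; the subroutine name is mine):

[fortran]program omp_alloc
   implicit none

   !$omp parallel
   call alloc_work()
   !$omp end parallel
end program omp_alloc

recursive subroutine alloc_work()
   implicit none
   ! RECURSIVE makes the locals automatic, so the array descriptor
   ! lives on each thread's stack instead of in a saved location
   integer, dimension(:), allocatable :: a

   allocate(a(10))
   deallocate(a)
end subroutine alloc_work[/fortran]

Compiling everything with -recursive (or -auto) should, I would expect, have a similar effect, though I have not tried that on this particular case.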

Jim Dempsey
Peter_W_Intel
Employee
Thanks, Jim, for the helpful info! I have now changed the code as below:


[fortran]program omp_alloc
   integer X

   X = 16
   !$omp parallel
   call sub(X)
   !$omp end parallel

end program omp_alloc

recursive subroutine sub(X)
   implicit none
   integer, dimension(:), allocatable :: a
   integer X

   allocate(a(10))
   deallocate(a)

   if (X .GT. 0) then
      call sub(X-1)
   end if

end[/fortran]

[root@NHM02 problem_report]# ifort -g -openmp -openmp-report omp_alloc.F90 -o omp_alloc.ifort
omp_alloc.F90(5) (col. 7): remark: OpenMP DEFINED REGION WAS PARALLELIZED.

However, it still detected a data race:
[root@NHM02 problem_report]# inspxe-cl -collect ti3 -- ./omp_alloc.ifort
Used suppression file(s): []

2 new problem(s) found
1 Cross-thread stack access problem(s) detected
1 Data race problem(s) detected

Even when I inserted "call sleep(5)" between the allocate and the deallocate, the result was the same.

I will verify this with the Inspector XE development team.

Regards, Peter

Peter_W_Intel
Employee
This problem has been fixed in the latest Update 8.