Recently, I downloaded Parallel Studio XE 2011 and ran Inspector XE
on my scientific simulation code in order to find OpenMP data races.
Inspector reported many race conditions at Fortran allocate/deallocate
statements within parallel regions. However, when I tried to figure out
which allocate was flagged as a data race, it turned out that the behavior
doesn't seem to be deterministic.
So I came up with this trivial test case:
[fortran]
program omp_alloc
  implicit none
  integer, dimension(:), allocatable :: a
!$omp parallel private(a)
  allocate(a(10))
  deallocate(a)
!$omp end parallel
end program omp_alloc
[/fortran]
> ifort --version
ifort (IFORT) 12.0.2 20110112
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.
> ifort -openmp -g omp_alloc.F90
> inspxe-cl --version
Intel Inspector XE 2011 Update 2 (build 134657) Command Line tool
Copyright (C) 2009-2011 Intel Corporation. All rights reserved.
> export OMP_NUM_THREADS=4
> inspxe-cl -collect ti3 -- ./a.out
Used suppression file(s): []
1 new problem(s) found
1 Data race problem(s) detected
Inspecting the data race in the GUI, it points to the allocate/deallocate statements.
The data race is not reported in a deterministic fashion. However, raising the
number of threads above the number of available cores drastically increases the
probability that the data race is reported.
What is going on here? Is this really a data race, or just a false positive?
Fabian
The Inspector XE results make sense; in my view it really is a data race.
Variable "a" is shared between the OpenMP threads, and the threads conflict when doing "allocate" - these are write-write conflicts.
If you increase OMP_NUM_THREADS, the problem count is still "1", but there are more code locations in more threads.
Regards, Peter
But "a" is declared private, so there shouldn't be a data race...
Let me talk with the engineering team and get back to you soon.
Unless a subroutine or function is declared RECURSIVE, the compiler treats a local array descriptor as if the descriptor were SAVE. (The allocation/deallocation from the heap is a separate issue.) PRIVATE is used in the parallel region for array a(:). This instantiates one additional, private array descriptor per thread-team member, including one for the main thread. When the main thread exits the parallel region, it copies the contents of its private variables back to the variables scoped outside the parallel region. Even though the private a(:) of the main thread has now been deallocated, and its descriptor is presumably identical to the contents of the outer-scoped a(:), the compiler doesn't know/care, and code is generated to write the contents of the array descriptor back out. Should one of the other team members be in the startup section of the parallel region at this time, it is possible that a non-main thread is reading the outer-scoped contents of the array descriptor while the main thread is rewriting it.
At least, this is my postulation as to what is going on.
In this case, it is a benign issue (one thread writing a null array descriptor while other threads read that null array descriptor).
To correct this (if you want to), move the code from within the parallel region into a subroutine attributed RECURSIVE, then call that subroutine from the parallel region. The RECURSIVE attribute causes the array descriptor to be located on the stack (for each thread). Whether this works in the poster's full code, I cannot say, as it may require passing a large number of references.
Jim Dempsey
[fortran]
program omp_alloc
  integer X
  X = 16
!$omp parallel
  call sub(X)
!$omp end parallel
end program omp_alloc

recursive subroutine sub(X)
  implicit none
  integer, dimension(:), allocatable :: a
  integer X
  allocate(a(10))
  deallocate(a)
  if (X .GT. 0) then
    call sub(X-1)
  end if
end
[/fortran]
[root@NHM02 problem_report]# ifort -g -openmp -openmp-report omp_alloc.F90 -o omp_alloc.ifort
omp_alloc.F90(5) (col. 7): remark: OpenMP DEFINED REGION WAS PARALLELIZED.
However, it still detected a data race:
[root@NHM02 problem_report]# inspxe-cl -collect ti3 -- ./omp_alloc.ifort
Used suppression file(s): []
2 new problem(s) found
1 Cross-thread stack access problem(s) detected
1 Data race problem(s) detected
Even when I inserted "call sleep(5)" between the allocate and the deallocate, the result was the same.
I will verify this with the Inspector XE development team.
Regards, Peter