Hi All,
I am trying to parallelize a loop that calls a subroutine "sub_VFI" that is declared pure and recursive.
[fortran]!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(eind,mind)
do mind = 1,len_m
  do eind = 1,len_e
    call sub_VFI(vR(:,eind,mind),MPL(eind,mind),qapTmp(:,eind,mind),&
                 grid_a,grid_a,ap_eq0_ind, len_aTmp,len_apTmp,len_e,len_m)
  end do
end do
!$OMP END PARALLEL DO[/fortran]
The intents are as follows: vR is out, everything else is in. Also all of the in arrays are automatic arrays.
It doesn't work in parallel (it does in serial). Specifically, it seems to execute sub_VFI for each eind and mind but then can't exit the loop. Intel Inspector says there is a data race condition.
I have no idea what could be wrong with this code, so any clues as to what could be going wrong would be very helpful. I thought this would be perfectly fine, as sub_VFI is supposed to be free from side effects and each thread is given its own memory.
Thanks so much for your time!
Grey
Edit: Maybe I should add that sub_VFI does use the value of global variables, but does not modify them. I don't think this is a problem; please correct me if I'm wrong.
What type is vR? Is it a POINTER or an ALLOCATABLE?
If so, you may try to declare it INOUT instead of OUT.
For a pointer, the INTENT attribute relates to the association status of the pointer, not to the data it points to. I have had instances where the compiler would just reinitialize (= nullify) a pointer in a subroutine if it is OUT. If it is INOUT then it won't touch it.
Michael
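A minimal sketch of the pointer case described above, which only matters if the argument really is a POINTER. The module `mod_ptr_intent_demo` and both procedures are hypothetical names made up for illustration. With INTENT(INOUT) the caller's association is kept; with INTENT(OUT) the association status is undefined on entry, so the procedure has to associate the pointer itself before using it.

```fortran
module mod_ptr_intent_demo
  implicit none
contains
  ! INTENT(INOUT) on a pointer dummy keeps the caller's association,
  ! so the existing target can be written through p.
  subroutine fill_inout(p, val)
    real(8), dimension(:), pointer, intent(inout) :: p
    real(8), intent(in) :: val
    if (associated(p)) p = val
  end subroutine fill_inout

  ! With INTENT(OUT) the association status of p is undefined on entry,
  ! so p is associated with a target here before anything is written.
  subroutine fill_out(p, tgt, val)
    real(8), dimension(:), pointer, intent(out) :: p
    real(8), dimension(:), target, intent(inout) :: tgt
    real(8), intent(in) :: val
    p => tgt
    p = val
  end subroutine fill_out
end module mod_ptr_intent_demo
```

For `fill_out`, the actual argument passed as `tgt` should itself have the TARGET attribute so that `p` stays associated after the call returns.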
Thanks for the reply Michael.
vR is an automatic array both in the calling and receiving program and it is neither allocatable nor a pointer.
An automatic array has to be local to the subroutine. You probably don't mean automatic, but we can't know without seeing the code.
Automatic arrays of a size sufficient to use OpenMP aren't reliable, at least in the sense that you get no message should you fail to get a full allocation. Thus, allocatable would be preferred. But you may not mean automatic.
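A minimal sketch of the difference, assuming a hypothetical work array inside a procedure (the module `mod_alloc_demo` and its procedures are made-up names, not code from this thread). The automatic version has no way to report a failed allocation, while the allocatable version can check STAT= and return an error code.

```fortran
module mod_alloc_demo
  implicit none
contains
  ! Work array as an automatic array: it is sized from n on entry,
  ! and there is no status to check if the allocation cannot be satisfied.
  function sum_sq_automatic(n) result(s)
    integer, intent(in) :: n
    real(8) :: s
    real(8) :: work(n)            ! automatic array
    integer :: i
    do i = 1, n
      work(i) = real(i, 8)**2
    end do
    s = sum(work)
  end function sum_sq_automatic

  ! Same computation with an allocatable work array:
  ! a failed allocation is reported through ierr instead of going unnoticed.
  subroutine sum_sq_allocatable(n, s, ierr)
    integer, intent(in) :: n
    real(8), intent(out) :: s
    integer, intent(out) :: ierr
    real(8), allocatable :: work(:)
    integer :: i
    s = 0.0d0
    allocate(work(n), stat=ierr)
    if (ierr /= 0) return
    do i = 1, n
      work(i) = real(i, 8)**2
    end do
    s = sum(work)
    deallocate(work)
  end subroutine sum_sq_allocatable
end module mod_alloc_demo
```

The caller would test `ierr` after `sum_sq_allocatable` returns and stop with a message if it is nonzero.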
Thanks for the reply Tim.
sub_VFI looks like
[fortran]pure recursive subroutine sub_VFI(vR,MPL,qap_ap,grid_a,grid_ap,&
ap_eq0_ind,len_a,len_ap,len_e,len_m)
use mod_common_var, only: discount
implicit none
integer, intent(in) :: len_a,len_ap,len_e,len_m
real(8), dimension(1:len_a), intent(out):: vR
real(8), intent(in) :: MPL
real(8), dimension(1:len_ap), intent(in) :: qap_ap
real(8), dimension(1:len_a), intent(in) :: grid_a
real(8), dimension(1:len_ap), intent(in) :: grid_ap
integer, intent(in) :: ap_eq0_ind
...
end subroutine sub_VFI[/fortran]
I thought those were automatic arrays. len_a and len_ap are both 300. Do you mean by "get a full allocation" that I might be running out of memory? When you say allocatable would be preferred, do you mean adding allocatable as an attribute to all the arrays and converting intent out variables to intent inout?
Thank you,
Grey
No, these aren't automatic arrays, at least not inside this subroutine. They are provided by the calling subroutine (argument association). They could conceivably be automatic in the caller, but we can't see.
If any of the arrays happen to overlap (including the possibility of running outside the dimension of the original declaration), that could create a race which the compiler can't expect when parallelizing.
A possibility is that multiple threads are using the same OUT array. Ideally, Inspector points out the problem, at least to the extent of identifying an offending array.
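One way to rule out the overrun case before the parallel loop is a plain extent check in the caller. This is only a sketch: `mod_extent_check` and `extents_ok` are hypothetical names, and it assumes the caller's arrays are rank 3 and indexed as (:,eind,mind) like vR and qapTmp in the original call. If the first extent were shorter than the length passed to the explicit-shape dummy, a call could write past its own column into one that another thread owns.

```fortran
module mod_extent_check
  implicit none
contains
  ! Returns .true. when every (:,eind,mind) section is long enough for the
  ! explicit-shape dummies and the loop bounds fit the remaining extents,
  ! so no call can spill into a neighbouring column.
  pure function extents_ok(vR, qap, len_a, len_ap, len_e, len_m) result(ok)
    real(8), intent(in) :: vR(:,:,:), qap(:,:,:)
    integer, intent(in) :: len_a, len_ap, len_e, len_m
    logical :: ok
    ok = size(vR, 1) >= len_a .and. size(qap, 1) >= len_ap &
         .and. size(vR, 2) >= len_e .and. size(vR, 3) >= len_m &
         .and. size(qap, 2) >= len_e .and. size(qap, 3) >= len_m
  end function extents_ok
end module mod_extent_check
```

Calling `extents_ok(vR, qapTmp, len_aTmp, len_apTmp, len_e, len_m)` just before the `!$OMP PARALLEL DO` and stopping when it returns `.false.` would catch a mismatch between the declared sizes and the lengths passed to sub_VFI.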
Thanks again for your response.
When you say there is a possibility "that multiple threads are using the same OUT array," I don't understand how that's possible. When I call sub_VFI with sub_VFI(vR(:,eind,mind),...), isn't it necessarily the case that each thread operates on distinct subsections of vR? Is it a problem if the threads try to write to different sections of vR at the same time? I didn't think it was, but I don't know OpenMP very well. Or is your point just that if I have gone out of bounds, then I could be accessing the same portion of memory from different threads?
Separate question: Is it a problem if two threads read from the same data at the same time? As in both reading from grid_a at the same time. From what I've read I didn't think so.
Thank you,
Grey
PS I'm on version 12.0.3 and I'm using the following flags:
-O3 -xHost -openmp -heap-arrays 100 -reentrancy threaded -mkl=sequential -fp-model source -gen-interfaces -g -debug minimal -traceback -check pointers -warn all -warn nounused
And I'm on a 64-bit AMD processor.
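On the two questions above: in OpenMP, threads that write to different elements of a shared array do not race with each other, and concurrent reads of data that no thread modifies are also fine. A minimal sketch of that pattern, with a hypothetical `fill_column` standing in for sub_VFI (the module and procedure names are assumptions, not code from this thread):

```fortran
module mod_disjoint_demo
  implicit none
contains
  ! Stand-in for sub_VFI: writes only its own column and
  ! only reads the shared grid.
  pure subroutine fill_column(col, grid, scale, n)
    integer, intent(in) :: n
    real(8), intent(out) :: col(n)
    real(8), intent(in) :: grid(n)
    real(8), intent(in) :: scale
    col = scale * grid
  end subroutine fill_column

  ! Each iteration j writes column out(:,j) and nothing else,
  ! while every iteration reads the same shared grid.
  subroutine fill_all(out, grid, n, m)
    integer, intent(in) :: n, m
    real(8), intent(out) :: out(n, m)
    real(8), intent(in) :: grid(n)
    integer :: j
    !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(j)
    do j = 1, m
      call fill_column(out(:, j), grid, real(j, 8), n)
    end do
    !$OMP END PARALLEL DO
  end subroutine fill_all
end module mod_disjoint_demo
```

Compiled without OpenMP the directives are treated as comments and the loop runs serially, which makes it easy to compare the serial and parallel results element by element.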
Right, it does look like you are correctly sending separate sections of the array to each thread, provided that no one over-runs the subscripts.
I think -heap-arrays is dangerous with OpenMP. If it is the problem, I would expect Inspector to have pointed out a specific location in the code; you could also see whether removing -heap-arrays helps.
-heap-arrays 100 puts automatic and temporary arrays on the heap, except for those which the compiler knows at compile time will never exceed 100 KB.
Removing the heap arrays option did the trick. Thanks for all your help, I really appreciate it.
Tim,
Could you explain what is dangerous about OpenMP and -heap-arrays, other than the overhead of making a heap allocation/deallocation?
The fact that removal of -heap-arrays 100 "fixed" the symptoms does not necessarily mean it fixed the problem. The placement in memory (and/or overhead in thread-safe code) should not affect the results. There are two alternative potential causes:
One, which you addressed: indexing out of bounds on the intent(out) array.
Two: -heap-arrays will likely change the state of uninitialized variables more than stack arrays would.
Make that three: an alignment issue exposing a compiler optimization bug.
I would recommend that Richard investigate this further.
Jim Dempsey
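One way to probe the uninitialized-data possibility is to fill the output array with a sentinel before the parallel loop and count what is left afterwards. This is a sketch only: `mod_sentinel_demo`, `mark_unset` and `count_unset` are hypothetical names, and it assumes the sentinel value can never be a legitimate result.

```fortran
module mod_sentinel_demo
  implicit none
  ! Assumed never to occur as a real result of the computation.
  real(8), parameter :: sentinel = -huge(1.0d0)
contains
  ! Fill the whole output array with the sentinel before the loop.
  pure subroutine mark_unset(a)
    real(8), intent(out) :: a(:,:,:)
    a = sentinel
  end subroutine mark_unset

  ! Count the elements that still hold the sentinel after the loop.
  pure function count_unset(a) result(n)
    real(8), intent(in) :: a(:,:,:)
    integer :: n
    n = count(a == sentinel)
  end function count_unset
end module mod_sentinel_demo
```

If `count_unset(vR)` is nonzero after the loop, some part of vR was never assigned, and the results would then depend on whatever happened to be in that memory, which is exactly what can change between stack and heap placement.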