Well, perhaps? I can't see anything wrong code-wise...
When the following little program is compiled with 11.1.054 using /debug:full /debug:parallel /Od /Qopenmp /stand:f03 /warn:all /traceback /check:all /libs:static /threads /dbglibs and then run, it dies with an access violation, apparently during the rather innocuous assignment to slice. Command line switches in bold appear to be required for the access violation to occur. Commenting out the line marked with xxx (which is after the reported crash position) stops the problem. The crash happens even if "serialised parallel regions" is active at the appropriate time in the IDE. There are no problems in debug mode without OMP. I went digging through the assembly and got very confused, possibly because the code that does the assignment is also confused (or possibly because of the very good reason why I don't code in assembly).
[plain]MODULE TestMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n2 = 10
CONTAINS
  SUBROUTINE Sub(n, arr)
    INTEGER, INTENT(IN) :: n
    REAL, INTENT(IN) :: arr(n,n2)
    INTEGER :: i
    REAL :: slice(n2)
    !****
    WRITE (*,"('Parallel do')")
    !$OMP PARALLEL DO DEFAULT(NONE), PRIVATE(slice), SHARED(arr,n)
    DO i = 1, n
      ! Collect elements out of natural order into a contiguous array
      slice = arr(i,:) ! Things appear to go boom here
      ! Now go off and do things with this slice...
      ! Later, try and do things with a different slice
      CALL Proc(arr(n,:)) ! xxx
    END DO
    WRITE (*,"('Parallel don''t')")
  END SUBROUTINE Sub
  SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)
    !****
  END SUBROUTINE Proc
END MODULE TestMod
PROGRAM OmpGen
  USE TestMod
  IMPLICIT NONE
  REAL, ALLOCATABLE :: arr(:,:)
  INTEGER :: n
  !----
  n = 5
  ALLOCATE(arr(n,n2))
  arr = 1.0
  CALL Sub(n, arr)
  WRITE (*,"('All done!')")
END PROGRAM OmpGen
[/plain]
7 Replies
First try adding RECURSIVE to the subroutines called from within a parallel region.
RECURSIVE SUBROUTINE Sub(n, arr)
I cannot see all the variable declarations in SUBROUTINE Proc. It may need RECURSIVE as well (so will any nested subroutine calls).
What is likely happening is that your local array slice(n2) had an implicit SAVE attribute.
If that does not work, there was a reported bug where temporary arrays were created as if with the SAVE attribute, so that only one instance of the array descriptor was created. IOW multiple threads could happen to use the same descriptor. If RECURSIVE fixes the problem then this is a programming problem. (IMHO FORTRAN needs an attribute to identify a variable/array as stack local, but these traditions are long standing and you have to live with them.)
If RECURSIVE does not work then check your compiler version to see if it is current. This problem (local temp array as SAVE when in RECURSIVE subroutine/function) was reported and is likely fixed by now.
Jim Dempsey
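A minimal sketch of what the RECURSIVE suggestion above might look like when applied to the posted program. It assumes the usual Intel Fortran behaviour that locals of a RECURSIVE procedure are automatic (stack) unless they carry SAVE. The names RecursiveSketch and RecursiveDemo are made up for illustration, and the explicit END PARALLEL DO is an optional addition that the posted code does not have.

```fortran
MODULE RecursiveSketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n2 = 10
CONTAINS
  ! RECURSIVE added so that locals such as slice are not given static storage
  ! (assumption: compiler treats locals of RECURSIVE procedures as automatic).
  RECURSIVE SUBROUTINE Sub(n, arr)
    INTEGER, INTENT(IN) :: n
    REAL, INTENT(IN) :: arr(n,n2)
    INTEGER :: i
    REAL :: slice(n2)
    !$OMP PARALLEL DO DEFAULT(NONE), PRIVATE(slice), SHARED(arr,n)
    DO i = 1, n
      slice = arr(i,:)        ! non-contiguous row copied into a private array
      CALL Proc(arr(n,:))     ! every routine called in the region is RECURSIVE too
    END DO
    !$OMP END PARALLEL DO
  END SUBROUTINE Sub

  RECURSIVE SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)
  END SUBROUTINE Proc
END MODULE RecursiveSketch

PROGRAM RecursiveDemo
  USE RecursiveSketch
  IMPLICIT NONE
  REAL :: arr(5,n2)
  arr = 1.0
  CALL Sub(5, arr)
  WRITE (*,"('Done')")
END PROGRAM RecursiveDemo
```

If the crash disappears with this change, static storage of a local is the likely culprit; if it persists, the compiler version is the next thing to check.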
Quoting - jimdempseyatthecove
First try adding RECURSIVE to the subroutines called from within a parallel region.
RECURSIVE SUBROUTINE Sub(n, arr)
I cannot see all the variable declarations in SUBROUTINE Proc. It may need RECURSIVE as well (so will any nested subroutine calls).
...
If RECURSIVE does not work then check your compiler version to see if it is current. This problem (local temp array as SAVE when in RECURSIVE subroutine/function) was reported and is likely fixed by now.
Thanks for the comments. I tried RECURSIVE but I'm still seeing the same problem.
Subroutine Proc was as you see it (nothing has been elided from the posted program's source) - it has no declarations apart from the dummy argument and it has no executable statements.
I'm using 11.1.54, which I think is the latest (update 4).
Curiously, when I was testing on a different (faster) machine this morning the problem became intermittent, with likelihood increasing the larger I made the second dimension (n2 - the slice dimension). I guess this implies that I have some sort of timing issue going on (though how that manifests when "parallel regions are serialised" (or whatever the term is under the parallel debugger) is beyond me).
What I'm not so sure about is whether I had correct code for OpenMP'ing. If you can't see anything immediately wrong with the essentially two executable Fortran statements that are in the parallel region then I might put on my wellingtons and go wading back into assembly land.
In Sub you have
[plain]SUBROUTINE Sub(n, arr)
  INTEGER, INTENT(IN) :: n
  REAL, INTENT(IN) :: arr(n,n2)
  ...
  CALL Proc(arr(n,:)) ! xxx
[/plain]
In Proc you have
[plain]SUBROUTINE Proc(arg)
  REAL, INTENT(IN) :: arg(:)
[/plain]
The call to Proc may require the creation of a temporary array. Potentially this could get by with creating an array descriptor with a stride of n without moving data. Or this can make a temporary copy with stride 1.
An identical slice will be made *** by each thread ***. This is not an error in this example, rather it is an inefficiency. This may be an error if the called routine (Proc) used arg as INTENT(INOUT), because then you would have aliases to the same memory.
In any event the code should not go boom. What happens when
[plain]!$OMP PARALLEL DO DEFAULT(NONE), PRIVATE(slice), SHARED(arr,n)
DO i = 1, n
  ! Collect elements out of natural order into a contiguous array
  !$OMP CRITICAL
  slice = arr(i,:) ! Things appear to go boom here
  ! Now go off and do things with this slice...
  ! Later, try and do things with a different slice
  CALL Proc(arr(n,:)) ! xxx
  !$OMP END CRITICAL
END DO
[/plain]
Not that you want critical sections; rather, the above is simply to see if you no longer go boom.
Also, in Proc, write the location of the 1st and 2nd elements of the array.
I want to see if you have a stride of 1 (indicating a temporary array)
or a stride of n (indicating an array descriptor pointing to a strided slice of arr).
And I want to see if each thread references the same locations or different locations.
Jim Dempsey
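The address check requested above could be sketched roughly as follows. This is only an illustration: LOC is a compiler extension rather than standard Fortran, the names LocSketch and LocDemo are invented here, and the code assumes it is built with OpenMP enabled so that omp_lib is available.

```fortran
MODULE LocSketch
  USE omp_lib, ONLY: omp_get_thread_num
  IMPLICIT NONE
CONTAINS
  SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)
    ! Print the addresses of the first two elements and the gap in bytes.
    ! A gap of one REAL suggests a contiguous temporary copy;
    ! a gap of n REALs suggests a descriptor pointing into arr.
    WRITE (*,"('thread ',I0,': loc1 = ',I0,', loc2 = ',I0,', gap = ',I0)") &
      omp_get_thread_num(), LOC(arg(1)), LOC(arg(2)), &
      LOC(arg(2)) - LOC(arg(1))
  END SUBROUTINE Proc
END MODULE LocSketch

PROGRAM LocDemo
  USE LocSketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5, n2 = 10
  REAL :: arr(n,n2)
  INTEGER :: i
  arr = 1.0
  !$OMP PARALLEL DO DEFAULT(NONE), SHARED(arr)
  DO i = 1, n
    CALL Proc(arr(n,:))   ! same strided row passed by every iteration
  END DO
  !$OMP END PARALLEL DO
END PROGRAM LocDemo
```

With a 4-byte default REAL and n = 5, a descriptor into arr would be expected to show a gap of 20 bytes and a contiguous temporary a gap of 4 bytes. Anything else points at a damaged descriptor.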
Thanks for the suggestions. Appreciate the potential inefficiencies with the array element order issue. I hadn't thought of the possibility of a temporary, given the assumed shape.
Things still go boom with the critical sections. Adding the suggested WRITE (..) LOC(xx) statements produced some interesting results. In non-OpenMP runs the LOC results for the relevant array elements agreed between Sub and Proc, indicating that a descriptor is being passed in that specific case. Also, the difference between LOC(arr(n,1)) and LOC(arr(n,2)) was the expected number of bytes given the stride (5 elements by 4 bytes = 20). But in OpenMP builds, LOC(arr(n,2)) was typically 1.2 MB away from LOC(arr(n,1)). That's a rather big stride for an array that is only 5 by 10 REAL elements in size. Occasionally a thread makes it into Proc; in this case all LOCs for that thread have exactly the same value (indicating a descriptor being passed, with zero stride??).
I don't think all is well. Wellingtons on, glass of red charged, rest of bottle handy, I'm going back to assembly land...
InaH,
Good detective work (even without success).
The critical section eliminated a race condition as the cause. Clearly the array descriptor is getting trashed or built wrong.
This could be a case where the caller and callee do not agree on how to pass the argument (descriptor or address of first cell).
Although this next suggestion should not be required, try supplying an interface declaration for your forward reference. (I am the type that will look into the horse's mouth to count the teeth as opposed to having a philosophical discussion as to how many ought to be there.)
You may need a case of that red wine.
BTW - I assume you have submitted this test case to premier support.
Jim Dempsey
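For reference, an explicit interface declaration of the kind suggested above might look like the sketch below. It is a hypothetical variant in which Proc is an external procedure. In the posted program Proc is a module procedure, so its interface is already explicit and no INTERFACE block should be needed. InterfaceDemo is an invented name.

```fortran
! Hypothetical variant: Proc lives outside any module, so callers
! cannot see its assumed-shape dummy without an INTERFACE block.
SUBROUTINE Proc(arg)
  IMPLICIT NONE
  REAL, INTENT(IN) :: arg(:)
  WRITE (*,"('size = ',I0)") SIZE(arg)
END SUBROUTINE Proc

PROGRAM InterfaceDemo
  IMPLICIT NONE
  ! Tells the caller that arg is assumed shape, i.e. passed by descriptor.
  INTERFACE
    SUBROUTINE Proc(arg)
      REAL, INTENT(IN) :: arg(:)
    END SUBROUTINE Proc
  END INTERFACE
  REAL :: arr(5,10)
  arr = 1.0
  CALL Proc(arr(5,:))   ! strided row section of length 10
END PROGRAM InterfaceDemo
```

Without the INTERFACE block the caller of an external Proc would have no way of knowing that a descriptor is expected, which is the kind of caller/callee disagreement described above.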
I've reported this as issue DPD200148861. Right now, I'm guessing that the parallel debug extension calls added to the code are messing something up....
This ended up being a code generation bug. I expect it to be fixed in 11.1 Update 7, scheduled for late August.