Intel® Fortran Compiler

OMP array slice assignment code generation bug

IanH
Honored Contributor III
Well, perhaps? I can't see anything wrong code-wise...

When the following little program is compiled with 11.1.054 using /debug:full /debug:parallel /Od /Qopenmp /stand:f03 /warn:all /traceback /check:all /libs:static /threads /dbglibs and then run, it dies with an access violation, apparently during the rather innocuous assignment to slice. Command line switches in bold appear to be required for the access violation to occur. Commenting out the line marked with xxx (which is after the reported crash position) stops the problem. The crash happens even if "serialised parallel regions" is active at the appropriate time in the IDE. There are no problems in debug mode without OMP. I went digging through the assembly and got very confused, possibly because the code that does the assignment is also confused (or possibly because of the very good reason why I don't code in assembly).

[plain]MODULE TestMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n2 = 10
CONTAINS
  SUBROUTINE Sub(n, arr)
    INTEGER, INTENT(IN) :: n
    REAL, INTENT(IN) :: arr(n,n2)
    INTEGER :: i
    REAL :: slice(n2)
    !****
    WRITE (*,"('Parallel do')")
    !$OMP PARALLEL DO DEFAULT(NONE), PRIVATE(slice), SHARED(arr,n)
    DO i = 1, n
      ! Collect elements out of natural order into a contiguous array
      slice = arr(i,:) ! Things appear to go boom here
      ! Now go off and do things with this slice...
      ! Later, try and do things with a different slice
      CALL Proc(arr(n,:))    ! xxx
    END DO
    WRITE (*,"('Parallel don''t')")
  END SUBROUTINE Sub
  SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)
    !****
  END SUBROUTINE Proc    
END MODULE TestMod

PROGRAM OmpGen
  USE TestMod
  IMPLICIT NONE
  REAL, ALLOCATABLE :: arr(:,:)
  INTEGER :: n
  !----  
  n = 5
  ALLOCATE(arr(n,n2))
  arr = 1.0
  CALL Sub(n, arr)
  WRITE (*,"('All done!')")
END PROGRAM OmpGen
[/plain]
7 Replies
jimdempseyatthecove
Honored Contributor III

First try adding RECURSIVE to the subroutines called from within a parallel region.

RECURSIVE SUBROUTINE Sub(n, arr)

I cannot see all the variable declarations in SUBROUTINE Proc. It may need RECURSIVE as well (so will any nested subroutine calls).

What is likely happening is that your local array slice(n2) has an implicit SAVE attribute.

If that does not work, there was a reported bug where temporary arrays were created as if with the SAVE attribute and only one instance of the array descriptor was created. IOW multiple threads could happen to use the same descriptor. If RECURSIVE fixes the problem then this is a programming problem. (IMHO Fortran needs an attribute to identify a variable/array as stack local, but these traditions are long standing and you have to live with them.)

If RECURSIVE does not work then check your compiler version to see if it is current. This problem (local temp array as SAVE when in RECURSIVE subroutine/function) was reported and is likely fixed by now.
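
A minimal sketch of the RECURSIVE suggestion, for illustration only. The names StackLocalMod, Work and StackLocalDemo are hypothetical and do not appear in the posted program. It assumes a compiler that might otherwise place non-SAVEd locals in static storage. Marking the routine RECURSIVE asks for a fresh copy of the local array slice on every invocation, so concurrent calls from different threads should not share it.

```fortran
MODULE StackLocalMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n2 = 10
CONTAINS
  ! RECURSIVE requests per-invocation storage for non-SAVEd locals,
  ! which is what a routine called from a parallel region needs.
  RECURSIVE SUBROUTINE Work(i, arr, total)
    INTEGER, INTENT(IN) :: i
    REAL, INTENT(IN) :: arr(:,:)
    REAL, INTENT(OUT) :: total
    REAL :: slice(n2)          ! local scratch array, one per invocation
    slice = arr(i,:)           ! gather a strided row into contiguous storage
    total = SUM(slice)
  END SUBROUTINE Work
END MODULE StackLocalMod

PROGRAM StackLocalDemo
  USE StackLocalMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  REAL :: arr(n,n2), totals(n)
  INTEGER :: i
  arr = 1.0
  !$OMP PARALLEL DO DEFAULT(NONE), SHARED(arr, totals), PRIVATE(i)
  DO i = 1, n
    CALL Work(i, arr, totals(i))   ! each iteration writes its own element
  END DO
  !$OMP END PARALLEL DO
  WRITE (*,"('Row sums: ',5F6.1)") totals
END PROGRAM StackLocalDemo
```

Since every element of arr is 1.0 and each row has n2 = 10 elements, each row sum should come out as 10.0 whether or not OpenMP is enabled. A different result would point at shared local storage.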

Jim Dempsey
IanH
Honored Contributor III

[quote]First try adding RECURSIVE to the subroutines called from within a parallel region.

RECURSIVE SUBROUTINE Sub(n, arr)

I cannot see all the variable declarations in SUBROUTINE Proc. It may need RECURSIVE as well (so will any nested subroutine calls).

...

If RECURSIVE does not work then check your compiler version to see if it is current. This problem (local temp array as SAVE when in RECURSIVE subroutine/function) was reported and is likely fixed by now.[/quote]


Thanks for the comments. I tried RECURSIVE but I'm still seeing the same problem.

Subroutine Proc was as you see it (nothing has been elided from the posted program's source) - it has no declarations apart from the dummy argument and it has no executable statements.

I'm using 11.1.54, which I think is the latest (update 4).

Curiously, when I was testing on a different (faster) machine this morning the problem became intermittent, with likelihood increasing the larger I made the second dimension (n2 - the slice dimension). I guess this implies that I have some sort of timing issue going on (though how that manifests when "parallel regions are serialised" (or whatever the term is under the parallel debugger) is beyond me).

What I'm not so sure about is whether I had correct code for OpenMP'ing. If you can't see anything immediately wrong with the essentially two executable Fortran statements that are in the parallel region then I might put on my wellingtons and go wading back into assembly land.
jimdempseyatthecove
Honored Contributor III

[cpp]In Sub you have

  SUBROUTINE Sub(n, arr)
    INTEGER, INTENT(IN) :: n
    REAL, INTENT(IN) :: arr(n,n2)
...
      CALL Proc(arr(n,:))    ! xxx


In Proc you have

  SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)

The call to Proc passes a non-contiguous slice of arr.
Potentially this could get by with creating an array descriptor
with a stride of n without moving data. Or this can make a
temporary copy with stride 1.

An identical slice will be made *** by each thread ***

This is not an error in this example, rather it is an inefficiency.
This may be an error if the called routine (Proc) used arg as
INTENT(INOUT) because then you would have aliases to the same
memory.

In any event the code should not go boom.

What happens when

    !$OMP PARALLEL DO DEFAULT(NONE), PRIVATE(slice), SHARED(arr,n)
    DO i = 1, n
      ! Collect elements out of natural order into a contiguous array
      !$OMP CRITICAL
      slice = arr(i,:) ! Things appear to go boom here
      ! Now go off and do things with this slice...
      ! Later, try and do things with a different slice
      CALL Proc(arr(n,:))    ! xxx
      !$OMP END CRITICAL
    END DO

Not that you want critical sections, rather the above is simply to
see if you no longer go boom.
Also, in Proc, write the location of the 1st and 2nd elements of the array.
I want to see if you have a stride of 1 (indicating a temporary array)
or a stride of n (indicating an array descriptor pointing to a strided slice of arr).
And I want to see if each thread references the same locations or different locations.


Jim Dempsey [/cpp]
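
The address check requested above could look something like the following sketch. It is not the code that was actually run in this thread. DiagMod, DiagDemo and the critical-section name diag_io are hypothetical, and LOC is a vendor extension (available in Intel Fortran and gfortran), not standard Fortran. The named critical section only keeps output lines from interleaving. It has a different name from the unnamed CRITICAL in the experiment above, so the two can nest.

```fortran
MODULE DiagMod
  USE omp_lib, ONLY: omp_get_thread_num
  IMPLICIT NONE
CONTAINS
  SUBROUTINE Proc(arg)
    REAL, INTENT(IN) :: arg(:)
    ! Print the addresses of the first two elements and their distance.
    ! LOC is a non-standard extension that returns an integer address.
    !$OMP CRITICAL (diag_io)
    WRITE (*,"('thread ',I0,': LOC(arg(1))=',I0,' LOC(arg(2))=',I0,' delta=',I0)") &
      omp_get_thread_num(), LOC(arg(1)), LOC(arg(2)), LOC(arg(2)) - LOC(arg(1))
    !$OMP END CRITICAL (diag_io)
  END SUBROUTINE Proc
END MODULE DiagMod

PROGRAM DiagDemo
  USE DiagMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5, n2 = 10
  REAL :: arr(n,n2)
  INTEGER :: i
  arr = 1.0
  !$OMP PARALLEL DO DEFAULT(NONE), SHARED(arr), PRIVATE(i)
  DO i = 1, n
    CALL Proc(arr(n,:))      ! same strided row passed from every iteration
  END DO
  !$OMP END PARALLEL DO
END PROGRAM DiagDemo
```

Assuming default REAL occupies 4 bytes, a delta of 20 bytes (n = 5 elements apart) would suggest a descriptor pointing into arr, while a delta of 4 bytes would suggest a contiguous temporary copy. Any other value, or addresses that differ wildly between threads, would hint that the descriptor itself is wrong.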
IanH
Honored Contributor III
Thanks for the suggestions. Appreciate the potential inefficiencies with the array element order issue. I hadn't thought of the possibility of a temporary, given the assumed shape.

Things still go boom with the critical sections. Adding the suggested write (..) LOC(xx) statements produced some interesting results. In non-OpenMP runs the LOC results for the relevant array elements agreed between Sub and Proc, indicating that a descriptor is being passed in that specific case. Also, the differences between LOC(arr(n,1)) and LOC(arr(n,2)) were the expected number of bytes given the stride (5 elements by 8 bytes = 20). But in OpenMP builds, LOC(arr(n,2)) was typically 1.2 MB apart from LOC(arr(n,1)). That's a rather big stride for an array that is only 5 by 10 REAL(8) elements in size. Occasionally a thread makes it into Proc; in that case all LOCs for that thread have exactly the same value (indicating a descriptor being passed, with zero stride??).

I don't think all is well. Wellingtons on, glass of red charged, rest of bottle handy, I'm going back to assembly land...
jimdempseyatthecove
Honored Contributor III

IanH,

Good detective work (even without success).

The critical section rules out a race condition as the cause. Clearly the array descriptor is getting trashed or built wrong.

This could be a case where the caller and callee are not agreeing on how to pass the argument (descriptor or address of first cell).

Although this next suggestion should not be required, try supplying an interface declaration for your forward reference. (I am the type that will look into the horse's mouth to count the teeth as opposed to having a philosophical discussion about how many ought to be there.)

You may need a case of that red wine.

BTW - I assume you have submitted this test case to premier support.

Jim Dempsey
Steven_L_Intel1
Employee
I've reported this as issue DPD200148861. Right now, I'm guessing that the parallel debug extension calls added to the code are messing something up....
Steven_L_Intel1
Employee
This ended up being a code generation bug. I expect it to be fixed in 11.1 Update 7, scheduled for late August.