Solved: Re: Why oh why does my OMP program die?

IanH · 07-23-2009

I've been experimenting with OMP lately; however, recent efforts have hit a bit of a brick wall and I would appreciate any advice.

The following reduced example crashes with an access violation when compiled with 11.1.38 with /check:all /warn:all /Qopenmp. It strikes me as strange, because the program doesn't actually do very much. What have I missed?

(It works "as expected" using 10.1 and 11.0, or with no /check:all, or if I compile while facing north, or...)

Thanks,

IanH

[cpp]MODULE AMod
  IMPLICIT NONE
  INTEGER, PARAMETER :: na = 10
CONTAINS 
  SUBROUTINE ASub(n1, n2, n3, n4, array_in)
    INTEGER, INTENT(IN) :: n1, n2, n3, n4
    REAL, INTENT(IN) :: array_in(na,n4,n3,n2,n1)
    ! Structure indices          
    INTEGER i1,i2,i3,i4    
    REAL slice(na)
    !***************************************************************************
    write (*,*) 'start'
    !$OMP PARALLEL DO NUM_THREADS(1), DEFAULT(NONE),  &
    !$OMP     PRIVATE(i2,i3,i4,slice),  &
    !$OMP     SHARED(n1, n2, n3, n4, array_in)
    Loop1: DO i1 = 1, n1
      Loop2: DO i2 = 1, n2
        Loop3: DO i3 = 1, n3
          Loop4: DO i4 = 1, n4
            slice = array_in(:,i4,i3,i2,i1)
            CALL BSub(slice, array_in(:,1,1,1,i1))
          END DO Loop4
        END DO Loop3
      END DO Loop2
    END DO Loop1
    WRITE (*,*) 'finish'
  END SUBROUTINE ASub
  
  SUBROUTINE BSub(array1, array2)
    ! Arguments
    REAL, INTENT(IN) :: array1(:)   
    REAL, INTENT(IN) :: array2(:) 
    !***************************************************************************
  END SUBROUTINE BSub
END MODULE AMod

PROGRAM omp
  USE AMod
  IMPLICIT NONE
  INTEGER n1, n2, n3, n4
  REAL, ALLOCATABLE :: in(:,:,:,:,:)
  n1 = 1; n2 = 2; n3 = 3; n4 = 4
  ALLOCATE(in(na,n4,n3,n2,n1))
  in = 0.0
  CALL ASub(n1, n2, n3, n4, in)
END PROGRAM omp[/cpp]

jimdempseyatthecove · 07-24-2009

Try this
...
CALL BSub(na, slice, array_in(:,1,1,1,i1))
...
SUBROUTINE BSub(n,array1, array2)
! Arguments
INTEGER :: n
REAL, INTENT(IN) :: array1(n)
REAL, INTENT(IN) :: array2(n)

or try this

CALL BSub(slice, array_in(:,1,1,1,i1)) ! as in original sample code
...
SUBROUTINE BSub(array1, array2)
! Arguments
REAL, INTENT(IN) :: array1(:)
REAL, INTENT(IN) :: array2(:,:,:,:,:) ! and fixup references in your code

My assumption is that the original code was creating a rank-1 array descriptor either

a) as static (SAVE), as opposed to on the stack in the scope of the parallel region, or
b) as a stack array descriptor in the scope _prior to_ the parallel region (and therefore shared).

The first suggestion may run faster.

Jim Dempsey
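
For reference, here is a minimal, untested sketch of the first suggestion applied to the reduced example above. The names `AModExplicit` and `omp_explicit` are placeholders chosen here to keep it distinct from the original; everything else follows the posted reproducer, with `BSub` taking the extent `n` and explicit-shape dummies. Whether this avoids the access violation will depend on the compiler version and options.

```fortran
MODULE AModExplicit
  IMPLICIT NONE
  INTEGER, PARAMETER :: na = 10
CONTAINS
  SUBROUTINE ASub(n1, n2, n3, n4, array_in)
    INTEGER, INTENT(IN) :: n1, n2, n3, n4
    REAL, INTENT(IN) :: array_in(na,n4,n3,n2,n1)
    INTEGER :: i1, i2, i3, i4
    REAL :: slice(na)
    WRITE (*,*) 'start'
    !$OMP PARALLEL DO NUM_THREADS(1), DEFAULT(NONE),  &
    !$OMP     PRIVATE(i2,i3,i4,slice),  &
    !$OMP     SHARED(n1, n2, n3, n4, array_in)
    DO i1 = 1, n1
      DO i2 = 1, n2
        DO i3 = 1, n3
          DO i4 = 1, n4
            slice = array_in(:,i4,i3,i2,i1)
            ! The extent is passed explicitly, so BSub can declare
            ! explicit-shape dummies instead of assumed-shape ones.
            CALL BSub(na, slice, array_in(:,1,1,1,i1))
          END DO
        END DO
      END DO
    END DO
    !$OMP END PARALLEL DO
    WRITE (*,*) 'finish'
  END SUBROUTINE ASub

  SUBROUTINE BSub(n, array1, array2)
    INTEGER, INTENT(IN) :: n
    REAL, INTENT(IN) :: array1(n)   ! explicit shape
    REAL, INTENT(IN) :: array2(n)   ! explicit shape
    ! Body intentionally empty, as in the original reproducer.
  END SUBROUTINE BSub
END MODULE AModExplicit

PROGRAM omp_explicit
  USE AModExplicit
  IMPLICIT NONE
  INTEGER :: n1, n2, n3, n4
  REAL, ALLOCATABLE :: in(:,:,:,:,:)
  n1 = 1; n2 = 2; n3 = 3; n4 = 4
  ALLOCATE(in(na,n4,n3,n2,n1))
  in = 0.0
  CALL ASub(n1, n2, n3, n4, in)
END PROGRAM omp_explicit
```

The trade-off is that `BSub` no longer learns its array sizes from the caller automatically; the caller is responsible for passing a correct `n`.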

bmchenry · 07-23-2009

A quick look reveals ALLOCATE(in(na,n4,n3,n2,n1)):
item 'na' is undefined.

Oops! On a second check I see it's defined in the module.

IanH · 07-24-2009

Thanks muchly - both those solutions work well. The descriptor explanation also nicely explains some other problems that I was having.

What is your understanding: should passing descriptors as in the original code work? Or, for the code to be correct under OpenMP, must I always use one of the two alternatives you posted when I'm passing arrays around?

IanH

jimdempseyatthecove · 07-24-2009

IanH,

I am glad both methods worked (didn't try them myself :~).

Nothing wrong in passing descriptors as long as the descriptor that gets passed is the one you intended to pass.

In your original example you were passing a rank-1 section of a rank-5 array to a subroutine that required a descriptor for a rank-1 array, so the compiler had to generate the rank-1 array descriptor. It chose to place the descriptor in stack scope (or static storage) outside that of the parallel region. IMHO this is a bug: because you had DEFAULT(NONE), the compiler should have issued an error. Had you had DEFAULT(SHARED), it would raise an interesting paradox (for you), as the new rank-1 array descriptor would implicitly be shared even though it is auto-generated using indices that are PRIVATE. The intentions are not so clear when viewed in this respect.

Since this is a problem, I suspect there may also be a problem with TRANSFER, assuming it is legal to have SomeFunc(TRANSFER(source, mold[, size])); i.e., search your solution for TRANSFER inside a parallel region.

In the two solutions I proposed, the creation of the temporary rank 1 array descriptor is eliminated.

Passing the extent in, for rank 1 arrays, will (in this example) generate faster code. But this looks F77-ish (regardless of being more efficient).

Jim Dempsey
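
For completeness, here is a hedged, untested sketch of how the second suggestion might be applied to the reduced example. The call as quoted in the thread still passes the rank-1 section `array_in(:,1,1,1,i1)`, which cannot be associated with a rank-5 assumed-shape dummy, so this sketch assumes the whole array is passed instead, together with a hypothetical extra index argument `k` that is not in the original posts. The names `AModRank5` and `omp_rank5` are placeholders as well.

```fortran
MODULE AModRank5
  IMPLICIT NONE
  INTEGER, PARAMETER :: na = 10
CONTAINS
  SUBROUTINE ASub(n1, n2, n3, n4, array_in)
    INTEGER, INTENT(IN) :: n1, n2, n3, n4
    REAL, INTENT(IN) :: array_in(na,n4,n3,n2,n1)
    INTEGER :: i1, i2, i3, i4
    REAL :: slice(na)
    WRITE (*,*) 'start'
    !$OMP PARALLEL DO NUM_THREADS(1), DEFAULT(NONE),  &
    !$OMP     PRIVATE(i2,i3,i4,slice),  &
    !$OMP     SHARED(n1, n2, n3, n4, array_in)
    DO i1 = 1, n1
      DO i2 = 1, n2
        DO i3 = 1, n3
          DO i4 = 1, n4
            slice = array_in(:,i4,i3,i2,i1)
            ! Assumption: pass the whole rank-5 array plus the outer index,
            ! rather than a rank-1 section built from private indices.
            CALL BSub(slice, array_in, i1)
          END DO
        END DO
      END DO
    END DO
    !$OMP END PARALLEL DO
    WRITE (*,*) 'finish'
  END SUBROUTINE ASub

  SUBROUTINE BSub(array1, array2, k)
    REAL, INTENT(IN) :: array1(:)
    REAL, INTENT(IN) :: array2(:,:,:,:,:)
    INTEGER, INTENT(IN) :: k   ! hypothetical: selects the last dimension
    ! "Fixup references": what used to be array2(j) in the body
    ! would become array2(j,1,1,1,k).
  END SUBROUTINE BSub
END MODULE AModRank5

PROGRAM omp_rank5
  USE AModRank5
  IMPLICIT NONE
  INTEGER :: n1, n2, n3, n4
  REAL, ALLOCATABLE :: in(:,:,:,:,:)
  n1 = 1; n2 = 2; n3 = 3; n4 = 4
  ALLOCATE(in(na,n4,n3,n2,n1))
  in = 0.0
  CALL ASub(n1, n2, n3, n4, in)
END PROGRAM omp_rank5
```

This keeps the assumed-shape style at the cost of an extra argument and rank-5 indexing inside `BSub`.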