Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

workaround for using derived type with allocatable components within OpenMP regions

Crni_G_
Beginner

I'm working on a CFD code that operates on several large 3D arrays.  Coming from a C background, I decided to encapsulate these arrays in a derived type:

type variables
  integer :: nx, ny, nz ! Grid size.
  real, allocatable, dimension(:, :, :) :: p_1 ! Pressure at t = t.
  real, allocatable, dimension(:, :, :) :: p_2 ! Pressure at t = t + dt.
  ! etc.
end type

At the end of each simulation time step, I'm updating arrays to prepare for the next time step:

type(variables) :: vars
! ...
!$OMP PARALLEL WORKSHARE
  vars%p_1 = vars%p_2
  ! etc.
!$OMP END PARALLEL WORKSHARE

That worked fine with GNU Fortran (version 4.4.7), but with Intel Fortran (version 13.0.1 20121010) the code crashes inside the OpenMP region.  Looking further into the issue, I learned that using derived types with allocatable components is unsupported with OpenMP, and I am now looking into how to work around that.  I would still prefer to keep the arrays encapsulated in this derived type.  I've noticed that the crash is avoided if I move the OpenMP region above into a subroutine:

subroutine update(nx, ny, nz, p_1, p_2)
  integer, intent(in) :: nx, ny, nz
  real, dimension(nx, ny, nz), intent(out) :: p_1
  real, dimension(nx, ny, nz), intent(in) :: p_2

!$OMP PARALLEL WORKSHARE
  p_1 = p_2
!$OMP END PARALLEL WORKSHARE
end subroutine

and if I just pass arrays from derived type into this subroutine:

  call update(nx, ny, nz, vars%p_1, vars%p_2)
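For reference, the whole workaround in one place looks roughly like this (the grid sizes and the module/program names are placeholders of my own, not part of the real code):

```fortran
module vars_mod
  implicit none

  type variables
    integer :: nx, ny, nz                          ! Grid size.
    real, allocatable, dimension(:, :, :) :: p_1   ! Pressure at t = t.
    real, allocatable, dimension(:, :, :) :: p_2   ! Pressure at t = t + dt.
  end type

contains

  ! The OpenMP region sees only plain array dummies, so no
  ! allocatable component appears inside the parallel construct.
  subroutine update(nx, ny, nz, p_1, p_2)
    integer, intent(in) :: nx, ny, nz
    real, dimension(nx, ny, nz), intent(out) :: p_1
    real, dimension(nx, ny, nz), intent(in) :: p_2

!$OMP PARALLEL WORKSHARE
    p_1 = p_2
!$OMP END PARALLEL WORKSHARE
  end subroutine

end module

program demo
  use vars_mod
  implicit none
  type(variables) :: vars

  vars%nx = 64; vars%ny = 64; vars%nz = 64       ! Placeholder grid size.
  allocate(vars%p_1(vars%nx, vars%ny, vars%nz))
  allocate(vars%p_2(vars%nx, vars%ny, vars%nz))
  vars%p_2 = 1.0

  ! Pass the components as plain array arguments.
  call update(vars%nx, vars%ny, vars%nz, vars%p_1, vars%p_2)
end program
```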

It's a somewhat cumbersome approach, but I'm OK with it.  My question is: is this code now valid, or is there any undefined behavior left here?

Thanks.

4 Replies
jimdempseyatthecove
Honored Contributor III

You might find the following to be a bit more efficient:

subroutine update(nxyz, p_1, p_2)
  integer, intent(in) :: nxyz
  real, dimension(*), intent(out) :: p_1
  real, dimension(*), intent(in) :: p_2

!$OMP PARALLEL WORKSHARE
  p_1(1:nxyz) = p_2(1:nxyz)
!$OMP END PARALLEL WORKSHARE
end subroutine

...

call update(nx*ny*nz, vars%p_1, vars%p_2)

Jim Dempsey

Crni_G_
Beginner

Thanks for the reply and for the performance hint, but do you think the approach I tried to explain in my first message is appropriate for avoiding the issues between OpenMP and allocatable components of derived types?

Thanks.

jimdempseyatthecove
Honored Contributor III

IMHO, the code in your original post should have worked, since vars is shared by default. I suspect this was a compiler bug.

My suggested code is often more efficient, because the optimizer does not necessarily perform the equivalent of unroll(3) on the multi-dimensional copy. V16 update 1 still has code-efficiency issues with otherwise well-vectorized ArrayOut = ArrayIn assignments where the two arrays are multi-dimensional. For example, on Xeon Phi (V16 update 1) I've seen

real(8) :: Array(6,6) ! subroutine local array
...
Array = 0.0_8

producing quite inefficient code. For an allocatable array (known to be contiguous), passing a contiguous multi-dimensional slice (or the whole array) to a dimension(*) dummy as in #2 can yield a significant performance gain for the given operation. If performance is of lesser concern than program elegance, then continue to use vars%p_1 = vars%p_2 without the subroutine... and wait for the compiler optimization to catch up.
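If the copy itself dominates and the old contents of p_1 are not needed afterwards, another option is to swap the two allocations with move_alloc instead of copying. A sketch, assuming both components are allocated with the same shape:

```fortran
real, allocatable :: tmp(:, :, :)

! Three move_alloc calls exchange the two allocations; no data is copied.
call move_alloc(vars%p_1, tmp)
call move_alloc(vars%p_2, vars%p_1)
call move_alloc(tmp, vars%p_2)
```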

Jim Dempsey

Crni_G_
Beginner

Great, thanks again for your replies!
