Intel® Fortran Compiler

openmp problem in nested structure with -heap-arrays

may_ka
Beginner

Hi there,

It took me a while to put together an example that reproduces the problem in my library, but I finally have one. Compiled with "ifort -qopenmp -heap-arrays -O3", the code below either segfaults (if pt in subroutine subset is nullified at the end) or produces nonsense values in tmp (if pt is not nullified in subroutine subset).

Module mod1
  Type :: type1
    real :: val
    real, Allocatable :: tmp(:,:)
    character(:), allocatable :: what
  contains
    procedure, pass :: set => subset
    procedure, pass :: sel => subselect
  End type type1
  Type :: type1pt
    class(type1), Pointer :: pt
  End type type1pt
  private :: subset, subselect
contains
  subroutine subselect(this)
    class(type1), Intent(InOUt) :: this
    Select case(this%what)
    Case("a")
      write(*,*) this%what
      call this%Set()
    End Select
  end subroutine subselect
  subroutine subset(this)
    class(type1), Intent(InOUt), target :: this
    real, Pointer :: pt(:,:)=>null()
    integer :: i
    write(*,*) "I am here"
    allocate(this%tmp(30000,15))
    nullify(pt)
    pt=>this%tmp
    pt=0.0
    Do i=1,10000
      pt=this%val
    End Do
    !nullify(pt)
  End subroutine subset
End Module mod1
Module ModMom
  use Mod1
  Type :: mother
    Type(type1pt), Allocatable :: tvmyfolks(:)
  contains
    Procedure, Pass :: SetFolks=>SubSetFolks
  End type mother
contains
  Subroutine SubSetFolks(this)
    class(mother), intent(inout) :: this
    integer :: i,j
    j=size(this%TVMyfolks)
    !$OMP PARALLEL DO
    Do i=1,j
      call this%TVMyfolks(i)%pt%Sel()
    End Do
    !$OMP END PARALLEL DO
  End Subroutine SubSetFolks
End Module ModMom
Program Test
  use mod1
  use modmom
  Type(type1), Target :: a,b,c,d
  Type(mother) :: mom
  integer :: i
  allocate(mom%TVMyfolks(4))
  mom%TVMyfolks(1)%pt=>a
  mom%TVMyfolks(2)%pt=>b
  mom%TVMyfolks(3)%pt=>c
  mom%TVMyfolks(4)%pt=>d
  Do i=1,4
    mom%TVMyfolks(i)%pt%val=i
    mom%TVMyFolks(i)%pt%what="a"
  End Do
  call mom%SetFolks()
  Do i=1,4
    write(*,*) mom%TVMyfolks(i)%pt%tmp(1,:)
  end Do
End Program Test

The problem vanishes if -heap-arrays is omitted from the compiler flags. I don't know whether my code is wrong or whether this is a compiler bug. However, I don't have problems when using gfortran (I think Steve once mentioned in one of my OpenMP threads that gfortran has "-heap-arrays" on by default).

While the pointer in subset is not really necessary, the whole behaviour concerns me. In fact, the problem first appeared when pt in subset was not declared at all but was used in an associate statement.

Any ideas?

Thanks a lot

Karl

PS: ifort version 16.0.3, Linux kernel 4.6.3

jimdempseyatthecove
Honored Contributor III

Maybe this is an optimization problem when eliminating unnecessary code. What happens when you comment out lines 31, 32 and 34 (pt=0.0, Do i=1,10000 and End Do), essentially doing what the optimizer should do? If this "corrects" the problem, then this is a good reproducer for Intel to use in diagnosing the problem.

Jim Dempsey

may_ka
Beginner

Hi,

Just a follow-up on this: the problem vanishes when I don't use the pointer, and it also does not show up when using associate(x=>this%tmp). However, shouldn't the pointer construct work as well?
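For comparison, here is a minimal sketch of subset written with ASSOCIATE instead of a local pointer, as described in the follow-up. The module name assoc_demo is hypothetical, and the type is trimmed to the components the routine actually touches:

```fortran
module assoc_demo
  implicit none

  ! Trimmed-down version of type1 from the question
  type :: type1
    real :: val
    real, allocatable :: tmp(:,:)
  contains
    procedure, pass :: set => subset
  end type type1

contains

  ! subset rewritten with ASSOCIATE: x is just another name for this%tmp,
  ! so the routine has no local pointer variable at all.
  subroutine subset(this)
    class(type1), intent(inout) :: this
    allocate(this%tmp(30000, 15))
    associate(x => this%tmp)
      x = this%val          ! fill every element with this%val
    end associate
  end subroutine subset

end module assoc_demo
```

A caller sets val on an object and then calls its set binding, e.g. a%val = 3.0 followed by call a%set(); afterwards every element of a%tmp holds 3.0.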

cheers

Martyn_C_Intel
Employee

I was able to reproduce the behavior. This looks like a race condition. I saw it with the version 16 compiler, but not with earlier ones.

By default, the compiler allocates temporary variables on the stack, which automatically makes them thread safe in threaded regions where each thread has its own stack. -heap-arrays overrides this and causes temporaries to be allocated on the heap, which may require complex synchronization to make it work for parallel regions.
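To make "temporaries" concrete, here is a minimal sketch (the module temp_demo and its routines are hypothetical, not from the thread). The array work is an automatic array, the kind of object that, according to Intel's documentation for -heap-arrays, ifort places on the stack by default and on the heap when that option is given. When the function is called from a parallel loop, each call gets its own work array either way:

```fortran
module temp_demo
  implicit none
contains

  ! work is an automatic array: its size comes from the argument and it
  ! exists only for the duration of one call. ifort normally places such
  ! arrays on the stack; -heap-arrays moves them to the heap.
  real function sum_of_squares(x)
    real, intent(in) :: x(:)
    real :: work(size(x))
    work = x * x
    sum_of_squares = sum(work)
  end function sum_of_squares

  ! Each iteration may run on a different thread; every call to
  ! sum_of_squares creates its own work array.
  subroutine column_sums_of_squares(a, s)
    real, intent(in)  :: a(:,:)
    real, intent(out) :: s(:)
    integer :: j
    !$omp parallel do
    do j = 1, size(a, 2)
      s(j) = sum_of_squares(a(:, j))
    end do
    !$omp end parallel do
  end subroutine column_sums_of_squares

end module temp_demo
```

For a 2 x 2 matrix with columns [1, 2] and [3, 4], column_sums_of_squares returns [5, 25].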

However, I believe your source code contains a race condition on the variable pt in subroutine subset, due to the declaration

      real, Pointer :: pt(:,:)  =>null()

This initialization forces the pointer to be stored statically (it implicitly gets the SAVE attribute) instead of on the stack; otherwise, it wouldn't be available for initialization at program startup. As a result, every thread uses the same static pointer, which leads to the race condition and the overwrites. You should be able to confirm this using Intel Inspector. The fix is simply to remove the initialization from the declaration statement:

      real, Pointer :: pt(:,:)

It is not needed, since you have nullify(pt) before pt is ever used. A separate copy of pt then gets allocated on the stack for each thread at run time; -heap-arrays converts these to heap allocations at run time that remain thread safe, and the program runs correctly.
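To see the implicit SAVE behaviour in isolation, here is a minimal serial sketch (the module and function names are hypothetical). A local variable initialized in its declaration keeps its value between calls, while one assigned at run time starts fresh on every call:

```fortran
module implicit_save_demo
  implicit none
contains

  ! n is initialized in its declaration, which implicitly gives it the
  ! SAVE attribute: one copy exists for the whole program, it is set to 0
  ! once, and its value persists between calls. In a threaded program
  ! every thread would share this single copy. The same applies to a
  ! local pointer declared as "real, pointer :: pt(:,:) => null()".
  integer function next_count()
    integer :: n = 0
    n = n + 1
    next_count = n
  end function next_count

  ! m has no initializer and is assigned before use on every call,
  ! so the result never depends on earlier calls.
  integer function fresh_count()
    integer :: m
    m = 0
    m = m + 1
    fresh_count = m
  end function fresh_count

end module implicit_save_demo
```

Calling next_count() three times returns 1, 2 and 3, while fresh_count() returns 1 every time.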

I think there is a potential race condition, whether or not you use -heap-arrays, and with whichever compiler, but it may not get exposed, depending on the actual memory layout. I was able to reproduce the problem even without the -heap-arrays switch.
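A minimal sketch of the corrected pattern, trimmed from the question's code (the module omp_fix_demo and the routines fill and fill_all are hypothetical). The local pointer has no initializer, so it is an ordinary local variable and each invocation works on its own copy, and a parallel loop fills several independent objects. Compile with -qopenmp (ifort) or -fopenmp (gfortran) to run the loop in parallel; without those flags the directives are treated as comments and the loop runs serially:

```fortran
module omp_fix_demo
  implicit none

  type :: type1
    real :: val
    real, allocatable :: tmp(:,:)
  end type type1

contains

  ! Corrected form of subset: pt is declared WITHOUT "=> null()", so it
  ! is not implicitly SAVEd and no copy of it is shared between threads.
  subroutine fill(obj)
    type(type1), intent(inout), target :: obj
    real, pointer :: pt(:,:)
    allocate(obj%tmp(30000, 15))
    pt => obj%tmp
    pt = obj%val
    nullify(pt)
  end subroutine fill

  ! Each iteration touches a different object, so the loop itself
  ! shares no data between threads.
  subroutine fill_all(objs)
    type(type1), intent(inout), target :: objs(:)
    integer :: i
    !$omp parallel do
    do i = 1, size(objs)
      call fill(objs(i))
    end do
    !$omp end parallel do
  end subroutine fill_all

end module omp_fix_demo
```

After fill_all returns, objs(i)%tmp holds objs(i)%val in every element, whichever thread processed object i.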

Nevertheless, I do not recommend the use of -heap-arrays with threaded code in general. Apart from the risk of thread safety issues, there is a performance impact due to the additional synchronization that increases with the number of threads. Yes, without -heap-arrays, you may need to increase the shell stack limit, e.g. with ulimit -s unlimited (I do this automatically for any OpenMP program). You may sometimes need to increase the thread stack size above its default value of a few MB using the environment variable OMP_STACKSIZE. But it's a safer, more efficient way of working.

may_ka
Beginner

Hi Martyn,

thank you for this very elaborate answer. The original recommendation to use -heap-arrays came partly from this thread:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/610751

While the code above does not make a private copy of each array for each thread (as in the linked thread), the tmp arrays within each child may be large (e.g. 10,000,000 x 100), and each child is assigned to a single thread. Is there any chance of running into segfaults again due to running out of stack?
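For scale, assuming default 4-byte REAL and the example dimensions from the question, a single 10,000,000 x 100 tmp array occupies:

$$10^{7} \times 100 \times 4\ \text{bytes} = 4 \times 10^{9}\ \text{bytes} \approx 3.73\ \text{GiB}$$

That is far beyond common default stack limits (often 8 MiB on Linux). Note that an array created by an explicit ALLOCATE statement, such as this%tmp, lives on the heap regardless of -heap-arrays; per Intel's documentation, the option affects automatic arrays and arrays created for temporary computations.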

Thanks a lot

Karl

Steven_L_Intel1
Employee

-heap-arrays shouldn't be a problem - the pointer is kept on the stack and the allocation is done thread-safe as all other allocations are. At least that's the way it's supposed to work. I would go ahead and use heap-arrays unless you find it creates problems.

Martyn_C_Intel
Employee

Yes, you will run out of stack unless you set OMP_STACKSIZE appropriately, as well as ulimit -s unlimited.

If you do a lot of allocations to the heap from inside a parallel region, you will pay a price in performance due to the synchronization. But if only a few, that won't be important and -heap-arrays may be easier.
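One way to keep the number of heap allocations inside a parallel region small is to allocate a per-thread workspace once per thread rather than once per loop iteration. A minimal sketch (the module hoist_demo and the routine scaled_column_sums are hypothetical, not from the thread):

```fortran
module hoist_demo
  implicit none
contains

  ! Each thread allocates its workspace once, outside the work-shared
  ! loop, instead of allocating on every iteration. A private allocatable
  ! that is unallocated on entry to the parallel region is unallocated in
  ! every thread, so each thread allocates its own copy.
  subroutine scaled_column_sums(a, factors, s)
    real, intent(in)  :: a(:,:)      ! columns are processed independently
    real, intent(in)  :: factors(:)  ! one scale factor per column
    real, intent(out) :: s(:)        ! one result per column
    real, allocatable :: work(:)
    integer :: j

    !$omp parallel private(work)
    allocate(work(size(a, 1)))       ! one heap allocation per thread
    !$omp do
    do j = 1, size(a, 2)
      work = a(:, j) * factors(j)    ! reuse the same buffer each iteration
      s(j) = sum(work)
    end do
    !$omp end do
    deallocate(work)
    !$omp end parallel
  end subroutine scaled_column_sums

end module hoist_demo
```

With columns [1, 2] and [3, 4] and factors [2, 10], the results are 6 and 70. The allocator is entered once per thread instead of once per column, which is where the synchronization cost mentioned above would otherwise accumulate.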
