
openmp seg fault!!???

may_ka
Beginner

Hi community,

The code below fails with a segmentation fault when b is 2,000,000; it runs fine when b is 200,000.

Module ModOne
  Type :: ClassOne
  contains
    Procedure, Pass :: One => SubOne
  End type ClassOne
  Private :: SubOne
contains
  Subroutine SubOne(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Integer(kind=8) :: c1,c2,c3,a,b
    Real(kind=8), Allocatable :: tmp(:)
    a=4
    b=2000000
    Allocate(tmp(b))
    !$OMP PARALLEL private(tmp) num_threads(2)
    !$OMP DO
    Do c1=1,a
      Do c2=1,a
        Do c3=1,b
          tmp(c3)=c3
        End Do
      End Do
    End Do
    !$OMP END DO
    !$OMP END PARALLEL
  end Subroutine SubOne
end Module ModOne
Program Test
  use ModOne
  Implicit None
  Type(ClassOne) :: a
  call a%One()
End Program Test

This does not happen with num_threads(1), nor when using gfortran.

I used the commercial ifort 15.0 and the academic 16.0; both produce the same result.

The compiler command was

ifort -heap-arrays -mkl -warn nounused -warn declarations -static -O3 -qopenmp

with mkl flags

MKL= -L$(MKLPATH) -I$(MKLINCLUDE) -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -Wl,--start-group $(MKLPATH)/libmkl_intel_lp64.a $(MKLPATH)/libmkl_intel_thread.a $(MKLPATH)/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

I cannot see anything wrong with the example.

Any help???

Thanks a lot and Cheers

Karl

jimdempseyatthecove
Honored Contributor III

Your code should have worked. I will let someone from Intel comment on this. In the meantime, here is a workaround:

Module ModOne
  Type :: ClassOne
  contains
    Procedure, Pass :: One => SubOne
  End type ClassOne
  Private :: SubOne
contains
  Subroutine SubOne(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Integer(kind=8) :: c1,c2,c3,a,b
    Real(kind=8), Allocatable :: tmp(:)
    a=4
    b=2000000
    !$OMP PARALLEL firstprivate(tmp) num_threads(2)
    Allocate(tmp(b))
    !$OMP DO
    Do c1=1,a
      Do c2=1,a
        Do c3=1,b
          tmp(c3)=c3
        End Do
      End Do
    End Do
    !$OMP END DO
    deallocate(tmp)
    !$OMP END PARALLEL
  end Subroutine SubOne
end Module ModOne
Program Test
  use ModOne
  Implicit None
  Type(ClassOne) :: a
  call a%One()
End Program Test

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

I should mention, though, that if you fully optimize the above code you may find that the parallel region (or at least the DO portion) gets elided (removed), because nothing ever reads tmp. Your actual code presumably makes use of what you call tmp in your sample code.

Jim Dempsey
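
One way to keep the optimizer from removing the loop in a test case is to consume tmp inside the region. The following is a minimal sketch along the lines of the workaround above, not code from the original posts: the program name, the total and expected variables, and the checksum are illustrative additions, and PRIVATE is used instead of FIRSTPRIVATE because tmp has no contents to copy in.

```fortran
Program WorkaroundWithChecksum
  Use, Intrinsic :: iso_fortran_env, Only: int64, real64
  Implicit None
  Integer(int64) :: c1, c2, c3, a, b
  Real(real64), Allocatable :: tmp(:)
  Real(real64) :: total, expected   ! hypothetical names, added for the check

  a = 4
  b = 2000000
  total = 0.0_real64

  ! tmp is unallocated on entry; each thread allocates its own copy on the heap.
  !$OMP PARALLEL PRIVATE(tmp, c2, c3) REDUCTION(+:total) num_threads(2)
  Allocate(tmp(b))
  !$OMP DO
  Do c1 = 1, a
    Do c2 = 1, a
      Do c3 = 1, b
        tmp(c3) = Real(c3, real64)
      End Do
      ! Reading tmp here gives the loop an observable result.
      total = total + Sum(tmp)
    End Do
  End Do
  !$OMP END DO
  Deallocate(tmp)
  !$OMP END PARALLEL

  ! a*a passes over the array, each summing 1..b. All values are integers
  ! well below 2**53, so the comparison in double precision is exact.
  expected = Real(a*a, real64) * Real(b, real64) * Real(b+1, real64) / 2.0_real64
  If (total == expected) Then
    Print '(a)', 'checksum ok'
  Else
    Error Stop 'checksum mismatch'
  End If
End Program WorkaroundWithChecksum
```

Compiled without OpenMP the directives are plain comments and the program still runs serially with the same result.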

Steven_L_Intel1
Employee

It's running out of thread stack space; -heap-arrays has no effect on this. You're asking OpenMP to make a private copy of a huge array for each thread, and those copies are created on the thread's stack. If you don't have enough stack, you'll get a segfault. Note that even an "unlimited" stack isn't really unlimited, and that the environment variable OMP_STACKSIZE controls the per-thread stack size.

Jim's workaround is really the best approach if you want tmp to be private within the region.
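
To make the arithmetic concrete, here is a small sketch that compares the size of one private copy of tmp with a per-thread stack limit. The 4 MiB limit is an assumption (it is the value commonly documented as the Intel 64 default for OMP_STACKSIZE; check your compiler's documentation), the program and routine names are made up for illustration, and a real stack also holds other data, so the comparison is only approximate.

```fortran
Program PrivateCopySize
  Use, Intrinsic :: iso_fortran_env, Only: int64, real64
  Implicit None
  ! ASSUMPTION: 4 MiB per-thread stack; not a value taken from this thread.
  Integer(int64), Parameter :: assumed_stack = 4_int64 * 1024_int64 * 1024_int64
  Integer(int64) :: bytes_per_elem

  ! Storage_Size returns bits; Real(kind=8) in the posts corresponds to 8 bytes.
  bytes_per_elem = Storage_Size(1.0_real64, Kind=int64) / 8_int64

  Call Report(200000_int64)    ! the size that worked
  Call Report(2000000_int64)   ! the size that crashed

Contains

  Subroutine Report(n)
    Integer(int64), Intent(In) :: n
    Integer(int64) :: need
    need = n * bytes_per_elem
    If (need <= assumed_stack) Then
      Print '(a,i0,a,i0,a)', 'b=', n, ': ', need, ' bytes per private copy, fits'
    Else
      Print '(a,i0,a,i0,a)', 'b=', n, ': ', need, ' bytes per private copy, exceeds assumed stack'
    End If
  End Subroutine Report

End Program PrivateCopySize
```

Under that assumption the 200,000-element copy (1.6 MB) fits and the 2,000,000-element copy (16 MB) does not, which matches the behaviour reported above. OMP_STACKSIZE accepts values with a size suffix such as 32M, so raising it is an alternative to allocating inside the region.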

may_ka
Beginner

Thanks Jim and Steve. That works.

However, I have another issue: I want to add to the elements of a shared matrix in parallel:

Module ModOne
  Type :: ClassOne
    Integer :: id
  contains
    Procedure, Pass :: One => SubOne
  End type ClassOne
  Private :: SubOne
contains
  Subroutine SubOne(this,tmp)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Real, Intent(InOut) :: tmp(:,:)
    tmp=tmp+this%id
  end Subroutine SubOne
end Module ModOne
Program Test
  use ModOne
  Implicit None
  Type(ClassOne), Allocatable :: a(:)
  Integer :: i=50,j=20000,k
  Real, Allocatable :: tmp(:,:)
  Allocate(a(2))
  a(1)%id=1
  a(2)%id=2
  Allocate(tmp(j,i));tmp=0
  !$OMP PARALLEL SHARED(tmp) num_threads(2)
  !$OMP DO
  Do k=1,2
    call a(k)%One(tmp)
  End Do
  !$OMP END DO
  !$OMP END PARALLEL
  write(50,*) tmp
End Program Test

While the program compiles, runs, and gives correct results, I am more or less sure that it contains data races. To avoid them I thought about a reduction clause (REDUCTION(+:tmp)), but then the program crashes (probably because it runs out of stack). Searching Google for REDUCTION on arrays yields results, but I could not find it defined for arrays in any OpenMP manual I could get hold of. Is there an OpenMP way to achieve this? I could make local arrays bound to each object and sum them afterwards on a single core, but that seemed to be slower.

Any ideas??

Many Thanks

Karl
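
For reference, the OpenMP specification for Fortran does permit whole arrays (including allocated allocatable arrays) as reduction list items. Below is a minimal sketch of the REDUCTION(+:tmp) variant mentioned above. The program name, the ids array, and the reduced array size are illustrative choices, not taken from the post: the array is deliberately kept small here because each thread gets its own private copy, and with the original 20000 x 50 array those copies may exceed the per-thread stack.

```fortran
Program ArrayReductionSketch
  Implicit None
  ! Smaller than the 20000 x 50 array in the post, to keep private copies small.
  Integer, Parameter :: nrow = 2000, ncol = 50, nobj = 2
  Integer :: ids(nobj), k
  Real, Allocatable :: tmp(:,:)

  ids = [1, 2]
  Allocate(tmp(nrow, ncol))
  tmp = 0.0

  ! Each thread accumulates into a private, zero-initialized copy of tmp;
  ! the copies are added into the shared tmp at the end of the loop.
  !$OMP PARALLEL DO REDUCTION(+:tmp) num_threads(2)
  Do k = 1, nobj
    tmp = tmp + Real(ids(k))
  End Do
  !$OMP END PARALLEL DO

  ! Every element should now hold 1 + 2 = 3.
  If (All(tmp == 3.0)) Then
    Print '(a)', 'reduction ok'
  Else
    Error Stop 'unexpected result'
  End If
End Program ArrayReductionSketch
```

The reduction removes the race, but it costs one full copy of the array per thread plus a final summation, so it is not necessarily faster than summing per-object arrays by hand.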

jimdempseyatthecove
Honored Contributor III

From the above code, it appears that you want to take multiple ClassOne objects in array a() and accumulate the scalar id of each into all elements of array tmp. A proper parallelization would be to have each thread handle a different section of array tmp:

Module ModOne
  Type :: ClassOne
    Integer :: id
  contains
    Procedure, Pass :: One => SubOne
  End type ClassOne
  Private :: SubOne
contains
  Subroutine SubOne(this,tmp)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Real, Intent(InOut) :: tmp(:,:)
    tmp=tmp+this%id
  end Subroutine SubOne
end Module ModOne
Program Test
  use ModOne
  Implicit None
  Type(ClassOne), Allocatable :: a(:)
  Integer :: i=50,j=20000,k
  Real, Allocatable :: tmp(:,:)
  Allocate(a(2))
  a(1)%id=1
  a(2)%id=2
  Allocate(tmp(j,i));tmp=0
  !$OMP PARALLEL SHARED(tmp) PRIVATE(k) num_threads(2)
  !$OMP SECTIONS
  ! First section: the first half of the columns of tmp
  Do k=1,2
    call a(k)%One(tmp(:,1:ubound(tmp,DIM=2)/2))
  End Do
  !$OMP SECTION
  ! Second section: the remaining columns of tmp
  Do k=1,2
    call a(k)%One(tmp(:,ubound(tmp,DIM=2)/2+1:))
  End Do
  !$OMP END SECTIONS
  !$OMP END PARALLEL
  write(50,*) tmp
End Program Test

Jim Dempsey
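
The same idea, with each thread owning a disjoint part of tmp, can also be written with a worksharing loop over columns instead of two hand-written sections, so that it adapts to any thread count. This is a sketch of that alternative rather than the code posted above; the names ModOneSketch, ColumnPartitionSketch, nrow, ncol and c are made up for illustration, and the passed object is declared Intent(In) here since it is only read.

```fortran
Module ModOneSketch
  Implicit None
  Type :: ClassOne
    Integer :: id
  contains
    Procedure, Pass :: One => SubOne
  End type ClassOne
  Private :: SubOne
contains
  Subroutine SubOne(this, tmp)
    Class(ClassOne), Intent(In) :: this
    Real, Intent(InOut) :: tmp(:,:)
    tmp = tmp + this%id
  End Subroutine SubOne
End Module ModOneSketch

Program ColumnPartitionSketch
  Use ModOneSketch
  Implicit None
  Integer, Parameter :: nrow = 20000, ncol = 50
  Type(ClassOne), Allocatable :: a(:)
  Real, Allocatable :: tmp(:,:)
  Integer :: c, k

  Allocate(a(2))
  a(1)%id = 1
  a(2)%id = 2
  Allocate(tmp(nrow, ncol))
  tmp = 0.0

  ! Columns are divided among the threads, so no two threads ever
  ! write to the same element of tmp and no reduction is needed.
  !$OMP PARALLEL DO PRIVATE(k) SHARED(tmp, a) num_threads(2)
  Do c = 1, ncol
    Do k = 1, Size(a)
      Call a(k)%One(tmp(:, c:c))
    End Do
  End Do
  !$OMP END PARALLEL DO

  ! Every element should now hold 1 + 2 = 3.
  If (All(tmp == 3.0)) Then
    Print '(a)', 'partition ok'
  Else
    Error Stop 'unexpected result'
  End If
End Program ColumnPartitionSketch
```

Because tmp(:, c:c) is a contiguous column in Fortran's column-major layout, each call works on a cache-friendly slice.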
