OpenMP and type-bound procedures

may_ka · ‎02-24-2016

Hi there,

The following example compiles, but it crashes if I set the number of threads > 1:

Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Integer :: i
    Do i=1,20
      this%tmp=matmul(this%tmp,this%tmp)
    End Do
  End Subroutine SubMultiply
end Module ModClassOne
Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test

My question is: why?

In the end I am aiming to run each object of a class on its own core. All objects only have variables bound to the type, so the argument list of any routine will contain nothing more than (this).

Thanks a lot.

Karl

Lorri_M_Intel · ‎02-24-2016

We're making stack temps at this call:

this%tmp=matmul(this%tmp,this%tmp)

On Windows, this fails with a stack overflow. That's likely what is happening on Linux as well (these often show simply as "segfault").

It's not really related to the type-bound call; you can reproduce the failure with a few quick edits, calling "submultiply" inline.
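A minimal sketch of what such an inline reproducer might look like, with the derived type removed entirely. The names `square_mod`, `square_repeatedly` and `inline_repro` are hypothetical, and the matrix is filled with 1/n (an idempotent matrix) only so that the values stay bounded. At n = 1000 a single double-precision temporary takes 1000 x 1000 x 8 bytes = 8 MB, which is on the order of common default stack limits; the sketch defaults to a smaller n so that it runs without extra flags.

```fortran
! Hypothetical stand-alone reproducer: same matmul pattern, no derived type.
module square_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine square_repeatedly(a, nrep)
    real(dp), intent(inout) :: a(:,:)
    integer, intent(in) :: nrep
    integer :: i
    do i = 1, nrep
      ! The result overwrites an operand, so the compiler may hold the
      ! matmul result in a temporary the size of a before assigning it.
      a = matmul(a, a)
    end do
  end subroutine square_repeatedly
end module square_mod

program inline_repro
  use square_mod
  implicit none
  integer, parameter :: n = 200   ! assumption: raise to 1000 to match the example above
  real(dp), allocatable :: a1(:,:), a2(:,:)

  allocate(a1(n,n), a2(n,n))
  ! A matrix whose entries are all 1/n is idempotent, so squaring it
  ! repeatedly keeps every entry at about 1/n.
  a1 = 1.0_dp / n
  a2 = 1.0_dp / n

  !$omp parallel sections num_threads(2)
  !$omp section
  call square_repeatedly(a1, 20)
  !$omp section
  call square_repeatedly(a2, 20)
  !$omp end parallel sections

  ! Deviation from the exact result; should be at rounding-error level.
  print *, 'max deviation:', max(maxval(abs(a1 - 1.0_dp/n)), &
                                 maxval(abs(a2 - 1.0_dp/n)))
end program inline_repro
```

If this version also fails at n = 1000 when built with OpenMP enabled, that would point at the matmul temporary rather than at the type-bound call.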

TimP · ‎02-24-2016

I'm not seeing the failure. I was about to ask for more specifics about compiler version and flags.

may_ka · ‎02-24-2016

Hi,

thanks for the comments

I have no trouble when using gfortran 5.2

ifort --version: ifort (IFORT) 16.0.2 20160204

the flags are

ifort -mkl -warn nounused -warn declarations -static -O3 -qopenmp -parallel

with

MKL= -L$(MKLPATH) -I$(MKLINCLUDE) -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -Wl,--start-group $(MKLPATH)/libmkl_intel_lp64.a $(MKLPATH)/libmkl_intel_thread.a $(MKLPATH)/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

OS is Ubuntu 15.04, kernel 4.2.0-27-generic

Thanks a lot

Steven_L_Intel1 · ‎02-24-2016

gfortran has a default equivalent to our -heap-arrays.

may_ka · ‎02-24-2016

Moreover, with the gfortran build, which runs without crashing, I get no increase in speed when running on an Intel(R) Core(TM) i7-3770, but about 30% when running on an i7-2637M.

Thanks

Karl

may_ka · ‎02-24-2016

Thanks Steve, including -heap-arrays helped to avoid the seg-fault.

However, when compiled with ifort the runtime was 1 minute, versus 30 seconds with gfortran, both on the i7-3770 and with the loop count in the routine increased to 40.

jimdempseyatthecove · ‎02-24-2016

I've run into a similar situation where my matrices were fixed size at (6,6). The culprit was that when the output array was also one of the input arrays, performance went bonkers. Try this:

Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
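    ! Local scratch object; must be Type, not Class, because a polymorphic
    ! entity has to be a dummy argument, allocatable, or a pointer
    Type(ClassOne) :: localTemp
    Integer :: i
    ! allocate the scratch matrix explicitly rather than relying on
    ! automatic allocation on assignment
    Allocate(localTemp%tmp(size(this%tmp,1),size(this%tmp,2)))
    ! halve the loop
    ! perform same number of matmuls
    ! avoiding output == one of input
    Do i=1,10
      localTemp%tmp=matmul(this%tmp,this%tmp)
      this%tmp=matmul(localTemp%tmp,localTemp%tmp)
    End Do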
  End Subroutine SubMultiply
end Module ModClassOne
Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test

Jim Dempsey

may_ka · ‎02-24-2016

Thanks Jim, but it didn't help.

However, it might be that the compiler or the Linux kernel is broken. We have a commercial ifort 15.0.0 running under Linux kernel 3.10. When compiling the code in that environment I see what I expected: setting the thread number to 2 almost halves the runtime. Since compiler version and kernel are confounded, I am not sure where the problem is.
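For comparing environments like this, a small timing harness can help separate the serial cost from the effect of the two sections. The following is only a sketch under stated assumptions: `work_mod`, `square_repeatedly` and `time_sections` are hypothetical names, the sizes are arbitrary, and it must be built with OpenMP enabled (for example -qopenmp or -fopenmp) so that `omp_lib` is available. It uses an explicitly allocated work array so that it does not depend on stack size.

```fortran
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine square_repeatedly(a, nrep)
    real(dp), intent(inout) :: a(:,:)
    integer, intent(in) :: nrep
    real(dp), allocatable :: work(:,:)
    integer :: i
    ! Explicit heap work array, private to each call (and so to each thread).
    allocate(work(size(a,1), size(a,2)))
    do i = 1, nrep
      work = matmul(a, a)
      a = work
    end do
  end subroutine square_repeatedly
end module work_mod

program time_sections
  use omp_lib
  use work_mod
  implicit none
  integer, parameter :: n = 300, nrep = 10   ! arbitrary sizes for illustration
  real(dp), allocatable :: a1(:,:), a2(:,:)
  real(dp) :: t0, t_serial, t_parallel

  allocate(a1(n,n), a2(n,n))
  ! Entries of 1/n give an idempotent matrix, so values stay bounded.
  a1 = 1.0_dp / n
  a2 = 1.0_dp / n

  ! Both objects processed one after the other on a single thread.
  t0 = omp_get_wtime()
  call square_repeatedly(a1, nrep)
  call square_repeatedly(a2, nrep)
  t_serial = omp_get_wtime() - t0

  ! The same work, one object per section.
  t0 = omp_get_wtime()
  !$omp parallel sections num_threads(2)
  !$omp section
  call square_repeatedly(a1, nrep)
  !$omp section
  call square_repeatedly(a2, nrep)
  !$omp end parallel sections
  t_parallel = omp_get_wtime() - t0

  print '(a,f8.3,a)', 'serial:   ', t_serial, ' s'
  print '(a,f8.3,a)', 'parallel: ', t_parallel, ' s'
end program time_sections
```

If the matmul itself is already multi-threaded, for instance through auto-parallelization or a threaded math library, the serial figure is no longer a single-core baseline and the two numbers are harder to compare.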

may_ka · ‎02-24-2016

Hi,

I compiled with

ifort -O3 -static -heap-arrays -qopenmp -c Test.f90
ifort -O3 -static -qopenmp -o Test Test.o

under Linux kernel 4.25 and shipped the executable into the 3.10 kernel environment. Running it there yields the expected times (as above). So the compiler is working, the kernel isn't.

Thanks for the participation.

Cheers