Intel® Fortran Compiler

OpenMP and type-bound procedures

may_ka
Beginner
1,551 Views

Hi there,

The following example compiles, but it crashes if I set the number of threads to more than 1:

Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Integer :: i
    Do i=1,20
      this%tmp=matmul(this%tmp,this%tmp)
    End Do
  End Subroutine SubMultiply
end Module ModClassOne
Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test

My question is: why?

In the end, I am aiming to run each object of a class on its own core. All objects only have variables bound to the type, so the argument list of any routine will contain nothing more than (this).
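
As a minimal sketch of that goal, assuming the objects are kept in an array and each loop iteration touches only its own object: the names `mod_objects`, `class_one`, `objs`, `n_obj` and `work` are hypothetical, and the explicit `work` array is only one possible way to write the multiplication.

```fortran
module mod_objects
  implicit none
  type :: class_one
    real(kind=8), allocatable :: tmp(:,:)
  contains
    procedure, pass :: mult => sub_multiply
  end type class_one
  private :: sub_multiply
contains
  subroutine sub_multiply(this)
    class(class_one), intent(inout) :: this
    real(kind=8), allocatable :: work(:,:)   ! local, so private to each call
    integer :: i
    allocate(work(size(this%tmp,1), size(this%tmp,2)))
    do i = 1, 5
      work = matmul(this%tmp, this%tmp)
      this%tmp = work
    end do
  end subroutine sub_multiply
end module mod_objects

program objects_in_parallel
  use mod_objects
  implicit none
  integer, parameter :: n_obj = 4, n = 200
  type(class_one) :: objs(n_obj)
  integer :: k

  do k = 1, n_obj
    allocate(objs(k)%tmp(n,n))
    objs(k)%tmp = 0.0d0
  end do

  ! Each iteration works on a different object, so no data is shared
  ! between threads apart from the read-only loop bounds.
  !$omp parallel do schedule(static)
  do k = 1, n_obj
    call objs(k)%mult()
  end do
  !$omp end parallel do

  print *, 'done, objs(1)%tmp(1,1) =', objs(1)%tmp(1,1)
end program objects_in_parallel
```

Whether each thread really ends up on its own core depends on the OpenMP runtime and its affinity settings, which this sketch does not control.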

Thanks a lot.

Karl

9 Replies
Lorri_M_Intel
Employee

We're making stack temps at this call:

      this%tmp=matmul(this%tmp,this%tmp)
 

On Windows, this fails with a stack overflow. That's likely what is happening on Linux as well (there, a stack overflow often shows up simply as a "segfault").

It's not really related to the type-bound call: you can reproduce the failure with a few quick edits that call "submultiply" directly instead of through the type-bound name.
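
A few quick edits along those lines might look like the sketch below. It is an illustration only: the names `plain_mult`, `sub_multiply`, `a1` and `a2` are hypothetical, and whether it actually overflows the stack depends on the compiler, the flags and the stack limit in effect.

```fortran
module plain_mult
  implicit none
contains
  ! Same loop as SubMultiply, but on a plain array argument:
  ! no derived type and no type-bound procedure involved.
  subroutine sub_multiply(a)
    real(kind=8), intent(inout) :: a(:,:)
    integer :: i
    do i = 1, 20
      ! The result overlaps both operands, so the compiler has to
      ! evaluate matmul into a temporary before copying it back.
      a = matmul(a, a)
    end do
  end subroutine sub_multiply
end module plain_mult

program test_plain
  use plain_mult
  implicit none
  integer, parameter :: n = 1000
  real(kind=8), allocatable :: a1(:,:), a2(:,:)

  allocate(a1(n,n), a2(n,n))
  a1 = 0.0d0
  a2 = 0.0d0

  !$omp parallel num_threads(2)
  !$omp sections
  !$omp section
  call sub_multiply(a1)
  !$omp section
  call sub_multiply(a2)
  !$omp end sections
  !$omp end parallel

  print *, a1(1,1), a2(1,1)
end program test_plain
```

A 1000 x 1000 array of 8-byte reals is about 8 MB, which is why a temporary of that size placed on a thread's stack can be a problem.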

 

 

TimP
Honored Contributor III

I'm not seeing the failure.  I was about to ask for more specifics about compiler version and flags.

may_ka
Beginner

Hi,

Thanks for the comments.

I have no trouble when using gfortran 5.2.

ifort --version: ifort (IFORT) 16.0.2 20160204

the flags are

ifort -mkl -warn nounused -warn declarations -static -O3 -qopenmp -parallel

with

MKL= -L$(MKLPATH) -I$(MKLINCLUDE) -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -Wl,--start-group $(MKLPATH)/libmkl_intel_lp64.a $(MKLPATH)/libmkl_intel_thread.a $(MKLPATH)/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

OS is Ubuntu 15.04, kernel 4.2.0-27-generic

Thanks a lot

Steven_L_Intel1
Employee

gfortran's default behavior is equivalent to our -heap-arrays option, which puts such temporaries on the heap instead of the stack.

may_ka
Beginner

Moreover, with the gfortran build (which compiles and runs without the crash), I get no speedup from the second thread on an Intel(R) Core(TM) i7-3770, but about 30% on an i7-2637M.

Thanks

Karl

may_ka
Beginner

Thanks Steve, adding -heap-arrays avoided the segfault.

However, with the loop count in the routine increased to 40, the runtime was 1 minute when compiled with ifort and 30 seconds when compiled with gfortran, both on the i7-3770.
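
For comparing such timings, a minimal sketch that measures the same work serially and in two OpenMP sections with `omp_get_wtime` may help. The names `time_sections`, `square_repeatedly` and `work`, the matrix size and the loop count are hypothetical, and the program has to be built with OpenMP enabled (-qopenmp for ifort, -fopenmp for gfortran).

```fortran
program time_sections
  use omp_lib
  implicit none
  integer, parameter :: n = 500
  real(kind=8), allocatable :: a1(:,:), a2(:,:)
  real(kind=8) :: t0, t_serial, t_parallel

  allocate(a1(n,n), a2(n,n))
  a1 = 0.0d0
  a2 = 0.0d0

  ! Both calls one after the other on the initial thread.
  t0 = omp_get_wtime()
  call square_repeatedly(a1)
  call square_repeatedly(a2)
  t_serial = omp_get_wtime() - t0

  ! The same two calls, one per section, on two threads.
  t0 = omp_get_wtime()
  !$omp parallel sections num_threads(2)
  !$omp section
  call square_repeatedly(a1)
  !$omp section
  call square_repeatedly(a2)
  !$omp end parallel sections
  t_parallel = omp_get_wtime() - t0

  print '(a,f8.3,a,f8.3)', 'serial: ', t_serial, '   two sections: ', t_parallel

contains

  subroutine square_repeatedly(a)
    real(kind=8), intent(inout) :: a(:,:)
    real(kind=8), allocatable :: work(:,:)   ! explicit heap work array
    integer :: i
    allocate(work(size(a,1), size(a,2)))
    do i = 1, 10
      work = matmul(a, a)
      a = work
    end do
  end subroutine square_repeatedly
end program time_sections
```

Timing both variants inside one executable keeps compiler, flags and machine identical, so the two numbers differ only in how the calls are distributed over threads.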

jimdempseyatthecove
Honored Contributor III

I've run into a similar situation where my matrices were fixed size at (6,6). The culprit was that when the output array was also one of the input arrays, performance went bonkers. Try this:

Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
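    ! Type, not Class: a local Class entity would have to be allocatable or a pointer
    Type(ClassOne) :: localTemp
    Integer :: i
    ! allocate explicitly in case automatic allocation on assignment is not enabled
    Allocate(localTemp%tmp(size(this%tmp,1),size(this%tmp,2)))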
    ! halve the loop
    ! perform same number of matmult's
    ! avoiding output == one of input
    Do i=1,10
      localTemp%tmp=matmul(this%tmp,this%tmp)
      this%tmp=matmul(localTemp%tmp,localTemp%tmp)
    End Do
  End Subroutine SubMultiply
end Module ModClassOne
Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test

Jim Dempsey

may_ka
Beginner

Thanks Jim, but it didn't help.

However, it might be that the compiler or the Linux kernel is broken. We have a commercial ifort 15.0.0 running under Linux kernel 3.10. When compiling the code in that environment, I see what I expected: setting the thread number to 2 almost halves the runtime. Since compiler version and kernel version are confounded, I am not sure where the problem is.

may_ka
Beginner

Hi,

I compiled with

 ifort -O3 -static -heap-arrays -qopenmp -c Test.f90
 ifort -O3 -static -qopenmp -o Test Test.o

under Linux kernel 4.25 and shipped the executable into the 3.10 kernel environment. Running it there yields the expected times (as above). So the compiler is working, the kernel isn't.

Thanks for the participation.

Cheers
