Hi there,
the following example compiles, but it crashes if I set the thread number > 1
Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    Integer :: i
    Do i=1,20
      this%tmp=matmul(this%tmp,this%tmp)
    End Do
  End Subroutine SubMultiply
end Module ModClassOne

Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test
My question is: why?
Ultimately I want each object of a class to run on its own core. All of an object's variables are bound to the type, so the argument list of any routine contains nothing more than (this).
Thanks a lot.
Karl
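The stated goal (one object per core, with each routine taking only `this`) can also be sketched as a loop over an array of objects instead of one SECTION per object. This is a minimal sketch, not the poster's code: the names `mod_worker`, `worker`, `objs`, `nobj` and the small matrix size are hypothetical, and the matrices are filled with 1/n only so the values stay bounded. OpenMP assigns iterations to threads, not to cores; binding threads to cores depends on the runtime's affinity settings (for example the standard `OMP_PROC_BIND` environment variable).

```fortran
module mod_worker
  use iso_fortran_env, only: real64
  implicit none
  private
  public :: worker

  type :: worker
    real(real64), allocatable :: tmp(:,:)
  contains
    procedure :: mult
  end type worker

contains

  ! Touches only data owned by "this", so different objects can be
  ! processed by different threads without sharing state.
  subroutine mult(this)
    class(worker), intent(inout) :: this
    real(real64), allocatable :: work(:,:)
    integer :: i
    allocate(work(size(this%tmp, 1), size(this%tmp, 2)))
    do i = 1, 20
      work = matmul(this%tmp, this%tmp)
      this%tmp = work
    end do
  end subroutine mult

end module mod_worker

program one_object_per_iteration
  use iso_fortran_env, only: real64
  use mod_worker
  implicit none
  integer, parameter :: n = 100, nobj = 4   ! hypothetical sizes
  type(worker) :: objs(nobj)
  integer :: k

  do k = 1, nobj
    allocate(objs(k)%tmp(n, n))
    objs(k)%tmp = 1.0_real64 / n            ! initialise before use
  end do

  ! One object per iteration; schedule(static, 1) deals objects out
  ! round-robin to the available threads.
  !$omp parallel do schedule(static, 1)
  do k = 1, nobj
    call objs(k)%mult()
  end do
  !$omp end parallel do

  print '(a,es10.3)', 'max deviation from 1/n: ', &
        maxval(abs(objs(1)%tmp - 1.0_real64 / n))
end program one_object_per_iteration
```

Built without an OpenMP flag, the directives are treated as comments and the loop simply runs serially.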
We're making stack temps at this call:
this%tmp=matmul(this%tmp,this%tmp)
On Windows, this fails with a stack overflow. That's likely what is happening on Linux as well (these often show up simply as "segfault").
It's not really related to the type-bound call; you can reproduce the failure with a few quick edits that call "submultiply" directly.
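To put a number on the stack temporary, the size of one matmul result for the 1000 x 1000 case can be worked out directly. This is a small sketch that assumes Real(kind=8) is an 8-byte real (as it is with ifort and gfortran); the program name is made up for illustration.

```fortran
program matmul_temp_size
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 1000     ! matrix dimension from the example
  integer(int64) :: bytes_per_elem, bytes

  ! storage_size returns the size of one element in bits.
  bytes_per_elem = storage_size(1.0_real64) / 8
  bytes = int(n, int64) * int(n, int64) * bytes_per_elem

  print '(a,i0,a)', 'one matmul result needs ', bytes, ' bytes'
end program matmul_temp_size
```

That is 8,000,000 bytes, about 7.6 MiB, per temporary. Stack limits are system- and runtime-dependent (on Linux, `ulimit -s` for the main thread and the standard `OMP_STACKSIZE` environment variable for OpenMP worker threads), and the defaults are often only a few MiB, so a temporary of this size can plausibly overflow a worker thread's stack even when the single-threaded run survives.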
I'm not seeing the failure. I was about to ask for more specifics about compiler version and flags.
Hi,
thanks for the comments.
I have no trouble when using gfortran 5.2.
ifort --version: ifort (IFORT) 16.0.2 20160204
the flags are
ifort -mkl -warn nounused -warn declarations -static -O3 -qopenmp -parallel
with
MKL= -L$(MKLPATH) -I$(MKLINCLUDE) -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -Wl,--start-group $(MKLPATH)/libmkl_intel_lp64.a $(MKLPATH)/libmkl_intel_thread.a $(MKLPATH)/libmkl_core.a -Wl,--end-group -liomp5 -lpthread
OS is Ubuntu 15.04, kernel 4.2.0-27-generic
Thanks a lot
gfortran has a default equivalent to our -heap-arrays.
Moreover, when compiling successfully with gfortran, I get no increase in speed when running on Intel(R) Core(TM) i7-3770, but about 30% when running on i7-2637M.
Thanks
Karl
Thanks Steve, including -heap-arrays avoided the segfault.
However, the ifort build ran for 1 minute and the gfortran build for 30 seconds, both on the i7-3770 with the loop count in the routine increased to 40.
I've run into a similar situation where my matrices had a fixed size of (6,6). The culprit was that the output array was also one of the input arrays, which made performance go bonkers. Try this:
Module ModClassOne
  Type :: ClassOne
    Real(kind=8), Allocatable, Dimension(:,:) :: tmp
  contains
    Procedure, Pass :: Mult => SubMultiply
  end type ClassOne
  Private :: SubMultiply
contains
  Subroutine SubMultiply(this)
    Implicit None
    Class(ClassOne), Intent(InOut) :: this
    ! a local, non-dummy object must be Type(...), not Class(...)
    Type(ClassOne) :: localTemp
    Integer :: i
    ! allocate explicitly rather than relying on automatic allocation on assignment
    Allocate(localTemp%tmp(size(this%tmp,1),size(this%tmp,2)))
    ! halve the loop
    ! perform same number of matmult's
    ! avoiding output == one of input
    Do i=1,10
      localTemp%tmp=matmul(this%tmp,this%tmp)
      this%tmp=matmul(localTemp%tmp,localTemp%tmp)
    End Do
  End Subroutine SubMultiply
end Module ModClassOne

Program Test
  use ModClassOne
  Implicit None
  Type(ClassOne) :: T1, T2
  Integer :: dim=1000
  Allocate(T1%tmp(dim,dim),T2%tmp(dim,dim))
  !$OMP PARALLEL NUM_THREADS(2)
  !$OMP SECTIONS
  !$OMP SECTION
  call T1%Mult()
  !$OMP SECTION
  call T2%Mult()
  !$OMP END SECTIONS
  !$OMP END PARALLEL
End Program Test
Jim Dempsey
Thanks Jim, but it didn't help.
However, it might be that the compiler or the Linux kernel is broken. We have a commercial ifort 15.0.0 running under Linux kernel 3.10. When compiling the code in that environment I see what I expected: setting the thread number to 2 almost halves the runtime. Since compiler version and kernel are confounded, I am not sure where the problem is.
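One way to compare environments like these is to time the same work with one and with two threads inside a single executable and run that binary on each machine. The following is a minimal sketch, not the code used in this thread: the program and routine names, the matrix size and the repetition count are all hypothetical, and the matrices are filled with 1/n only so that repeated squaring stays bounded. It needs the compiler's OpenMP flag (-qopenmp for ifort, -fopenmp for gfortran); without it the directives are comments and both timings are serial.

```fortran
program time_sections
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 400, reps = 10   ! hypothetical sizes, kept small
  real(real64), allocatable :: a(:,:), b(:,:)
  integer(int64) :: t0, t1, rate
  integer :: nthreads

  allocate(a(n, n), b(n, n))

  do nthreads = 1, 2
    ! With every element equal to 1/n, matmul(a, a) reproduces a.
    a = 1.0_real64 / n
    b = 1.0_real64 / n

    call system_clock(t0, rate)              ! wall-clock ticks
    !$omp parallel sections num_threads(nthreads)
    !$omp section
    call square_repeatedly(a)
    !$omp section
    call square_repeatedly(b)
    !$omp end parallel sections
    call system_clock(t1)

    print '(a,i0,a,f0.3,a)', 'threads = ', nthreads, ', wall time = ', &
          real(t1 - t0, real64) / real(rate, real64), ' s'
  end do

contains

  subroutine square_repeatedly(m)
    real(real64), intent(inout) :: m(:,:)
    real(real64), allocatable :: work(:,:)   ! allocatable work array, one per call
    integer :: i
    allocate(work(size(m, 1), size(m, 2)))
    do i = 1, reps
      work = matmul(m, m)
      m = work
    end do
  end subroutine square_repeatedly

end program time_sections
```

If the two-thread time is close to half the one-thread time on one machine but not on the other with the same binary, the difference lies outside the compiled code.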
Hi,
I compiled with
ifort -O3 -static -heap-arrays -qopenmp -c Test.f90
ifort -O3 -static -qopenmp -o Test Test.o
under Linux kernel 4.2 and copied the executable to the 3.10 kernel environment. Running it there yields the expected times (as above). So the compiler is working; the kernel isn't.
Thanks for the participation.
Cheers