Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
27323 Discussions

matmul segmentation fault for huge matrices

hgeorge
Beginner
474 Views
Hi all!
I would be glad if someone could tell me why the following matmul segfaults for a square matrix with dimension s > 1023:

! Reproduce segmentation fault when multiplying big matrices
program test_segfault
implicit none
integer :: s ! dimension of matrix
real(8), allocatable :: W(:,:)

! This segfaults for s > 1023. Why?!
s=1024
allocate(W(s,s))
W(:,:) = 1.1D0
write(*,*) 'Matrix initialized'
W = matmul(W,W)
write(*,*) 'Matrix multiplied'

end program test_segfault

I'm compiling this with ifort 11 on a 64bit Xeon system without any problems. However, when executing the matmul there is a segmentation fault. Can anyone reproduce this? Any ideas why this happens? Is this a bug or a feature? Similar segmentation faults also appear when using the gemm routines from the mkl. How is one supposed to multiply huge matrices?

Thanks,
hgeorge
0 Kudos
4 Replies
Kevin_D_Intel
Employee
474 Views

Looks to be an array temporary usage induced segmentation fault with an insufficient shell stack limit. It appears the array temps are created inside matmul itself and not related to the arguments passed.

Check your ulimit -s setting. Anything setting less than about 8192 triggers a segmentation fault when s=1024. If your stack limit is less than this, there is one of two things you can do to avoid the issue:

1. Compile with -heap-arrays

-OR-

2. Increase the shell stack limit. (For bash/ksh use: ulimit -s unlimited or For csh/tcsh use: set limit unlimited)

Unclear if this represents a compiler bug at this time.
hgeorge
Beginner
474 Views
Thank you very much for this rapid answer. Both solutions work for matmul.

When using gemm from mkl however only ulimit works for me. Compiling with -heap-arrays doesn't seem to help. Is there something specific I should set when compiling or linking against mkl?

Thanks again,
hgeorge
hgeorge
Beginner
474 Views
Quoting - hgeorge
When using gemm from mkl however only ulimit works for me. Compiling with -heap-arrays doesn't seem to help. Is there something specific I should set when compiling or linking against mkl?

Sorry for updating my own post. But perhaps others have similar problems, too.
I now do have a minimal example working using gemm from mkl and -heap-arrays compile option. I am confused, however, because I used the exact same compile options from my 'production' program, where -heap-arrays seems to have no effect on gemm but only on matmul. This is in the same subroutine, like, e. g.:

subroutine foo
...
W = matmul(W,W)
call gemm(W,W,W,'N','N','1.0D0','0.0D0')
end subroutine foo

Using the above in my real program where I link in some other object files and the subroutine is itself in another object file (compiled with -heap-arrays) yields to a segfault on call gemm, but not on matmul. -heap-arrays did help, though, because otherwise the segfault occured already at matmul
Using the above compiled with the exact same options but as the main program works and doesn't segfault.

Do I have to compile each and every object-file of my final program with -heap-arrays ? Is there specific documentation on using mkl (or blas especially) with huge matrices which need -heap-arrays? I searched the mkl userguide and manual to no avail... I hope I don't sound too confusing...

Greetings,
hgeorge

Kevin_D_Intel
Employee
474 Views

I missed this earlier. In your original example, the array temp stack allocation occurs when assigning the matmul result to the same array W that you also passed into matmul.

If you allocate another array, Y, with the same dimensions as W and assign matmul results to Y then there is no array temp created, and the original example runs without the aid of either earlier mentioned methods, -heap-arrays and with as little as a 1024 kbytes shell stack limit.

-heap-arrays only affects source that ifort actually compiles; not pre-compiled libraries like MKL.

I do not know what happens inside the MKL gemm routine. It is possible something similar with an array temp might be occurring since you are storing the results into the same source array, W. For your new test case above, try creating and using an array Y for the result array with gemm to see if that avoids the need to up the shell stack limit when calling it. Specific MKL question are better posted in the MKL specific forum if you have others.
Reply