ifort crash: segmentation fault when calling LAPACK DGETRF

Benjamin_S_2 · 11-14-2013

Hello,

I am currently developing a code that requires the inversion of a matrix. I use the LAPACK routine DGETRF to compute the LU decomposition of the matrix.

I get a segmentation fault when execution reaches this routine:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

and the terminal crashes (even CTRL+C doesn't work, I am forced to quit the terminal...).

Obviously, this is not due to DGETRF itself. I am also calling it the same way as in the code mine is based on, so I don't think I am misusing the routine. Added to that, I assume that incorrect input parameters should lead to an error, not a crash. I tried reducing the size of the matrix, which had no effect, so I don't think it is due to a memory limit.

I read somewhere that it could be due to a "corrupted stack", which I don't really understand. What I did understand is that it could be due to the way my matrix is built. In the previous code, it was built in one general loop. In my version, I build it in several steps, initializing it to 0 and then filling it in with my values. The matrix is allocatable and declared globally.

Do you have any idea what is happening? What could I do to find the issue?

Please let me know if you need more information.

Thanks for helping me.

Benjamin

PhD Student in Biomechanics

mecej4 · 11-14-2013

Added to that, I assume that incorrect input parameters should lead to an error, not a crash.

Frequently, it is not possible to check that every library subroutine has been passed correct arguments. Moreover, even when it is possible, extensive checking impairs efficiency.

Please show the subroutine call and the declarations of the arguments, at the least, and, preferably, a small but complete test code that displays the segmentation fault.
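A small, complete test case of the kind requested here might look like the sketch below. It is not taken from the original code: the program name, the size n = 500, and the random, diagonally dominant test matrix are placeholders chosen for illustration. It has to be linked against a LAPACK implementation (MKL or a reference LAPACK build, for example).

```fortran
program test_dgetrf
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision, as DGETRF expects
  integer, parameter :: n = 500            ! placeholder size, not from the original code
  real(dp), allocatable :: a(:,:)
  integer, allocatable :: ipiv(:)
  integer :: info, i
  external :: dgetrf

  allocate(a(n,n), ipiv(n))

  ! Fill with random values in [0,1), then make the matrix strictly
  ! diagonally dominant so that it is guaranteed to be nonsingular.
  call random_number(a)
  do i = 1, n
    a(i,i) = a(i,i) + real(n, dp)
  end do

  ! Arguments: M, N, A, LDA, IPIV, INFO
  call dgetrf(n, n, a, n, ipiv, info)

  if (info /= 0) then
    write(*,*) 'DGETRF returned info = ', info
    stop 1
  end if
  write(*,*) 'DGETRF completed with info = 0'
end program test_dgetrf
```

If a program like this runs cleanly while the full code still crashes, the cause is more likely in how the real matrix or pivot array is declared, allocated, or filled than in DGETRF.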

TimP · 11-15-2013

Ron Green's article on segfaults is still available:

http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors

You may need to pay attention to managing thread (OMP_STACKSIZE) and shell (ulimit -s) stack size, or using /heap-arrays if your allocation is not in parallel regions.
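To illustrate what the heap-arrays option changes, here is a minimal sketch (the program and subroutine names are made up for this example). An automatic array is sized by a dummy argument and, with ifort's defaults, is usually placed on the stack, so it counts against the stack limit. An array obtained with allocate lives on the heap and does not.

```fortran
program stack_vs_heap
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! 200 x 200 doubles is about 320 KB, small enough for a typical stack.
  call with_automatic(200)
  call with_allocatable(200)

contains

  subroutine with_automatic(n)
    integer, intent(in) :: n
    real(dp) :: work(n,n)      ! automatic array: usually on the stack by default
    work = 0.0_dp
    write(*,*) 'automatic   sum = ', sum(work)
  end subroutine with_automatic

  subroutine with_allocatable(n)
    integer, intent(in) :: n
    real(dp), allocatable :: work(:,:)
    allocate(work(n,n))        ! heap allocation, independent of the stack limit
    work = 0.0_dp
    write(*,*) 'allocatable sum = ', sum(work)
    ! work is deallocated automatically when the subroutine returns
  end subroutine with_allocatable

end program stack_vs_heap
```

With a much larger n, the automatic version can exceed the stack limit and fail with a segmentation fault, while the allocatable version is unaffected. Compiling with the heap-arrays option asks the compiler to place automatic arrays and temporaries on the heap instead.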

Benjamin_S_2 · 11-18-2013

Thanks for the answers.

mecej4: I do not see a way to give you a simple code that shows the fault, since the code solves physics equations on a mesh. However, I can show you the size of my matrix and the way I call the subroutine.

dimG=881

matrix size: 881x881

integer, allocatable, save, dimension(:) :: iPiv

subroutine call:

integer :: info

allocate( iPiv(dimG) )

call DGETRF(dimG, dimG, matrix, dimG, iPiv, info)
if (info .ne. 0) then
  write(*,*) 'Problem happens during computing Gww=PLU'
  STOP
end if
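As a hedged illustration of checks that could be run just before such a call, the sketch below verifies that the matrix is allocated with the expected shape and that the pivot array is allocated exactly once with the right length. The helper name check_lu_args and the surrounding program are hypothetical, and the matrix is assumed to be a double precision allocatable array. The DGETRF call itself is left out so that the example compiles without LAPACK.

```fortran
program check_before_dgetrf
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: dimG = 881
  real(dp), allocatable :: matrix(:,:)   ! assumed declaration
  integer, allocatable :: iPiv(:)

  allocate(matrix(dimG, dimG))
  matrix = 0.0_dp

  call check_lu_args(matrix, iPiv, dimG)
  write(*,*) 'arguments look consistent, size(iPiv) = ', size(iPiv)

contains

  ! Hypothetical helper: stops with a message if the matrix does not
  ! match the leading dimension that will be passed to DGETRF.
  subroutine check_lu_args(a, piv, n)
    real(dp), allocatable, intent(in) :: a(:,:)
    integer, allocatable, intent(inout) :: piv(:)
    integer, intent(in) :: n

    if (.not. allocated(a)) stop 'matrix is not allocated'
    if (size(a,1) /= n .or. size(a,2) /= n) stop 'matrix shape differs from dimG'

    ! A saved allocatable may still be allocated from an earlier call;
    ! allocating it again without this test is a run-time error.
    if (allocated(piv)) then
      if (size(piv) /= n) deallocate(piv)
    end if
    if (.not. allocated(piv)) allocate(piv(n))
  end subroutine check_lu_args

end program check_before_dgetrf
```

Because the dummy argument is declared real(dp), the compiler also rejects a single precision matrix here. A direct call to an external DGETRF without an explicit interface gives no such protection.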

TimP: Actually, I have already read this article, and I assumed I was in the 3rd case, because my code is based on an existing code that works fine. The matrix I use is smaller than the one in the working code, so I concluded that it was not a memory issue. Added to that, I am not sure what the effects of setting ulimit to unlimited are, and I don't understand what heap-arrays does. However, I tried setting the stack size to unlimited. It does not change the issue. And if I set the stack size to unlimited in one terminal, the stack size stays at 8192 in other terminals. Is 8192 the maximum? How is the maximum stack size determined? Does it depend on my computer/OS?

I also tried -heap-arrays; it does not solve the issue.

I have not been able to run idb, and gdb does not give me anything useful. I have been advised to try valgrind --leak-check=yes, so I am currently looking at the error report, which seems to show something, but not where I expected...

Lorri_M_Intel · 11-18-2013

Can you show the declaration of MATRIX too? Thanks.

Benjamin_S_2 · 11-18-2013

allocate( matrix(dimG, dimG) )

matrix=0.0d0

The matrix is then built using a loop over its components.

Benjamin_S_2 · 11-18-2013

I now have several leads to follow. I will come back to you once I have checked all of them.

Thanks again.

Benjamin