
Gib_B_ · ‎08-17-2014

Hi,

I'm using MKL pardiso on a 64-bit Windows 7 machine with 32 GB of RAM. I am running a solver using backward Euler, with a steadily increasing number of variables. When the sparse matrix that I'm inverting gets to about 600x600, the program crashes with the "Insufficient virtual memory" message. This is not a problem with any of the memory that I control directly - I am careful about deallocating after allocating. This is a recent MKL - with IVF Composer XE 2013.

I am using OpenMP, and the crash occurs with ncpu = 8.

I don't understand why the program needs virtual memory anyway - at the peak the total memory usage reported by Task Manager is less than 7 GB.

Since I have just started using MKL and pardiso I will not be surprised to learn that I am missing something simple.

Thanks

Gib

Bernard · ‎08-17-2014

If you have statically allocated arrays, try increasing the thread's stack size, maybe up to 1 GB. I suppose that your thread is using the default stack size of 1 MB. For more in-depth troubleshooting, please use the VMMap and RAMMap tools.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx

Gib_B_ · ‎08-17-2014

How does one set thread stack size in Fortran?

Gib_B_ · ‎08-17-2014

I found the /F command-line option for setting stack size, and used /F0x100000000. The program crashes at the same points. I should mention that the Fortran code is built as a DLL. Is the stack size to be set in the DLL build, or in the calling program build?

I have now seen that the stack size can be set in the IDE via Properties > Linker > System > Stack Reserve Size and > Stack Commit Size. It is not clear which should be set, so I have set both to 0x10000000, which seems to be the maximum permitted for 32-bit operation. This I did in both the DLL build and the main program build. In any case, it made no difference - the program still crashes in the same way.

I'm starting to get the impression that the documentation for MKL, or for pardiso anyway, leaves something to be desired. Surely it should not be such trouble to find out how to execute what is a rather small problem case.

mecej4 · ‎08-17-2014

I suggest that the Fortran runtime error messages be taken with a bit of skepticism. Even with 32-bit code on a 32-bit OS, a dense 600 X 600 matrix would occupy 2.88 megabytes, so there should be no memory problems on your system. I have run Pardiso-32-bit on matrices that were much larger.

If there is a mismatch between 4-byte integers and 8-byte integers exchanged between your program and MKL, strange error messages are to be expected.
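One quick way to check your side of this is to print the size of the default integer kind as the program is actually built. The sketch below assumes nothing about MKL itself. The program name is made up for illustration. As far as I know, the LP64 interface of MKL expects 4-byte integers and the ILP64 interface expects 8-byte ones, and an option such as /integer-size:64 or /4I8 changes the default kind, so compile this with the same options as the real project.

```fortran
program check_integer_size
  implicit none
  integer :: i
  ! bit_size is an inquiry function: it returns the number of bits
  ! in the default integer kind, regardless of the value of i
  print '(a,i0,a)', 'default INTEGER occupies ', bit_size(i) / 8, ' bytes'
end program check_integer_size
```

If this reports 8 bytes while the project links against the LP64 libraries, the integer arguments passed to pardiso would be misread.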

Is it feasible for you to post a reproducer with a statement of the compiler options used, or at least a description of the arguments passed to Pardiso?

Gib_B_ · ‎08-17-2014

I agree, it makes no sense. I'll try to set up a simple example.

Gib_B_ · ‎08-17-2014

I first played with a simple program to test my matrix inversion subroutine, which uses pardiso. I found that with /Qmkl:parallel I can invert a 4000x4000 sparse matrix. With /Qmkl:sequential I can go to 5000x5000. So there seems to be no problem with my pardiso code. As far as I can see, the compiler settings are the same as in my real program. The only obvious difference is that I am invoking the inversion code repeatedly in the real program, which suggests that some memory is not getting freed. I can't see that anywhere in my code, but I will search more closely. Here is my subroutine (REAL_KIND is double precision):

module pardiso_mod
use csr_mod
use MKL_PARDISO
implicit none

contains

subroutine invert(A,N,Ainv,res)
integer :: N, res   ! declare N before it is used as an array bound
real(REAL_KIND) :: A(N,N), Ainv(N,N)
TYPE(MKL_PARDISO_HANDLE), allocatable :: PT(:)
integer, allocatable :: perm(:), ivect(:), jvect(:)
real(REAL_KIND), allocatable :: b(:), x(:), AS(:)
INTEGER :: maxfct, mnum, mtype, phase, nrhs, msglvl, i, j, k
INTEGER :: iparm(64)
integer :: NN, NE, err

maxfct = 1
mnum = 1
nrhs = N ! matrix inversion
NN = N*N

allocate( PT(64), perm(N), b(nrhs*N), x(nrhs*N), AS(NN), ivect(NN), jvect(NN) )
! ivect(:) is row_index(:)
! jvect(:) is col_index(:)

do i = 1, 64
iparm(i) = 0
PT(i)%DUMMY = 0
end do
iparm(1) = 0 ! solver default

err = 0 ! initialize error flag
msglvl = 0 ! 1 = print statistical information
mtype = 11 ! real unsymmetric
phase = 13 ! analysis, numerical factorization, and solve

call to_csr3_1(A,N,NN,AS,ivect,jvect,NE,err) ! 1-based
if (err /= 0) then
    deallocate(PT, perm, b, x, AS, ivect, jvect)
    res = 1
    return
endif

b = 0
do i = 1,N
    b(i+(i-1)*N) = 1
enddo
err = 0
CALL pardiso (PT, maxfct, mnum, mtype, phase, N, AS, ivect, jvect, perm, nrhs, iparm, msglvl, b, x, err)
if (err /= 0) then
    deallocate(PT, perm, b, x, AS, ivect, jvect)
    res = 2
    return
endif

k = 0
do j = 1,N
    do i = 1,N
        k = k+1
        Ainv(i,j) = x(k)
    enddo
enddo
deallocate(PT, perm, b, x, AS, ivect, jvect)
res = 0
end subroutine

end module

......

<ahem> You will have noticed the lack of a call to pardiso with phase = -1 at the conclusion. This I was unaware of, and now that I have added it I find that my program is running past the point where it was crashing. In other words, it seems to be a case of RTFM :(

Thanks for your help :)
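For anyone who lands here with the same symptom, here is a minimal sketch of the pattern, not my real code. It solves a made-up 3x3 system repeatedly and releases pardiso's internal memory with phase = -1 after every solve. The program name, the matrix and the loop count are invented for illustration. It assumes Intel Fortran with MKL (e.g. /Qmkl) and that mkl_pardiso.f90 from the MKL include directory can be found by the compiler.

```fortran
! Sketch only: assumes mkl_pardiso.f90 is on the include path.
include 'mkl_pardiso.f90'

program pardiso_release_demo
  use mkl_pardiso
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3, nnz = 5
  type(mkl_pardiso_handle) :: pt(64)
  integer :: iparm(64), perm(n), ia(n+1), ja(nnz)
  real(dp) :: a(nnz), b(n), x(n)
  integer :: maxfct, mnum, mtype, phase, nrhs, msglvl, err, iter, i

  ! 1-based CSR storage of the matrix [2 1 0; 0 3 0; 1 0 4]
  ia = (/ 1, 3, 4, 6 /)
  ja = (/ 1, 2, 2, 1, 3 /)
  a  = (/ 2.0_dp, 1.0_dp, 3.0_dp, 1.0_dp, 4.0_dp /)

  maxfct = 1
  mnum = 1
  nrhs = 1
  mtype = 11      ! real unsymmetric
  msglvl = 0      ! no statistics output

  do iter = 1, 1000              ! stands in for the repeated calls in a time-stepping loop
    do i = 1, 64
      iparm(i) = 0               ! iparm(1) = 0 selects the solver defaults
      pt(i)%dummy = 0
    end do
    b = (/ 3.0_dp, 3.0_dp, 5.0_dp /)   ! right-hand side chosen so the exact solution is (1, 1, 1)

    phase = 13                   ! analysis, numerical factorization, and solve
    call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, &
                 nrhs, iparm, msglvl, b, x, err)
    if (err /= 0) print *, 'solve failed, error = ', err

    phase = -1                   ! release all internal memory held through pt
    call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, &
                 nrhs, iparm, msglvl, b, x, err)
    if (err /= 0) print *, 'release failed, error = ', err
  end do

  print *, 'last solution: ', x
end program pardiso_release_demo
```

In my subroutine the equivalent change is a call with phase = -1 just before each deallocate of PT, including on the error-return paths, so that the handle is never discarded while pardiso still holds memory through it.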

mecej4 · ‎08-17-2014

Your analysis is plausible. Each call of your subroutine caused Pardiso to leak a little molehill of memory, which went unnoticed until you had made hundreds or thousands of calls; by then the accumulated leaked memory amounted to a mountain, and the OS finally took notice.

Bernard · ‎08-17-2014

>>>Is the stack size to be set in the DLL build, or in the calling program build?>>>

Probably the calling program's build.