Crash when using mkl_dcsrgemv() and large arrays

adcahi01 · ‎01-14-2011

I'm trying to use mkl_dcsrgemv() with large sparse arrays in an iterative solver subroutine. My subroutine is below.

[fortran]	subroutine CGLS(nrows, va, ia, ja, vat, iat, jat, mguess, dat, iterations, sol)
	
	! Define input/output variables.	
	integer, intent(in)  :: nrows
	integer, intent(in)  :: iterations
	
	integer, dimension(:), intent(in) :: ia
	integer, dimension(:), intent(in) :: iat
	integer, dimension(:), intent(in) :: ja
	integer, dimension(:), intent(in) :: jat
	
	real(8), dimension(:), intent(in) :: va
	real(8), dimension(:), intent(in) :: vat
	real(8), dimension(:), intent(in) :: mguess
	real(8), dimension(:), intent(in) :: dat
	
	real(8), dimension(:), intent(out) :: sol
	
	! Define temporary local variables.	
	real(8) :: alpha, beta, num, den
	
	real(8) :: s(nrows)
	real(8) :: q(nrows)
	real(8) :: resd(nrows)
	real(8) :: r(maxval(ja))
	real(8) :: rold(maxval(ja))
	real(8) :: p(maxval(ja))
	
	integer :: i
		
	! Initialize step zero values.
	call mkl_dcsrgemv('N', nrows, va, ia, ja, mguess, resd)
	s = dat-resd
	call mkl_dcsrgemv('T', nrows, va, ia, ja, s, r)
	p = r
	call mkl_dcsrgemv('N', nrows, va, ia, ja, r, q)
	sol = mguess
		
	! Run the Conjugate Gradient Least Squares (CGLS) algorithm.
	do i = 1,iterations

		alpha = ddot(size(r), r, 1, r, 1)/ddot(size(q), q, 1, q, 1)
		call daxpy(size(p), alpha, p, 1, sol, 1)
		call daxpy(size(q), -alpha, q, 1, s, 1)
		rold  = r
		den = ddot(size(rold), rold, 1, rold, 1)
		call mkl_dcsrgemv('T', nrows, va, ia, ja, s, r)
		num = ddot(size(r), r, 1, r, 1)
		beta = num/den
		p     = r+beta*p
		call mkl_dcsrgemv('N', nrows, va, ia, ja, p, q)
		
	end do
	
	end subroutine CGLS

	end program main[/fortran]

There are two lines where I need to compute the transpose of a matrix times a vector. These lines work fine if the matrix is small, but when I run the program with a large matrix, the program crashes.

[bash]*** glibc detected *** ./STREAKVID: free(): invalid pointer: 0x424d4020 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6(+0x6b591)[0x40359591]
/lib/tls/i686/cmov/libc.so.6(+0x6cde8)[0x4035ade8]
/lib/tls/i686/cmov/libc.so.6(cfree+0x6d)[0x4035decd]
./STREAKVID[0x805dfe6]
./STREAKVID[0x804d479]
./STREAKVID[0x804b152]
./STREAKVID[0x804a404]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0x40304bd6]
./STREAKVID[0x804a311]
======= Memory map: ========
08048000-080b7000 r-xp 00000000 08:01 655361     /home/adam/LPS/Streak Video/STREAKVID
080b7000-0833c000 rw-p 0006f000 08:01 655361     /home/adam/LPS/Streak Video/STREAKVID
0833c000-0870b000 rw-p 00000000 00:00 0 
0966b000-0968c000 rw-p 00000000 00:00 0          [heap]
40000000-4001b000 r-xp 00000000 08:01 1836306    /lib/ld-2.11.1.so
4001b000-4001c000 r--p 0001a000 08:01 1836306    /lib/ld-2.11.1.so
4001c000-4001d000 rw-p 0001b000 08:01 1836306    /lib/ld-2.11.1.so
4001d000-4001e000 r-xp 00000000 00:00 0          [vdso]
4001e000-40020000 rw-p 00000000 00:00 0 
40020000-401f4000 r-xp 00000000 08:01 791620     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_rt.so
401f4000-401f8000 rw-p 001d3000 08:01 791620     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_rt.so
401f8000-401ff000 rw-p 00000000 00:00 0 
4020f000-40224000 r-xp 00000000 08:01 1836289    /lib/tls/i686/cmov/libpthread-2.11.1.so
40224000-40225000 r--p 00014000 08:01 1836289    /lib/tls/i686/cmov/libpthread-2.11.1.so
40225000-40226000 rw-p 00015000 08:01 1836289    /lib/tls/i686/cmov/libpthread-2.11.1.so
40226000-40229000 rw-p 00000000 00:00 0 
40229000-4024d000 r-xp 00000000 08:01 1836277    /lib/tls/i686/cmov/libm-2.11.1.so
4024d000-4024e000 r--p 00023000 08:01 1836277    /lib/tls/i686/cmov/libm-2.11.1.so
4024e000-4024f000 rw-p 00024000 08:01 1836277    /lib/tls/i686/cmov/libm-2.11.1.so
4024f000-402dd000 r-xp 00000000 08:01 788108     /opt/intel/composerxe-2011.0.084/compiler/lib/ia32/libiomp5.so
402dd000-402e3000 rw-p 0008e000 08:01 788108     /opt/intel/composerxe-2011.0.084/compiler/lib/ia32/libiomp5.so
402e3000-402ea000 rw-p 00000000 00:00 0 
402ea000-402ec000 r-xp 00000000 08:01 1836276    /lib/tls/i686/cmov/libdl-2.11.1.so
402ec000-402ed000 r--p 00001000 08:01 1836276    /lib/tls/i686/cmov/libdl-2.11.1.so
402ed000-402ee000 rw-p 00002000 08:01 1836276    /lib/tls/i686/cmov/libdl-2.11.1.so
402ee000-40441000 r-xp 00000000 08:01 1836271    /lib/tls/i686/cmov/libc-2.11.1.so
40441000-40442000 ---p 00153000 08:01 1836271    /lib/tls/i686/cmov/libc-2.11.1.so
40442000-40444000 r--p 00153000 08:01 1836271    /lib/tls/i686/cmov/libc-2.11.1.so
40444000-40445000 rw-p 00155000 08:01 1836271    /lib/tls/i686/cmov/libc-2.11.1.so
40445000-40448000 rw-p 00000000 00:00 0 
40448000-40465000 r-xp 00000000 08:01 1835136    /lib/libgcc_s.so.1
40465000-40466000 r--p 0001c000 08:01 1835136    /lib/libgcc_s.so.1
40466000-40467000 rw-p 0001d000 08:01 1835136    /lib/libgcc_s.so.1
40467000-40469000 rw-p 00000000 00:00 0 
40469000-40ba5000 r-xp 00000000 08:01 791337     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_core.so
40ba5000-40bae000 rw-p 0073b000 08:01 791337     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_core.so
40bae000-40bb5000 rw-p 00000000 00:00 0 
40bb5000-4105a000 r-xp 00000000 08:01 791493     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_intel_thread.so
4105a000-41213000 rw-p 004a4000 08:01 791493     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_intel_thread.so
41213000-41215000 rw-p 00000000 00:00 0 
41215000-4151b000 r-xp 00000000 08:01 791492     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_intel.so
4151b000-41521000 rw-p 00305000 08:01 791492     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_intel.so
41521000-41523000 rw-p 00000000 00:00 0 
41523000-42403000 r-xp 00000000 08:01 791531     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_p4.so
42403000-42450000 rw-p 00edf000 08:01 791531     /opt/intel/composerxe-2011.0.084/mkl/lib/ia32/libmkl_p4.so
42450000-42575000 rw-p 00000000 00:00 0 
42600000-42621000 rw-p 00000000 00:00 0 
42621000-42700000 ---p 00000000 00:00 0 
bf17b000-bfaa8000 rwxp 00000000 00:00 0          [stack]
bfaa8000-bfaaa000 rw-p 00000000 00:00 0 
Aborted
[/bash]
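
For reference, the transposed product on a rectangular matrix can be written as a plain loop over the CSR arrays, which gives something to compare the MKL result against on a small case. The sketch below does not use MKL; the names csr_matvec_t and ncols and the 2-by-3 test matrix are made up for illustration. One thing worth checking first: as far as the MKL reference manual describes it, mkl_dcsrgemv is specified for square m-by-m matrices, so a 'T' call on a matrix with more columns than rows may write past buffers sized for nrows.

[fortran]program csr_transpose_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical 2-by-3 example: A = [1 0 2; 0 3 0], one-based 3-array CSR.
  integer, parameter :: nrows = 2, ncols = 3
  real(dp) :: va(3) = [1.0_dp, 2.0_dp, 3.0_dp]
  integer  :: ia(3) = [1, 3, 4]
  integer  :: ja(3) = [1, 3, 2]
  real(dp) :: x(nrows) = [1.0_dp, 1.0_dp]
  real(dp) :: y(ncols)

  call csr_matvec_t(nrows, ncols, va, ia, ja, x, y)
  print '(3f6.1)', y

contains

  ! y = A^T * x for an nrows-by-ncols matrix in one-based 3-array CSR.
  ! x has length nrows (rows of A), y has length ncols (columns of A).
  subroutine csr_matvec_t(nrows, ncols, va, ia, ja, x, y)
    integer,  intent(in)  :: nrows, ncols
    real(dp), intent(in)  :: va(:)
    integer,  intent(in)  :: ia(:), ja(:)
    real(dp), intent(in)  :: x(nrows)
    real(dp), intent(out) :: y(ncols)
    integer :: i, k

    y = 0.0_dp
    do i = 1, nrows
      ! Row i of A contributes va(k)*x(i) to entry ja(k) of y.
      do k = ia(i), ia(i+1) - 1
        y(ja(k)) = y(ja(k)) + va(k) * x(i)
      end do
    end do
  end subroutine csr_matvec_t

end program csr_transpose_demo[/fortran]

With x = (1, 1) the result is the column sums of A, so the program should print 1.0, 3.0, 2.0.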

My workaround at this point has been to pass both the matrix (va, ja, and ia) and its transpose (vat, jat, and iat) to the subroutine. In this case, the two lines that call for the transpose are replaced with

[fortran]call mkl_dcsrgemv('N', maxval(ja), vat, iat, jat, s, r)[/fortran]

This solution obtains the correct result, but I'd like to avoid passing so many variables into the subroutine. I don't understand the error messages so I don't know how to begin tracking down the problem. Any help would be greatly appreciated.
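
One possible way to avoid storing the transpose is a routine that takes the row and column counts separately. The sketch below assumes the mkl_dcsrmv interface as listed in the MKL reference manual (transa, m, k, alpha, matdescra, val, indx, pntrb, pntre, x, beta, y) and has not been run against MKL 10.3. The wrapper name csr_at_times and the argument ncols are hypothetical.

[fortran]! Sketch: r = A^T * s for a rectangular nrows-by-ncols CSR matrix,
! without a stored transpose. Must be linked against MKL.
subroutine csr_at_times(nrows, ncols, va, ia, ja, s, r)
  implicit none
  integer, intent(in)  :: nrows, ncols
  real(8), intent(in)  :: va(*)
  integer, intent(in)  :: ia(nrows+1), ja(*)
  real(8), intent(in)  :: s(nrows)
  real(8), intent(out) :: r(ncols)
  character :: matdescra(6)

  matdescra    = ' '
  matdescra(1) = 'G'   ! general matrix
  matdescra(4) = 'F'   ! one-based (Fortran) indexing

  ! For 3-array CSR, row starts are ia(1:nrows) and row ends are
  ! ia(2:nrows+1); passing ia(1) and ia(2) hands over those sequences.
  ! alpha = 1 and beta = 0 give r = A^T * s.
  call mkl_dcsrmv('T', nrows, ncols, 1.0d0, matdescra, va, ja, &
                  ia(1), ia(2), s, 0.0d0, r)
end subroutine csr_at_times[/fortran]

Under those assumptions, the two transposed calls in CGLS would become call csr_at_times(nrows, maxval(ja), va, ia, ja, s, r), and vat, iat, and jat would no longer need to be passed in.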

I should note that I am running this in a 32-bit Linux environment with MKL 10.3. The command I'm using to compile is

[bash]ifort StreakVid.f90 -check all -fpe0 -traceback -L$MKLROOT -lmkl_rt -openmp -lpthread -heap-arrays -o STREAKVID[/bash]

This is ifort version 12.0.0.

Sergey_P_Intel2 · ‎01-24-2011

Hi,

It looks like the issue could originate from specific software or hardware limitations. Could you give us more details: the size of the matrix, the number of threads, the MKL version (Beta, Gold, or Update 1?), and your hardware configuration (processor, memory, and virtual memory settings)?

Regards,

Sergey
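
The matrix details requested above can be read straight from the CSR arrays. The following is a minimal sketch, not part of MKL; csr_report and the small test matrix are hypothetical, and the column count is only a lower bound taken from the largest index in ja. The thread count and MKL version still have to come from the environment (for example OMP_NUM_THREADS or MKL_NUM_THREADS, if set).

[fortran]program csr_report_demo
  implicit none
  ! Hypothetical 2-by-3 example in one-based 3-array CSR.
  real(8) :: va(3) = [1.0d0, 2.0d0, 3.0d0]
  integer :: ia(3) = [1, 3, 4]
  integer :: ja(3) = [1, 3, 2]

  call csr_report(va, ia, ja)

contains

  ! Print the dimensions of a CSR matrix and a few consistency checks.
  subroutine csr_report(va, ia, ja)
    real(8), intent(in) :: va(:)
    integer, intent(in) :: ia(:), ja(:)
    integer :: nrows, ncols, nnz

    nrows = size(ia) - 1          ! ia has one entry per row plus one
    nnz   = ia(nrows + 1) - 1     ! one-based: last pointer minus one
    ncols = maxval(ja(1:nnz))     ! largest column index actually used

    print '(a,i0)', 'rows:               ', nrows
    print '(a,i0)', 'columns (at least): ', ncols
    print '(a,i0)', 'nonzeros:           ', nnz
    print '(a,l1)', 'square:             ', nrows == ncols
    print '(a,l1)', 'ia non-decreasing:  ', all(ia(2:) >= ia(:nrows))
    print '(a,l1)', 'ja all >= 1:        ', all(ja(1:nnz) >= 1)
    print '(a,l1)', 'va and ja hold nnz: ', size(va) >= nnz .and. size(ja) >= nnz
  end subroutine csr_report

end program csr_report_demo[/fortran]

If the report shows that the matrix is not square, that is worth mentioning alongside the sizes, since it changes the expected lengths of the input and output vectors in the transposed product.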

Gennady_F_Intel · ‎02-06-2011

Hi adcahi01, any update regarding the questions Sergey asked you?