Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

Optimal Matrix Operations

mohanmuthu
New Contributor I
624 Views
I work with huge data, where matrix multiplication too involved. Input matrices are sized5 million x 10 & 10 x 100. I tried to do the MATMUL operation as a whole, but faced the stack size error.So, nowperforming insteps of 20,000. Can any one suggest the faster way of doing it?
0 Kudos
2 Replies
TimP
Honored Contributor III
624 Views
If you tried the opt-matmul implicit invocation of MKL with threaded parallelism, you might need to adjust global stack (/link /stack) and thread stack (KMP_STACKSIZE). ifort MATMUL usually allocates another matrix for the result, even in cases where that might be avoided by calling ?gemm directly.
If you don't want to use 64-bit (Intel64) mode, splitting the problem may be required.
0 Kudos
John_Campbell
New Contributor II
624 Views
I wrote a sample program to test a number of alternative approaches to MATMUL. They should produce different run time results if you have limited physical memory. On my PC they all worked and showed no problems, including the use of MATMUL.
If memory is limited, check the size of your paging file.

[bash]! program to test problem, in smaller memory ! real*8, allocatable, dimension(:,:) :: A, B, C integer*4 l,m,n, stat, i,j, to_mb real*8 e, dt external dt ! write (*,99) 'Start test', dt() l = 5000000 m = 10 n = 100 ! to_mb = 1024*1024/kind(C) allocate ( C(l,n), stat=stat ) ; write (*,98) 'C allocated',stat, dt(), ' size C', size(C) / to_mb,' mb' allocate ( A(l,m), stat=stat ) ; write (*,98) 'A allocated',stat, dt(), ' size A', size(A) / to_mb,' mb' allocate ( B(m,n), stat=stat ) ; write (*,98) 'B allocated',stat, dt(), ' size B', size(B) / to_mb,' mb' 98 format (a,i5,f10.3,a,i6,a) ! A = 1 B = 2 ! call matrix_multiply (A, B, C, l,m,n) ! C = MatMul (A, B) write (*,99) 'Use intrinsic MATMUL function ', dt() 99 format (a,f10.3) ! e = 0 do j = 1,n do i = 1,l e = e + abs(c(i,j)-20.0d0) end do end do write (*,97) 'error = ',e, dt() 97 format (13x,a,es10.3,f10.3) end subroutine matrix_multiply (A, B, C, l,m,n) ! ! subroutine to multiply = x ! ! uses different loop order to control memory access ! integer*4 l, m, n real*8 A(l,m), B(m,n), C(l,n) real*8, allocatable, dimension(:,:) :: at real*8, allocatable, dimension(:) :: row ! integer*4 i,j,k, stat, to_mb real*8 s, dt, e external dt ! ! initialise C to zero so order of do... ) e = 0 do j = 1,n do i = 1,l s = 0 do k = 1,m s = s + A(i,k) * B(k,j) end do C(i,j) = s e = e + abs(c(i,j)-20.0d0) end do end do write (*,99) 'modified order to sequentially cycle through ', dt() write (*,97) 'error = ',e ! ! Use transpose to sequentially cycle through ' and ! modified order to sequentially cycle through and ' ! allocate (AT(m,l), stat=stat) do k = 1,m do i = 1,l AT(k,i) = A(i,k) end do end do to_mb = 1024*1024/kind(C) write (*,98) 'AT allocated',stat, dt(), ' size AT', size(AT) / to_mb,' mb' 98 format (a,i5,f10.3,a,i6,a) ! e = 0 do j = 1,n do i = 1,l C(i,j) = dot_pr...'", dt() write (*,97) 'error = ',e deallocate (AT) ! ! Conventional order, but use only row of A for sequential access of ' ! has problem that it does not sequentially access allocate (row(m)) ! row(10) e = 0 do i = 1,l row = A(i,:) do j = 1,n C(i,j) = dot_product ( row, B(:,j)) e = e + abs(c(i,j)-20.0d0) end do end do write (*,99) 'Conventional order, but use only row of A '...
0 Kudos
Reply