- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I work with huge data, where matrix multiplication too involved. Input matrices are sized5 million x 10 & 10 x 100. I tried to do the MATMUL operation as a whole, but faced the stack size error.So, nowperforming insteps of 20,000. Can any one suggest the faster way of doing it?
Link Copied
2 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you tried the opt-matmul implicit invocation of MKL with threaded parallelism, you might need to adjust global stack (/link /stack) and thread stack (KMP_STACKSIZE). ifort MATMUL usually allocates another matrix for the result, even in cases where that might be avoided by calling ?gemm directly.
If you don't want to use 64-bit (Intel64) mode, splitting the problem may be required.
If you don't want to use 64-bit (Intel64) mode, splitting the problem may be required.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I wrote a sample program to test a number of alternative approaches to MATMUL. They should produce different run time results if you have limited physical memory. On my PC they all worked and showed no problems, including the use of MATMUL.
If memory is limited, check the size of your paging file.[bash]! program to test problem, in smaller memory ! real*8, allocatable, dimension(:,:) :: A, B, C integer*4 l,m,n, stat, i,j, to_mb real*8 e, dt external dt ! write (*,99) 'Start test', dt() l = 5000000 m = 10 n = 100 ! to_mb = 1024*1024/kind(C) allocate ( C(l,n), stat=stat ) ; write (*,98) 'C allocated',stat, dt(), ' size C', size(C) / to_mb,' mb' allocate ( A(l,m), stat=stat ) ; write (*,98) 'A allocated',stat, dt(), ' size A', size(A) / to_mb,' mb' allocate ( B(m,n), stat=stat ) ; write (*,98) 'B allocated',stat, dt(), ' size B', size(B) / to_mb,' mb' 98 format (a,i5,f10.3,a,i6,a) ! A = 1 B = 2 ! call matrix_multiply (A, B, C, l,m,n) ! C = MatMul (A, B) write (*,99) 'Use intrinsic MATMUL function ', dt() 99 format (a,f10.3) ! e = 0 do j = 1,n do i = 1,l e = e + abs(c(i,j)-20.0d0) end do end do write (*,97) 'error = ',e, dt() 97 format (13x,a,es10.3,f10.3) end subroutine matrix_multiply (A, B, C, l,m,n) ! ! subroutine to multiply = x
!
! uses different loop order to control memory access
!
integer*4 l, m, n
real*8 A(l,m), B(m,n), C(l,n)
real*8, allocatable, dimension(:,:) :: at
real*8, allocatable, dimension(:) :: row
!
integer*4 i,j,k, stat, to_mb
real*8 s, dt, e
external dt
!
! initialise C to zero so order of do... )
e = 0
do j = 1,n
do i = 1,l
s = 0
do k = 1,m
s = s + A(i,k) * B(k,j)
end do
C(i,j) = s
e = e + abs(c(i,j)-20.0d0)
end do
end do
write (*,99) 'modified order to sequentially cycle through ', dt()
write (*,97) 'error = ',e
!
! Use transpose to sequentially cycle through ' and
! modified order to sequentially cycle through and '
!
allocate (AT(m,l), stat=stat)
do k = 1,m
do i = 1,l
AT(k,i) = A(i,k)
end do
end do
to_mb = 1024*1024/kind(C)
write (*,98) 'AT allocated',stat, dt(), ' size AT', size(AT) / to_mb,' mb'
98 format (a,i5,f10.3,a,i6,a)
!
e = 0
do j = 1,n
do i = 1,l
C(i,j) = dot_pr... '", dt()
write (*,97) 'error = ',e
deallocate (AT)
!
! Conventional order, but use only row of A for sequential access of '
! has problem that it does not sequentially access
allocate (row(m)) ! row(10)
e = 0
do i = 1,l
row = A(i,:)
do j = 1,n
C(i,j) = dot_product ( row, B(:,j))
e = e + abs(c(i,j)-20.0d0)
end do
end do
write (*,99) 'Conventional order, but use only row of A '...
If memory is limited, check the size of your paging file.[bash]! program to test problem, in smaller memory ! real*8, allocatable, dimension(:,:) :: A, B, C integer*4 l,m,n, stat, i,j, to_mb real*8 e, dt external dt ! write (*,99) 'Start test', dt() l = 5000000 m = 10 n = 100 ! to_mb = 1024*1024/kind(C) allocate ( C(l,n), stat=stat ) ; write (*,98) 'C allocated',stat, dt(), ' size C', size(C) / to_mb,' mb' allocate ( A(l,m), stat=stat ) ; write (*,98) 'A allocated',stat, dt(), ' size A', size(A) / to_mb,' mb' allocate ( B(m,n), stat=stat ) ; write (*,98) 'B allocated',stat, dt(), ' size B', size(B) / to_mb,' mb' 98 format (a,i5,f10.3,a,i6,a) ! A = 1 B = 2 ! call matrix_multiply (A, B, C, l,m,n) ! C = MatMul (A, B) write (*,99) 'Use intrinsic MATMUL function ', dt() 99 format (a,f10.3) ! e = 0 do j = 1,n do i = 1,l e = e + abs(c(i,j)-20.0d0) end do end do write (*,97) 'error = ',e, dt() 97 format (13x,a,es10.3,f10.3) end subroutine matrix_multiply (A, B, C, l,m,n) ! ! subroutine to multiply
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page