Intel® Fortran Compiler

optimal matrix multiplication ???

ekeom
Novice
Dear all,

I'm trying to write the most efficient matrix-by-matrix multiplication routine in Fortran, using OpenMP directives. The best code I have come up with is below. Can someone tell me whether this is really optimal, or whether there is a better solution?

Best regards, Didace


subroutine prod_mat (a, b, n)

   use omp_lib

   implicit none

   integer, intent(in) :: n
   integer :: i, iproc, j, nproc
   double precision, dimension(n,n), intent(inout) :: a
   double precision, dimension(n,n), intent(in)    :: b
   double precision, dimension(:,:), allocatable   :: v

   ! get the number of available threads
   nproc = omp_get_max_threads()

   ! allocate work array v: one column per thread (thread IDs run 0..nproc-1)
   allocate(v(n,0:nproc-1))

   ! form the product a = a*b in place, one row of a per iteration;
   ! each thread keeps a private copy of the current row in its own column of v
   !$omp parallel do default(shared) schedule(dynamic,nproc) private(i,iproc,j)
   do i = 1, n
      iproc = omp_get_thread_num()
      v(:,iproc) = a(i,:)
      do j = 1, n
         a(i,j) = sum(v(:,iproc)*b(:,j))
      enddo
   enddo
   !$omp end parallel do

   ! free the work array
   deallocate(v)

end subroutine prod_mat

3 Replies
TimP
Honored Contributor III
Did you compare with available implementations, such as dgemm, Atlas, MKL, or MATMUL as implemented by your compiler? People still spend months optimizing implementations such as MKL for each new architecture variation, over a range of problem shapes and sizes. Usual goals include, but aren't limited to, giving each thread exclusive access to each cache line, and minimizing the rate at which the inner loop accesses cache lines.
If you could find a compiler that implements OpenMP COLLAPSE effectively, it would be nearly trivial to get good results over a useful range by applying OpenMP to the standard dgemm.f. If your idea of optimality means choosing the single most efficiently implemented matrix size, you won't even need the collapse.
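A minimal timing sketch of the comparison suggested here could look like the following; the driver program, the matrix size n, and the compile/link line are illustrative assumptions, not part of the original thread (option spellings vary by compiler version).

! Hypothetical benchmark driver: compares the posted prod_mat against the
! MATMUL intrinsic and the BLAS/MKL dgemm routine.  Compile together with
! prod_mat and link against an MKL/BLAS library, e.g.
!    ifort -qopenmp -mkl bench.f90 prod_mat.f90
program bench_matmul
   use omp_lib
   implicit none
   integer, parameter :: n = 1000            ! illustrative size
   double precision, allocatable :: a(:,:), b(:,:), c(:,:)
   double precision :: t0, t1
   external :: dgemm                          ! BLAS/MKL matrix-matrix multiply

   allocate(a(n,n), b(n,n), c(n,n))
   call random_number(a)
   call random_number(b)

   ! compiler-provided MATMUL intrinsic
   t0 = omp_get_wtime()
   c  = matmul(a, b)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' matmul   : ', t1 - t0, ' s'

   ! BLAS dgemm: c := 1.0*a*b + 0.0*c
   t0 = omp_get_wtime()
   call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' dgemm    : ', t1 - t0, ' s'

   ! the posted routine (overwrites a in place)
   t0 = omp_get_wtime()
   call prod_mat(a, b, n)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' prod_mat : ', t1 - t0, ' s'

   print *, 'spot check:', c(1,1), a(1,1)     ! keep results live
end program bench_matmul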
jimdempseyatthecove
Honored Contributor III

I can attest that you will be hard pressed to produce code that performs multi-threaded matrix multiplication more efficiently than what MKL's dgemm provides (except for very small matrices, or ones so large that they require MPI). An additional benefit of using an optimized library is that someone else will update it to account for changes in the platform (e.g. migrating from SSE to AVX).

Jim Dempsey

Steven_L_Intel1
Employee
There is even an option /Qopt-matmul in compiler version 12 (Composer XE 2011) to automatically call MKL for MATMUL references.
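As a minimal sketch of how that option is used (the file name, compile line, and matrix size are illustrative assumptions; on Linux the option is spelled -opt-matmul):

! Hypothetical example: with /Qopt-matmul the compiler may replace the
! MATMUL reference below with a call into the MKL matrix-multiply kernel.
!    ifort /O2 /Qopt-matmul mm.f90        (Windows; -opt-matmul on Linux)
program mm
   implicit none
   integer, parameter :: n = 1000
   double precision, allocatable :: a(:,:), b(:,:), c(:,:)
   allocate(a(n,n), b(n,n), c(n,n))
   call random_number(a)
   call random_number(b)
   c = matmul(a, b)          ! candidate for the MKL substitution
   print *, c(1,1)           ! use the result so the call is not optimized away
end program mm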