Intel® Fortran Compiler

optimal matrix multiplication ???

ekeom
Novice
Dear all,

I'm trying to write the most efficient matrix-by-matrix multiplication routine in Fortran, using OpenMP directives. The best code I have come up with is below. Can someone tell me whether this is really optimal, or whether there is a better solution?

Best regards, Didace


subroutine prod_mat (a, b, n)

   use omp_lib

   implicit none

   integer, intent(in) :: n
   integer :: i, iproc, j, nproc
   double precision, dimension(n,n), intent(inout) :: a
   double precision, dimension(n,n), intent(in)    :: b
   double precision, dimension(:,:), allocatable   :: v

   ! get the number of available threads
   nproc = omp_get_max_threads()

   ! allocate work array v: one column per thread (thread IDs run 0..nproc-1)
   allocate(v(n,0:nproc-1))

   ! form the product a = a*b in place, one row of a per iteration;
   ! each thread keeps a private copy of the current row in its own column of v
   !$omp parallel do default(shared) schedule(dynamic,nproc) private(i,iproc,j)
   do i = 1, n
      iproc = omp_get_thread_num()
      v(:,iproc) = a(i,:)
      do j = 1, n
         a(i,j) = sum(v(:,iproc)*b(:,j))
      enddo
   enddo
   !$omp end parallel do

   ! free the work array
   deallocate(v)

end subroutine prod_mat

3 Replies
TimP
Honored Contributor III
Did you compare with available implementations, such as dgemm, Atlas, MKL, or MATMUL as implemented by your compiler? People still spend months optimizing implementations such as MKL for each new architecture variation, over a range of problem shapes and sizes. Usual goals include, but aren't limited to, giving each thread exclusive access to each cache line, and minimizing the rate at which the inner loop accesses cache lines.
If you could find a compiler that implements OpenMP COLLAPSE effectively, it would be nearly trivial to get good results over a useful range by applying OpenMP to the standard dgemm.f. If your idea of optimality means choosing the single most efficiently implemented matrix size, you won't even need the collapse.
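A minimal timing sketch of the comparison suggested here could look like the following; the driver program, the matrix size n, and the compile/link line are illustrative assumptions, not part of the original thread (option spellings vary by compiler version).

! Hypothetical benchmark driver: compares the posted prod_mat against the
! MATMUL intrinsic and the BLAS/MKL dgemm routine.  Compile together with
! prod_mat and link against an MKL/BLAS library, e.g.
!    ifort -qopenmp -mkl bench.f90 prod_mat.f90
program bench_matmul
   use omp_lib
   implicit none
   integer, parameter :: n = 1000            ! illustrative size
   double precision, allocatable :: a(:,:), b(:,:), c(:,:)
   double precision :: t0, t1
   external :: dgemm                          ! BLAS/MKL matrix-matrix multiply

   allocate(a(n,n), b(n,n), c(n,n))
   call random_number(a)
   call random_number(b)

   ! compiler-provided MATMUL intrinsic
   t0 = omp_get_wtime()
   c  = matmul(a, b)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' matmul   : ', t1 - t0, ' s'

   ! BLAS dgemm: c := 1.0*a*b + 0.0*c
   t0 = omp_get_wtime()
   call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' dgemm    : ', t1 - t0, ' s'

   ! the posted routine (overwrites a in place)
   t0 = omp_get_wtime()
   call prod_mat(a, b, n)
   t1 = omp_get_wtime()
   print '(a,f8.3,a)', ' prod_mat : ', t1 - t0, ' s'

   print *, 'spot check:', c(1,1), a(1,1)     ! keep results live
end program bench_matmul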
jimdempseyatthecove
Honored Contributor III

I can attest that you will be hard pressed to produce code that performs multi-threaded matrix multiplication more efficiently than what MKL's dgemm provides (except for very small matrices, or ones so large that they require MPI). An additional benefit of using an optimized library is that someone else will update it to account for changes in the platform (e.g. migrating from SSE to AVX).

Jim Dempsey

Steven_L_Intel1
Employee
There is even an option /Qopt-matmul in compiler version 12 (Composer XE 2011) to automatically call MKL for MATMUL references.
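As a minimal sketch of how that option is used (the file name, compile line, and matrix size are illustrative assumptions; on Linux the option is spelled -opt-matmul):

! Hypothetical example: with /Qopt-matmul the compiler may replace the
! MATMUL reference below with a call into the MKL matrix-multiply kernel.
!    ifort /O2 /Qopt-matmul mm.f90        (Windows; -opt-matmul on Linux)
program mm
   implicit none
   integer, parameter :: n = 1000
   double precision, allocatable :: a(:,:), b(:,:), c(:,:)
   allocate(a(n,n), b(n,n), c(n,n))
   call random_number(a)
   call random_number(b)
   c = matmul(a, b)          ! candidate for the MKL substitution
   print *, c(1,1)           ! use the result so the call is not optimized away
end program mm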