Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Intel MKL performance question

Beginner
642 Views
Hello!
I need to solve a very big linear system and I was thinking about using the MKL library.
So just to get started I decided to make practice with a very simple problem just to get the feel. This is the
source code:
```[fortran]program Main

! Equazione di Laplace su un dominio quadrato di lato L
! L*L equazioni lineari in altrettante incognite
! Matrice simmetrica sparsa

implicit none
integer,parameter:: L = 100
integer,parameter	:: N = L*L

real(8)				:: A(N,N),Mat(N,N)
real(8)				:: b(N),x(N)
integer				:: i,j
integer				:: IPIV(N)
integer				:: INFO
real				:: start,end

integer :: k, row, col, ind
real(8),parameter :: H = 3
real(8)	:: bndrow0(L), bndrowL1(L), bndcol0(L), bndcolL1(L)
real(8) :: val,y(N),ALPHA,BETA,maxerr,minerr

! Condizioni al contorno
do k=1,L
val = H * (k - 1) / (L - 1)
bndrow0(k) = val
bndrowL1(L-k+1) = val
bndcol0(k) = val
bndcolL1(L-k+1) =	val
enddo

! Inizializzo il termine noto
b(indx(1, 1)) = - bndrow0(1) - bndcol0(1)
do col=2,L-1
b(indx(1,col)) = - bndrow0(col)
enddo
b(indx(1,L)) = -bndrow0(L) - bndcolL1(1)
do row=2,L-1
b(indx(row,1)) = - bndcol0(row)
do col=2,L-1
b(indx(row,col)) = 0
enddo
b(indx(row,L)) = - bndcolL1(row)
enddo
b(indx(L,1)) = - bndrowL1(1) - bndcol0(L)
do col=2,L-1
b(indx(L,col)) = - bndrowL1(col)
enddo
b(indx(L,L)) = - bndrowL1(L) - bndcolL1(L)

! Inizializzo la matrice
row = 1
col = 1
ind = indx(row, col)
A(ind, ind)= -4
A(ind, indx(row+1,col)) = 1
A(ind, indx(row,col+1)) = 1

do col=2,L-1
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row,col-1)) = 1
A(ind,indx(row+1,col)) = 1
A(ind,indx(row,col+1)) = 1
enddo

col = L
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row,col-1)) = 1
A(ind,indx(row+1,col)) = 1

do row=2,L-1
col = 1
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row-1,col)) = 1
A(ind,indx(row+1,col)) = 1
A(ind,indx(row,col+1)) = 1
do col=2,L-1
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row-1,col)) = 1
A(ind,indx(row,col-1)) = 1
A(ind,indx(row+1,col)) = 1
A(ind,indx(row,col+1)) = 1
enddo
col = L
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row-1,col)) = 1
A(ind,indx(row,col-1)) = 1
A(ind,indx(row+1,col)) = 1
enddo

row = L
col = 1
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row,col+1)) = 1
A(ind,indx(row-1,col)) = 1

do col=2,L-1
ind = indx(row,col)
A(ind,ind) = -4
A(ind,indx(row,col+1)) = 1
A(ind,indx(row-1,col)) = 1
A(ind,indx(row,col-1)) = 1
enddo

col = L
ind = indx(row,col)
A(ind,ind) = -4
A(ind, indx(row-1,col)) = 1
A(ind,indx(row,col-1)) = 1

Mat = A
x = b

write (*,*) 'Chiamo DGESV'
call CPU_TIME(start)
call DGESV(N,1,Mat,N,IPIV,x,N,INFO)
call CPU_TIME(end)
write(*,*) INFO			! Verifica correttezza
write(*,*) end-start	! Tempo passato in secondi

y=b
ALPHA=1
BETA=-1
call DGEMV('n',N,N,ALPHA,A,N,x,1,BETA,y,1)
maxerr=maxval(y)
minerr=minval(y)
write (*,*) 'Errori:',minerr,maxerr

!	do row=1,L
!		do col=1,L
!		 write (*,*) row,col,x(indx(row,col))
!		enddo
!	enddo

contains
integer function indx(r, c)
! Fornisce l'indice della riga di A corrispondente all'equazione nel punto
! r, c del dominio, che  anche l'indice della colonna di A corrispondente
! al valore della funzione in quel punto.
integer::r, c
indx = (r - 1) * L + c
end function indx

end program Main[/fortran]```
So if I compile using the MKL library(I used the online tool to generate the options) i get this result:
```[bash]MacBook-Pro:Prova Luca\$ ifort Main.f90 /opt/intel/composerxe-201libmkl_intel_lp64.a /opt/intel/composerxe-2011.3.167/mkl/lib/libmkl_sequential.a /opt/intel/composerxe-2011.3.167/mkl/lib/libmkl_core.a -mmacosx-version-min=10.6
MacBook-Pro:Prova Luca\$ ./a.out
Chiamo DGESV
0
62.53065
Errori: -1.110223024625157E-014  1.154631945610163E-014[/bash]```
while if I compile using the BLAS/LAPACK routines taken from the netlib site(I put them in the BLAS.f and LAPACK.f files) I get this:
```[bash]MacBook-Pro:Prova Luca\$ ifort Main.f90 BLAS.f LAPACK.f -mmacosx-version-min=10.6
MacBook-Pro:Prova Luca\$ ./a.out
Chiamo DGESV
0
5.780609
Errori: -1.376676550535194E-014  1.554312234475219E-014[/bash]```
which seems to me a huge difference in execution speed(62 vs 6 seconds).
So I am doing something wrong here?
If it helps I am using Mac OS X 10.6.7 with the compiler/MKL update 3.
Cheers!
Luca
5 Replies
Moderator
642 Views
Luca, all are ok at the first glance. can give these files (I mean blas.f & lapack.f) to reproduce the problem?
Beginner
642 Views
Hello!
Just noticed that there is a blunder in the command I posted when linking against the MKL library, the correct command that I'm using is:
`[bash]ifort Main.f90 /opt/intel/composerxe-2011.3.167/mkl/lib/libmkl_intel_lp64.a /opt/intel/composerxe-2011.3.167/mkl/lib/libmkl_sequential.a /opt/intel/composerxe-2011.3.167/mkl/lib/libmkl_core.a -mmacosx-version-min=10.6[/bash]`
otherwise it returns an error.
I'm attaching to this post the two requested files(as I said I've simply taken the necessary routines/functions from the netlib site).
Cheers!
Luca
Employee
642 Views

Luca,

Comparison of MKL and netlib versions of codes is not fair and below Ill try to explain it. Shortly, despite the codes coincide by functionality and interface but they have different features.

You use DGESV (solver for system of linear algebraic equations) for dense matrices although your matrix is sparse.

Number of FP operations (num_op) of DGESV is equal 2/3*N^3 + N^2 ( N dimension of matrix) if number of right hand sidesis equal to 1.

Formula to compute Performance in Gflops:

Gflops = num_op/time(sec)/10^9;

It is easy to compute MKL Performance in Gflops for your matrix ( N= 10000, time - 62.53065 ) - 10.66 Gflops. It is nice Performance.

Netlib spends less time for solving your system because Netlib skipsmultiplication operations with ZEROs and as a result Netlib executes less FP operations. But MKL executes all multiplication operations including ZEROs.

If you add perturbation instead of ZEROs in your matrix you will able to get real Netlib Performance for dense matrices.

For sparse matrices you can use MKL PARDISO. If you need to solve Laplace Equation with boundary conditions you can use also Poisson Solver from MKLwhich is designed exactly to such problems.

P.S.

Code below from your BLAS.f skips multiplication operations with ZEROs:

DO 160 L = 1,K

IF (B(J,L).NE.ZERO) THEN

TEMP = ALPHA*B(J,L)

DO 150 I = 1,M

C(I,J) = C(I,J) + TEMP*A(I,L)

150 CONTINUE

END IF

Black Belt
642 Views
It is worth noting that the lines of code that are shown in #3 occur in the heavily used DGEMM subroutine.
Beginner
642 Views
Hello!
Ah nice catch, indeed if I "perturb" the matrix the performance of the netlib versions drops significantly(while the the performance of the MKL versions remains basically the same):
```[bash]MacBook-Pro:Desktop Luca\$ ifort Main.f90 /opt/intel/composerxe-2011.4.184/mkl/lib/libmkl_intel_lp64.a /opt/intel/composerxe-2011.4.184/mkl/lib/libmkl_sequential.a /opt/intel/composerxe-2011.4.184/mkl/lib/libmkl_core.a
MacBook-Pro:Desktop Luca\$ ./a.out
Chiamo DGESV
0
61.69716
Errori: -1.209111109845562E-011  1.323935752695071E-012
MacBook-Pro:Desktop Luca\$ rm a.out
MacBook-Pro:Desktop Luca\$ ifort Main.f90 BLAS.f LAPACK.f
MacBook-Pro:Desktop Luca\$ ./a.out
Chiamo DGESV
0
286.2718
Errori: -9.265057471230520E-012  7.393842482716906E-013[/bash]```
Cheers!

Luca