Intel® Fortran Compiler

gemm versus matmul

meow
Beginner
3,585 Views
When I compile my project in the Debug configuration, the code using gemm runs faster than the code using matmul. However, when I switch to Release, the code using gemm becomes extremely slow. Does anyone know why? Thanks.
3 Replies
TimP
Honored Contributor III
The most obvious guess, in the absence of complete information, is that you used an optimized BLAS function along with your debug build. In that case, in-lined matmul would pick up a lot of performance with optimization, and could easily exceed the BLAS performance for those cases where matmul is well optimized. There may even be reasons for using double precision BLAS and comparing with single precision MATMUL.
If you built BLAS from source code, it is possible that your cases perform well without optimization. With optimization, unless you take precautions such as appropriate LOOP COUNT directives or Profile Guided Optimization, the compiler will optimize only for cases which you don't encounter, while in-line MATMUL code can be optimized statically for the specific cases you want.
So, if you didn't want a long-winded answer, you should have given more information.
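To illustrate the LOOP COUNT precaution mentioned above, here is a minimal sketch of a hand-written multiply for small matrices. The routine name small_mm and the trip counts in the directive are assumptions made up for illustration, not values from this thread; the directive is Intel-specific and other compilers simply treat it as a comment.

```fortran
! Sketch only: hand-written multiply for small matrices.
! small_mm and the trip counts below are hypothetical, chosen for illustration.
subroutine small_mm(a, b, c, m, n, k)
  implicit none
  integer, intent(in) :: m, n, k
  double precision, intent(in)  :: a(m,k), b(k,n)
  double precision, intent(out) :: c(m,n)
  integer :: i, j, l

  c = 0.0d0
  do j = 1, n
    do l = 1, k
      ! Hint that the inner loop is short, so the compiler does not
      ! tune it for long trip counts that never occur here.
      !DIR$ LOOP COUNT MAX(16), AVG(8)
      do i = 1, m
        c(i,j) = c(i,j) + a(i,l) * b(l,j)
      end do
    end do
  end do
end subroutine small_mm
```

The loop order keeps the innermost index on the first array dimension, which matches Fortran's column-major storage. Whether the hint helps depends on the actual sizes, so it is worth timing with and without it.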
meow
Beginner

Thanks for your reply. I use the dgemm provided by MKL, and the optimization parameters are the defaults. The code looks like this:

call DGEMM('N','N',mmx,nnx,kkx,allp,tmyvariables_without_w2,lda,psi1prinv_without_w2,ldb,bett,mas,ldc)

with gemm

and

mas = matmul(tmyvariables_without_w2,psi1prinv_without_w2)

with matmul

What can I do to optimize the dgemm part?

TimP
Honored Contributor III
Nothing. That's the point, it is giving you full performance, regardless of your compile options. If the matrix size is less than about 15 (perhaps geometric mean of l,m,n if multiplying (l,m) by (m,n)) then MATMUL should optimize to better performance. If you used gfortran with its option to switch in and out of DGEMM according to the size, it would switch to DGEMM at default size 25, although that option isn't implemented for Windows, as far as I know.
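The size-based choice described above could be written by hand roughly as follows. This is a sketch under assumptions: the routine name mm_dispatch and the parameter small_limit are hypothetical, the limit of 15 is only the rough figure quoted in this reply, and the real crossover has to be measured on the target machine. It assumes a BLAS such as MKL is linked in to supply DGEMM.

```fortran
! Sketch only: choose MATMUL for small products and DGEMM for large ones.
! mm_dispatch and small_limit are hypothetical names; tune the limit by timing.
subroutine mm_dispatch(a, b, c, m, n, k)
  implicit none
  integer, intent(in) :: m, n, k
  double precision, intent(in)  :: a(m,k), b(k,n)
  double precision, intent(out) :: c(m,n)
  double precision, parameter :: small_limit = 15.0d0  ! "about 15", an assumption
  double precision :: gmean
  external :: dgemm   ! supplied by MKL or another BLAS at link time

  ! Geometric mean of the three dimensions of the product.
  gmean = (dble(m) * dble(n) * dble(k)) ** (1.0d0 / 3.0d0)

  if (gmean < small_limit) then
    ! Compiled in-line, so it benefits from the compiler's optimization flags.
    c = matmul(a, b)
  else
    ! Computes C = 1.0*A*B + 0.0*C; leading dimensions are the declared row counts.
    call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, 0.0d0, c, m)
  end if
end subroutine mm_dispatch
```

With gfortran, the switch mentioned in the reply is controlled by the -fexternal-blas and -fblas-matmul-limit=n options, so a hand-written dispatcher like this is mainly of interest with compilers that have no equivalent.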