When I compile my project in debug mode, the code using GEMM runs faster than MATMUL. However, when I choose release, the code with GEMM becomes extremely slow. Does anyone know why? Thanks.
3 Replies
The most obvious guess, in the absence of complete information, is that you used an optimized BLAS function along with your debug build. In that case, in-lined MATMUL would pick up a lot of performance with optimization, and could easily exceed the BLAS performance for those cases where MATMUL is well optimized. It's even possible that you are comparing double-precision BLAS against single-precision MATMUL.
If you built BLAS from source code, your cases may happen to perform well without optimization; and if you don't take precautions such as appropriate LOOP COUNT directives or profile-guided optimization, the compiler will optimize only for cases you don't encounter, while in-line MATMUL code can be optimized statically for the specific cases you want.
So, if you didn't want a long-winded answer, you should have given more information.
Thanks for your reply. I use the DGEMM provided by MKL, with the default optimization options. The code looks like this with GEMM:
call DGEMM('N','N',mmx,nnx,kkx,allp,tmyvariables_without_w2,lda,psi1prinv_without_w2,ldb,bett,mas,ldc)
and with MATMUL:
mas = matmul(tmyvariables_without_w2,psi1prinv_without_w2)
What do I do to optimize the DGEMM part?
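For reference, a minimal, self-contained sketch (not the poster's code; the array names and sizes here are made up for illustration) of how a DGEMM call like the one above lines up with MATMUL. With alpha = 1.0d0 and beta = 0.0d0, DGEMM computes C := A*B, the same product MATMUL returns:

```
! Sketch only: assumes linking against MKL or another BLAS
! (e.g. "ifort example.f90 -qmkl").
program gemm_vs_matmul
  implicit none
  integer, parameter :: m = 3, n = 4, k = 5
  double precision :: a(m,k), b(k,n), c_gemm(m,n), c_matmul(m,n)
  external :: dgemm

  call random_number(a)
  call random_number(b)

  ! C := alpha*A*B + beta*C ; the leading dimensions are the declared
  ! first extents of A, B and C (m, k and m here).
  call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, 0.0d0, c_gemm, m)

  c_matmul = matmul(a, b)
  print *, 'max difference:', maxval(abs(c_gemm - c_matmul))
end program gemm_vs_matmul
```

If the results differ, the usual suspects are a mismatched leading dimension (lda/ldb/ldc must be the declared extents, not the used sizes) or alpha/beta values other than 1 and 0.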
Nothing. That's the point: it is giving you full performance regardless of your compile options. If the matrix size is less than about 15 (perhaps the geometric mean of l, m, n when multiplying (l,m) by (m,n)), then MATMUL should optimize to better performance. gfortran, with its option to switch between inline code and DGEMM according to size, switches to DGEMM at a default size of 25, although that option isn't implemented for Windows, as far as I know.
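If you want to find the crossover on your own machine rather than trust a rule of thumb, a rough timing sketch along these lines works (the sizes and repetition count are arbitrary, and it again assumes linking against MKL or another BLAS):

```
program crossover
  implicit none
  integer, parameter :: reps = 1000
  integer :: sizes(4) = (/ 4, 16, 64, 256 /)
  integer :: i, r, n
  integer(8) :: t0, t1, rate
  double precision :: s
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  external :: dgemm

  s = 0.0d0
  do i = 1, size(sizes)
     n = sizes(i)
     allocate(a(n,n), b(n,n), c(n,n))
     call random_number(a)
     call random_number(b)

     call system_clock(t0, rate)
     do r = 1, reps
        c = matmul(a, b)
        s = s + c(1,1)          ! keep the result live
     end do
     call system_clock(t1)
     print *, n, ' matmul: ', real(t1 - t0) / rate, ' s'

     call system_clock(t0)
     do r = 1, reps
        call dgemm('N','N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
        s = s + c(1,1)
     end do
     call system_clock(t1)
     print *, n, ' dgemm:  ', real(t1 - t0) / rate, ' s'
     deallocate(a, b, c)
  end do
  print *, 'checksum:', s       ! prevents dead-code elimination
end program crossover
```

Typically MATMUL wins at the smallest sizes and DGEMM pulls ahead as the matrices grow, but the exact crossover depends on compiler, options, and hardware.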