When I compile my project in debug mode, the code using GEMM runs faster than MATMUL. However, when I choose release, the code with GEMM becomes extremely slow. Does anyone know why? Thanks.
3 Replies
The most obvious guess, in the absence of complete information, is that you used an optimized BLAS function along with your debug build. In that case, in-lined MATMUL would pick up a lot of performance with optimization, and could easily exceed the BLAS performance for those cases where MATMUL is well optimized. It's even possible that you are comparing double-precision BLAS against single-precision MATMUL.
If you built BLAS from source code, your cases may happen to perform well without optimization; and if you don't take precautions such as appropriate LOOP COUNT directives or profile-guided optimization, the compiler will optimize only for cases you don't encounter, while in-line MATMUL code can be optimized statically for the specific cases you want.
So, if you didn't want a long-winded answer, you should have given more information.
Thanks for your reply. I use the DGEMM provided by MKL, with the default optimization options. The code looks like this with GEMM:
call DGEMM('N','N',mmx,nnx,kkx,allp,tmyvariables_without_w2,lda,psi1prinv_without_w2,ldb,bett,mas,ldc)
and with MATMUL:
mas = matmul(tmyvariables_without_w2,psi1prinv_without_w2)
What do I do to optimize the DGEMM part?
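For reference, a minimal, self-contained sketch (not the poster's code; the array names and sizes here are made up for illustration) of how a DGEMM call like the one above lines up with MATMUL. With alpha = 1.0d0 and beta = 0.0d0, DGEMM computes C := A*B, the same product MATMUL returns:

```
! Sketch only: assumes linking against MKL or another BLAS
! (e.g. "ifort example.f90 -qmkl").
program gemm_vs_matmul
  implicit none
  integer, parameter :: m = 3, n = 4, k = 5
  double precision :: a(m,k), b(k,n), c_gemm(m,n), c_matmul(m,n)
  external :: dgemm

  call random_number(a)
  call random_number(b)

  ! C := alpha*A*B + beta*C ; the leading dimensions are the declared
  ! first extents of A, B and C (m, k and m here).
  call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, 0.0d0, c_gemm, m)

  c_matmul = matmul(a, b)
  print *, 'max difference:', maxval(abs(c_gemm - c_matmul))
end program gemm_vs_matmul
```

If the results differ, the usual suspects are a mismatched leading dimension (lda/ldb/ldc must be the declared extents, not the used sizes) or alpha/beta values other than 1 and 0.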
Nothing. That's the point: it is giving you full performance regardless of your compile options. If the matrix size is less than about 15 (perhaps the geometric mean of l, m, n when multiplying (l,m) by (m,n)), then MATMUL should optimize to better performance. gfortran, with its option to switch between inline code and DGEMM according to size, switches to DGEMM at a default size of 25, although that option isn't implemented for Windows, as far as I know.
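If you want to find the crossover on your own machine rather than trust a rule of thumb, a rough timing sketch along these lines works (the sizes and repetition count are arbitrary, and it again assumes linking against MKL or another BLAS):

```
program crossover
  implicit none
  integer, parameter :: reps = 1000
  integer :: sizes(4) = (/ 4, 16, 64, 256 /)
  integer :: i, r, n
  integer(8) :: t0, t1, rate
  double precision :: s
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  external :: dgemm

  s = 0.0d0
  do i = 1, size(sizes)
     n = sizes(i)
     allocate(a(n,n), b(n,n), c(n,n))
     call random_number(a)
     call random_number(b)

     call system_clock(t0, rate)
     do r = 1, reps
        c = matmul(a, b)
        s = s + c(1,1)          ! keep the result live
     end do
     call system_clock(t1)
     print *, n, ' matmul: ', real(t1 - t0) / rate, ' s'

     call system_clock(t0)
     do r = 1, reps
        call dgemm('N','N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
        s = s + c(1,1)
     end do
     call system_clock(t1)
     print *, n, ' dgemm:  ', real(t1 - t0) / rate, ' s'
     deallocate(a, b, c)
  end do
  print *, 'checksum:', s       ! prevents dead-code elimination
end program crossover
```

Typically MATMUL wins at the smallest sizes and DGEMM pulls ahead as the matrices grow, but the exact crossover depends on compiler, options, and hardware.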