dgemm extremely slow (IFC 9.0, on IA-32 Windows XP)

karl_walentin · ‎03-27-2008

I have a problem with VERY bad performance of dgemm. My actual computation involves matrices of size 65*65, but the problem seems to be present for any matrix dimension above, roughly, 15. With Intel Fortran Compiler 9.0, dgemm is roughly 5 times slower than it should be. E.g. it takes 10 times longer to multiply two symmetric 65*65 matrices using dgemm than using dsyr2k (which admittedly uses the symmetry of the matrices). These results were obtained using the release version of the compiler output. dgemm is also slower than

I link to the relevant "imsl" routines (because I use other routines than dgemm from the IMSL libraries) using the following statement in the code:

"INCLUDE 'link_f90_static.h'"

This in turn means that I use the following libs: imsl.lib, imslscalar.lib, imslblas.lib, imsls_err.lib.

This performance problem occurs on all computers I have tried, single or multi-core. All of them are IA-32 machines running Windows XP.

Any suggestions on how to get dgemm to perform matrix multiplications faster than this would be highly appreciated. Clearly I must be doing something very wrong (in terms of linking, optimizer settings or such), because I have a very hard time believing that the Intel Fortran compiler implementation of dgemm can be this bad.
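A small timing harness makes this kind of comparison reproducible. The following is only a sketch, not the original test program: the repetition count, the random (non-symmetric) inputs, and the use of the intrinsic matmul as a baseline are all assumptions made for illustration. It has to be linked against whichever BLAS library is being tested, since it does not supply dgemm itself.

```fortran
program time_dgemm
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 65          ! matrix size from the post
  integer, parameter :: nrep = 10000    ! assumed repetition count
  real(dp) :: a(n,n), b(n,n), c(n,n), c_ref(n,n)
  integer :: i, t0, t1, rate
  external :: dgemm                     ! resolved by the linked BLAS library

  call random_number(a)
  call random_number(b)

  ! Time C = A*B through the BLAS routine
  call system_clock(t0, rate)
  do i = 1, nrep
     call dgemm('N', 'N', n, n, n, 1.0_dp, a, n, b, n, 0.0_dp, c, n)
  end do
  call system_clock(t1)
  print *, 'dgemm  seconds:', real(t1 - t0, dp) / real(rate, dp)

  ! Time the same product through the compiler's intrinsic
  call system_clock(t0)
  do i = 1, nrep
     c_ref = matmul(a, b)
  end do
  call system_clock(t1)
  print *, 'matmul seconds:', real(t1 - t0, dp) / real(rate, dp)

  ! Sanity check that both paths computed the same product
  print *, 'max abs difference:', maxval(abs(c - c_ref))
end program time_dgemm
```

An optimizing compiler may hoist or drop part of the repeated matmul loop, so the baseline figure should be treated with some caution. Rebuilding the same source against different libraries shows which dgemm implementation is actually being linked in.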

Steven_L_Intel1 · ‎03-27-2008

It is not clear to me which implementation of DGEMM is "bad". Is it the one in IMSL? In a Fortran compiled source? Somewhere else? The compiler itself has NO implementation of DGEMM so you're getting it from some other place.

The best implementation of DGEMM would be in Intel MKL, included with Intel Visual Fortran Compiler Professional Edition 10.1. With 9.0 you'd have to buy MKL separately.

The IMSL implementation of BLAS in IMSL 5 (which you have) is not well optimized - in IMSL 6 (IVF 10.0/10.1) it uses MKL and is much better.

karl_walentin · ‎03-27-2008

Steve,

Sorry, I was a bit unclear. I mentioned that I use the IMSL files: imsl.lib, imslscalar.lib, imslblas.lib, imsls_err.lib. These were put in the directory "Program\VNI\CTT6.0\lib\IA32", most probably by the installation program.

If I instead ONLY link to the MKL files mkl_ia32.lib, mkl_c.lib and libguide.lib (by writing "!dec$objcomment lib:..." in the source file, slightly inspired by one of your earlier posts in this forum) I do get a high performance dgemm routine. I just noticed this. It makes me happy. But my remaining problem is then how to:

i) use functions available in IMSL but not in MKL (dlftds, dlinrt, dlfdds, dtrmm, dtrmv)

AND, in the same program,

ii) use the high performance MKL routines for DGEMM, dsyr2k and dsymm

I tried to do this by putting both an "INCLUDE 'link_f90_static.h'" statement and the above mentioned "!dec$objcomment..." statement in my source file. This builds and runs, but it does not improve speed, so I assume that it simply disregards the MKL libs.
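For reference, the combination described above would look roughly like the fragment below. This is a sketch in free-form source: the order of the lines and the exact spelling of the directives are assumptions, and the library names are the ones listed earlier in this post. It only reproduces the setup, which links successfully but does not make dgemm faster.

```fortran
! Pull in the IMSL static libraries
! (imsl.lib, imslscalar.lib, imslblas.lib, imsls_err.lib)
INCLUDE 'link_f90_static.h'

! Also ask the linker to search the MKL libraries
!DEC$ OBJCOMMENT LIB:'mkl_c.lib'
!DEC$ OBJCOMMENT LIB:'mkl_ia32.lib'
!DEC$ OBJCOMMENT LIB:'libguide.lib'
```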

Regards,

Karl Walentin

Steven_L_Intel1 · ‎03-27-2008

All the "include 'link_f90_static.h'" line does is tell the linker to bring in the IMSL static single-threaded libraries. That can be enough to satisfy references to BLAS routines. One option for you is to call DGEMM_MKL95 and build the BLAS95 interfaces as supplied by MKL. Another, though I don't know for sure that this works, is to upgrade to IVF 10.1 with IMSL and use DGEMM from IMSL 6 which, I think, just uses MKL internally.

Yes, you get IMSL as part of the compiler product, but it is not the compiler that is generating the bad code.