Performance of MKL BLAS routines vs self compiled BLAS

Morag_A_Intel — Thu, 21 Jan 2016 08:49:39 GMT

I am using BLAS with my software, especially various GEMM & GEMV routines.

I profiled my software with Intel VTune and found that my own BLAS library (compiled with the Intel Fortran Compiler) gives 5-10% better run time than Intel MKL.

Does this make sense? Is it possible that taking the BLAS sources from www.netlib.org/blas/ and compiling them myself results in a better-optimized library than Intel MKL?

Regards,

Morag Agmon (Intel)

Gennady_F_Intel — Thu, 21 Jan 2016 14:54:14 GMT

Morag,

That is not what we would expect. Where do you see the 5-10% performance gap relative to MKL? Is it in a ?gemm routine? What is the problem size?

Why are you using VTune (did you use hotspot analysis?) instead of measuring the execution time of these routines directly? What CPU type are you running on?
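
To make the "measure directly" suggestion concrete, here is a minimal timing sketch. It is not taken from MKL or from the original post: `naive_gemm`, `best_time` and `gemm_gflops` are hypothetical helper names, and the pure-Python `naive_gemm` is only a placeholder for whichever ?gemm call is being compared. The idea is to run one untimed warm-up call (so that any one-time setup cost is excluded), keep the fastest of several timed runs, and report the problem size together with the time.

```python
import time


def naive_gemm(a, b, m, n, k):
    """Placeholder C = A * B on row-major lists; stands in for the real ?gemm call."""
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def best_time(fn, repeats=5):
    """Call fn() once untimed as a warm-up, then return the fastest of `repeats` runs (seconds)."""
    fn()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def gemm_gflops(m, n, k, seconds):
    """Convert a GEMM time to GFLOP/s, counting roughly 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


if __name__ == "__main__":
    m = n = k = 64  # problem size; an assumption chosen only to keep the demo quick
    a = [[1.0] * k for _ in range(m)]
    b = [[2.0] * n for _ in range(k)]
    t = best_time(lambda: naive_gemm(a, b, m, n, k))
    print(f"{m}x{n}x{k}: {t:.6f} s, {gemm_gflops(m, n, k, t):.3f} GFLOP/s")
```

Running the same harness against each library for the actual matrix sizes used in the application gives numbers that can be compared per routine, independently of the rest of the program.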
