Intel® oneAPI Math Kernel Library

Poor performance of Intel MKL (Intel oneAPI 2024.2 or 2025.3) vs GSL

Olórin
Novice

Prerequisites :

  • In what follows double precision is always used
  • If A is a matrix of dimensions (N,M) and B is a matrix of dimensions (M,K), the product is a matrix of dimensions (N,K). The naive way to compute the product requires computing N times K coefficients, each of which is a scalar product of vectors of size M requiring M double products and M-1 double sums, that is M + M-1 = 2M-1 flops. Hence, computing the product requires N*K*(2M-1) flops.
  • I will only deal with square matrices in what follows, of size N (for various size_t values of N), hence one product requires (2N-1)*N^2 flops.
  • The performance of MKL and GSL is measured in megaflops per second (1 MFLOP = 10^6 flops); durations are measured, for precision, in nanoseconds. Please note that I also measured performance as pure duration in nanoseconds and the conclusion (bad for MKL) remains the same. (I point this out to pre-empt the argument that the internals of MKL's or GSL's product may not require (2N-1)*N^2 flops because they are optimized, etc.: comparing pure durations nullifies such an argument.)
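
To make the metric concrete, here is a minimal sketch of the flop count and the MFLOPS conversion described above. The function names naive_flops and mflops are hypothetical and are not taken from the attached solution.

```cpp
#include <cassert>
#include <cstddef>

// Flop count of the naive product of an (N,M) matrix by an (M,K) matrix:
// N*K coefficients, each needing M multiplications and M-1 additions.
// (Hypothetical helper, not part of the attached project.)
constexpr double naive_flops(std::size_t N, std::size_t M, std::size_t K) {
    return static_cast<double>(N) * static_cast<double>(K)
         * (2.0 * static_cast<double>(M) - 1.0);
}

// Throughput in MFLOPS (1 MFLOP = 10^6 flops) from a duration in nanoseconds.
inline double mflops(double flops, double elapsed_ns) {
    assert(elapsed_ns > 0.0);                 // a zero duration is meaningless
    const double seconds = elapsed_ns * 1e-9; // nanoseconds -> seconds
    return flops / seconds / 1e6;
}
```

For a square product of size N = 1000, naive_flops(1000, 1000, 1000) is 1.999e9, so a product that takes one second corresponds to roughly 1999 MFLOPS.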

 

I code in C++ (even if MKL semantics are C semantics). GSL was installed through the following NuGet package:

https://www.nuget.org/packages/gsl-msvc-x64 

 

The two versions of MKL are:

  • 2024.2, installed with the "full" oneAPI version 2024.2
  • 2025.3, installed with the standalone oneAPI MKL package, version 2025.3

 

I used two C++ compilers:

  • the Microsoft compiler from Microsoft Visual Studio Professional 2022, Version 17.14.16
  • the Intel C++ Compiler 2024 from Intel oneAPI 2024.2

 

Everything is compiled in Release configuration for the x64 platform. (In Debug, the Intel compiler gives an error; I don't care about that here, as I am not investigating the Intel C++ compiler's shortcomings other than the performance of the code it produces.) I attach a zip of the Visual Studio solution folder so that you can test. All configuration properties are in it if you have questions about them, and you can tweak them as you want. I use MKL in sequential mode.

 

Now let me comment on my performance test code:

  • my_gsl.h and my_gsl.cpp are a wrapper over GSL's cblas_dgemm
  • my_mkl.h and my_mkl.cpp are a wrapper over MKL's cblas_dgemm (why these wrappers: because C doesn't have namespaces, and neither the GSL developers nor the MKL developers wrote C++ wrappers that would let their clients avoid name clashes)
  • The perf test code is in test_gsl_vs_mkl.cpp, I comment it briefly below.

It starts with two simplification wrappers

void GSL(double* a, double* b, double* c, size_t N)

and

void MKL(double* a, double* b, double* c, size_t N)
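
Since both wrappers take the same arguments, each can be checked against a plain triple loop on a small size before any timing. The sketch below is only a reference for that kind of check. It assumes row-major storage, which the post does not state, and the names naive_product and max_abs_diff are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical reference product: c = a * b for N x N matrices.
// Assumes row-major storage (element (i,j) lives at index i*N + j).
void naive_product(const double* a, const double* b, double* c, std::size_t N) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < N; ++k) {
                sum += a[i * N + k] * b[k * N + j];
            }
            c[i * N + j] = sum;
        }
    }
}

// Largest absolute difference between two N x N results, to compare the
// output of a wrapper with the reference above.
double max_abs_diff(const double* x, const double* y, std::size_t N) {
    double worst = 0.0;
    for (std::size_t i = 0; i < N * N; ++i) {
        const double d = std::fabs(x[i] - y[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling GSL(a, b, c_gsl, N) and MKL(a, b, c_mkl, N) and comparing each result with the reference through max_abs_diff would confirm that the two wrappers compute the same product, up to rounding.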

It sets up several square matrix sizes: 1, 10, 20, 30, 40, 50, 70, 80, 90,
100, 200, 300, 400, 500, 600, 700, 800, 900,
1000, 2000, 3000, 4000, 5000.

For each size it allocates two matrices (a and b) whose product should be computed, and a result matrix c filled with 0.0 that will contain the results of dgemm.

The matrices are filled once and for all with random values (using vdRngUniform and VSL_RNG_METHOD_UNIFORM_STD), so that I am sure they are sufficiently random and, even more importantly, really not sparse. (In my use case they are never sparse.)

Then I run two loops (one for GSL and one for MKL) of 50 iterations each, in which every other iteration is not timed (to absorb any potential first-call initialization cost, as I never know), and I average over the 25 timed iterations.

The only things that are timed are the calls to dgemm. I don't time memory allocation or deallocation, or random number generation.
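
The timing scheme can be sketched roughly as follows. This is not the code from test_gsl_vs_mkl.cpp: the helper name average_ns is hypothetical, and it assumes that each untimed call comes just before the timed one.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Calls `kernel` 2 * timed_runs times. Every other call is left untimed and
// the average duration of the timed calls is returned in nanoseconds.
// (Hypothetical helper; the order untimed-then-timed is an assumption.)
template <typename Kernel>
double average_ns(Kernel&& kernel, std::size_t timed_runs = 25) {
    assert(timed_runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic clock
    double total_ns = 0.0;
    for (std::size_t i = 0; i < timed_runs; ++i) {
        kernel();                             // untimed call
        const auto start = clock::now();
        kernel();                             // timed call
        const auto stop = clock::now();
        total_ns += std::chrono::duration<double, std::nano>(stop - start).count();
    }
    return total_ns / static_cast<double>(timed_runs);
}
```

With the wrappers above, a measurement would look like average_ns([&] { MKL(a, b, c, N); }), and the same call with GSL in place of MKL gives the other side of the comparison.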

The console prints out the results.

 

The results are almost the same for GSL and MKL at all matrix sizes, with both compilers and with both versions of MKL (2024.2 or 2025.3).

 

If I am doing something wrong (project/MKL configuration, etc.), please tell me.

If not, please explain to me what is happening. (Why doesn't MKL outperform GSL, and why does compiling with the Intel C/C++ compiler change almost nothing?)

 

 

 

 

 

1 Reply
Olórin
Novice

Numerical results :

 

Screenshot 2025-11-28 233644.jpg
