Intel® oneAPI Base Toolkit
Support for core tools and libraries to build and deploy high-performance data-centric applications

MKL sgemm performance: Windows vs Linux


Hi, I was profiling my application, which uses the MKL sgemm kernel, on both Windows and Linux. I expected roughly the same results on both, but the sgemm kernel on Windows is almost twice as slow.

Since my application calls the sgemm kernel multiple times, I profiled the kernel on its own (source file attached) for three different matrix sizes; the timings are listed below.

I'm using the TBB threading layer and tested the sgemm kernel on Windows and Linux with up to 16 threads.
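For reference, a minimal sketch of how one such timing could be taken. This is not the attached source file; the helper name `median_seconds` and its parameters are hypothetical. It runs a few untimed warm-up calls first, so that one-time costs such as library loading and thread-pool start-up stay out of the measurement, and then reports the median of several timed runs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: times `kernel` and returns the median wall-clock
// time in seconds over `repeats` timed calls.
// `warmup` untimed calls run first so that one-time costs (library
// loading, thread-pool start-up, cold caches) are not measured.
template <typename Kernel>
double median_seconds(Kernel&& kernel, int warmup, int repeats) {
    assert(repeats > 0);

    for (int i = 0; i < warmup; ++i) {
        kernel();
    }

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }

    // Median is less sensitive to a single slow outlier than the mean.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median for even counts
}
```

The sgemm call itself would be wrapped in a lambda and passed as `kernel`, for example `median_seconds([&] { /* cblas_sgemm call here */ }, 3, 10)`. The warm-up and repeat counts are arbitrary choices, not values taken from the benchmark above.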

Compile commands, both linking against the single dynamic library (mkl_rt):

linux: g++ dgemm.cpp -L${MKLROOT}/lib/intel64 -O3 -Wl,--no-as-needed -lmkl_rt -lpthread -lm -ldl -o dgemm -m64 -I${MKLROOT}/include

windows: cl.exe dgemm.cpp /O2 /EHsc -o dgemm.exe mkl_rt.lib 

CPU info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
CPU family: 6
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz



Matrix A (20000x2000) and matrix B (2000x5000)
threads  1      2      4      6      8      12     16
windows  3.13s  1.85s  1.36s  1.18s  1.17s  1.21s  1.18s
linux    3.04s  1.64s  1.01s  0.77s  0.83s  0.82s  0.82s

Matrix A (1024x1024) and matrix B (1024x1024)
threads  1        2        4        6        8        12       16
windows  0.0206s  0.0106s  0.0080s  0.0087s  0.0082s  0.0078s  0.0085s
linux    0.0212s  0.0100s  0.0063s  0.0053s  0.0054s  0.0055s  0.0052s

Matrix A (128x128) and matrix B (128x128)
threads  1          2          4          6          8          12         16
windows  0.000574s  0.001169s  0.000157s  0.000505s  0.000048s  0.000029s  0.000044s
linux    0.004134s  0.001697s  0.000052s  0.000055s  0.000017s  0.000017s  0.000093s
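To compare runs across matrix sizes, the timings can be converted to an achieved rate. The sketch below assumes that each reported time covers exactly one sgemm call, which the post does not state explicitly, and uses the standard count of 2*m*n*k floating-point operations for C = A*B with A (m x k) and B (k x n). The function names are hypothetical.

```cpp
#include <cassert>

// Floating-point operations in C = A * B with A (m x k) and B (k x n):
// each of the m*n output elements needs k multiplies and k adds.
inline double sgemm_flops(double m, double n, double k) {
    return 2.0 * m * n * k;
}

// Achieved rate in GFLOP/s for one call that took `seconds`.
// Assumption: `seconds` is the time of a single sgemm call.
inline double gflops(double m, double n, double k, double seconds) {
    assert(seconds > 0.0);
    return sgemm_flops(m, n, k) / seconds / 1e9;
}
```

Under that assumption, `gflops(20000, 5000, 2000, 3.04)` gives roughly 131.6 GFLOP/s for the single-threaded Linux run of the largest case, and `gflops(20000, 5000, 2000, 0.77)` gives roughly 519.5 GFLOP/s for the 6-thread Linux run.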



While the issue in my application is that the Windows results are 2x slower, benchmarking the sgemm kernel on its own shows that the scalability differs on Windows as well.
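The scalability difference can be quantified from the tables with the usual definitions of speedup and parallel efficiency. A minimal sketch follows; the function names are hypothetical and only the timings come from the measurements above.

```cpp
#include <cassert>

// Speedup of a multi-threaded run relative to the 1-thread run
// on the same operating system.
inline double speedup(double t_one_thread, double t_n_threads) {
    assert(t_n_threads > 0.0);
    return t_one_thread / t_n_threads;
}

// Parallel efficiency: speedup divided by the number of threads.
// A value of 1.0 would mean perfect scaling.
inline double efficiency(double t_one_thread, double t_n_threads, int threads) {
    assert(threads > 0);
    return speedup(t_one_thread, t_n_threads) / threads;
}

// How many times slower one platform is than the other
// at the same thread count.
inline double slowdown(double t_slower, double t_faster) {
    assert(t_faster > 0.0);
    return t_slower / t_faster;
}
```

For the largest case at 6 threads this gives a speedup of about 2.65 on Windows (3.13 / 1.18) against about 3.95 on Linux (3.04 / 0.77). Windows is therefore about 1.53x slower at 6 threads even though the single-thread times are nearly equal.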

My question is whether there are any performance results comparing Windows and Linux with different compilers, and whether the behavior I'm seeing is expected.

Thank you very much,

1 Reply

Sorry, I forgot to attach the source file.