Hi, I was profiling the performance of my application, which uses the MKL sgemm kernel, on both Windows and Linux. I expected roughly the same results on both, but in my application the sgemm kernel is almost 2x slower on Windows.
Since my application calls the sgemm kernel many times, I profiled the kernel on its own (source file attached) for three different matrix sizes; the timings are listed below.
I'm using the TBB threading layer, and I tested the sgemm kernel on Windows and Linux with up to 16 threads.
Compile commands (both link against the single dynamic library, mkl_rt):
linux: g++ dgemm.cpp -L${MKLROOT}/lib/intel64 -O3 -Wl,--no-as-needed -lmkl_rt -lpthread -lm -ldl -o dgemm -m64 -I${MKLROOT}/include
windows: cl.exe dgemm.cpp /O2 /EHsc -o dgemm.exe mkl_rt.lib
CPU info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
CPU family: 6
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Matrix A (20000x2000) and matrix B (2000x5000):
| threads | 1 | 2 | 4 | 6 | 8 | 12 | 16 |
| windows | 3.13s | 1.85s | 1.36s | 1.18s | 1.17s | 1.21s | 1.18s |
| linux | 3.04s | 1.64s | 1.01s | 0.77s | 0.83s | 0.82s | 0.82s |
Matrix A (1024x1024) and matrix B (1024x1024):
| threads | 1 | 2 | 4 | 6 | 8 | 12 | 16 |
| windows | 0.0206s | 0.0106s | 0.0080s | 0.0087s | 0.0082s | 0.0078s | 0.0085s |
| linux | 0.0212s | 0.0100s | 0.0063s | 0.0053s | 0.0054s | 0.0055s | 0.0052s |
Matrix A (128x128) and matrix B (128x128):
| threads | 1 | 2 | 4 | 6 | 8 | 12 | 16 |
| windows | 0.000574s | 0.001169s | 0.000157s | 0.000505s | 0.000048s | 0.000029s | 0.000044s |
| linux | 0.004134s | 0.001697s | 0.000052s | 0.000055s | 0.000017s | 0.000017s | 0.000093s |
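The 128x128 timings do not decrease steadily with thread count, so at this size measurement noise and one-time startup cost may account for much of each number. One way to reduce that effect is to run a few untimed warm-up calls and report the median of several timed runs. The following is only a minimal sketch of that idea, not the attached benchmark: `naive_sgemm` is a hypothetical stand-in for the real MKL call, and `median_seconds` is a helper name introduced here for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real sgemm call: naive C = A * B for
// n x n row-major float matrices. Swap in the actual kernel when profiling.
void naive_sgemm(std::size_t n, const std::vector<float>& a,
                 const std::vector<float>& b, std::vector<float>& c) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

// Calls fn `warmup` times without timing, then times `reps` calls
// (reps must be >= 1) and returns the median wall-clock time in seconds.
template <typename Fn>
double median_seconds(Fn&& fn, int warmup, int reps) {
    for (int i = 0; i < warmup; ++i) {
        fn();
    }
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

For example, `median_seconds([&] { naive_sgemm(128, a, b, c); }, 3, 21)` would time the 128x128 case with 3 warm-up calls and 21 timed repetitions; the repetition counts are arbitrary choices for illustration.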
In my application, the Windows build is about 2x slower. Benchmarking the sgemm kernel in isolation shows a smaller gap, but it also shows that scaling with thread count differs on Windows.
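To make the scalability difference concrete, the timings for the largest case can be converted into speedup over one thread, parallel efficiency, and approximate GFLOP/s. The sketch below uses only the numbers posted in the first table and assumes the usual 2*m*n*k operation count for a matrix multiply; the function and variable names are introduced here for illustration.

```cpp
#include <vector>

// Timings in seconds for A(20000x2000) * B(2000x5000), copied from the table
// in this post, for 1, 2, 4, 6, 8, 12 and 16 threads.
const std::vector<int> thread_counts{1, 2, 4, 6, 8, 12, 16};
const std::vector<double> windows_large{3.13, 1.85, 1.36, 1.18, 1.17, 1.21, 1.18};
const std::vector<double> linux_large{3.04, 1.64, 1.01, 0.77, 0.83, 0.82, 0.82};

// Speedup relative to the first (single-thread) entry: S(p) = T(1) / T(p).
std::vector<double> speedups(const std::vector<double>& seconds) {
    std::vector<double> result;
    result.reserve(seconds.size());
    for (double t : seconds) {
        result.push_back(seconds.front() / t);
    }
    return result;
}

// Parallel efficiency: E(p) = S(p) / p.
double efficiency(double speedup, int threads) {
    return speedup / threads;
}

// Approximate throughput, assuming 2*m*n*k floating-point operations
// for an (m x k) by (k x n) product.
double gflops(double m, double n, double k, double seconds) {
    return 2.0 * m * n * k / seconds / 1e9;
}
```

With these numbers, the 6-thread speedup is about 2.65x on Windows (3.13s / 1.18s) and about 3.95x on Linux (3.04s / 0.77s), while the single-thread times are within about 3% of each other. So for the largest case the gap appears mainly as threads are added.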
My question is whether there are any published performance results comparing MKL on Windows and Linux with different compilers, and whether the difference I'm seeing is expected.
Thank you very much,