Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL Performance issue in threaded application



We are working on RNN kernel optimization and are trying to run two SGEMM calls in parallel on a 2-socket SKX6148 server (20 cores per socket).

The SGEMM size is M = 20, N = 2400, K = 800.
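
For reference, the GFLOPS figure for this size can be derived as in the sketch below. It assumes the usual 2*M*N*K operation count for a matrix multiply; the helper names `sgemm_flops` and `sgemm_gflops` are made up for illustration and are not part of MKL or of the benchmark.

```c
/* Floating-point operations in one SGEMM:
   each of the M*N outputs needs K multiplies and K adds. */
static double sgemm_flops(long m, long n, long k) {
    return 2.0 * (double)m * (double)n * (double)k;
}

/* GFLOPS achieved when `calls` SGEMM calls of this size take `seconds` in total. */
static double sgemm_gflops(long m, long n, long k, long calls, double seconds) {
    return sgemm_flops(m, n, k) * (double)calls / seconds / 1e9;
}
```

With M = 20, N = 2400, K = 800, one call is 76.8 million floating-point operations, so each call does only a small amount of work.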

Our target is to map the first SGEMM to socket0 and the other SGEMM to socket1.
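
A minimal sketch of this mapping with nested OpenMP is shown below. It is not our actual benchmark code: `sgemm_ref` is a hypothetical plain-C stand-in for the MKL SGEMM call so that the sketch does not depend on MKL, and `run_two_sgemm` and `threads_per_socket` are likewise names invented here. The placement itself is assumed to come from the standard OpenMP environment variables, for example `OMP_MAX_ACTIVE_LEVELS=2`, `OMP_PLACES=sockets` and `OMP_PROC_BIND=spread,close`.

```c
#include <stddef.h>

/* Stand-in for one SGEMM: C (m x n) = A (m x k) * B (k x n), row-major.
   In the real application this would be the MKL call. */
static void sgemm_ref(int m, int n, int k,
                      const float *a, const float *b, float *c,
                      int inner_threads) {
    /* Inner team: the columns of C are split among inner_threads threads. */
    #pragma omp parallel for num_threads(inner_threads)
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] = acc;
        }
    }
}

/* Outer level: two threads, one independent SGEMM each.
   With the environment settings assumed above, the two outer threads
   are spread over the two sockets and each inner team stays close to
   its parent thread. */
static void run_two_sgemm(int m, int n, int k,
                          const float *a0, const float *b0, float *c0,
                          const float *a1, const float *b1, float *c1,
                          int threads_per_socket) {
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        sgemm_ref(m, n, k, a0, b0, c0, threads_per_socket);
        #pragma omp section
        sgemm_ref(m, n, k, a1, b1, c1, threads_per_socket);
    }
}
```

On our machine `threads_per_socket` would be 20. If the code is compiled without OpenMP support, the pragmas are ignored and the two products simply run one after the other.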

We measured the GFLOPS with this benchmark and got the following performance data:

I found that the performance of OMP+MKL and TBB+MKL is not as good as we expected, and I'm not sure whether I am missing something about using MKL in a threaded application.

BTW, the pthread+MKL solution is not suitable for our real use case, as it would double the number of threads and make the performance even worse.

Thanks in advance.
