Intel® oneAPI Math Kernel Library

MKL for Deep Learning?

Richard_S_7
Beginner

Hello everyone,

I am a PhD student researching parallel programming. In my next paper, I aim to present high-performance OpenCL implementations of the Basic Linear Algebra Subprograms (BLAS) -- especially the matrix multiplication routine GEMM -- for matrix sizes typical of deep learning; my target hardware is Intel Xeon CPUs. To strengthen my evaluation, I want to compare against the fastest state-of-the-art BLAS implementation targeting Intel Xeon CPUs.

My question is: Which is currently the fastest BLAS implementation for Intel Xeon CPUs on matrix sizes used in deep learning -- is it the Intel Math Kernel Library (MKL)?

Many thanks in advance.

Best,
Richard

3 Replies
Ying_H_Intel
Employee

Hi Richard,

What kind of BLAS functions do you hope to evaluate? We have published some MKL BLAS results on the official website, https://software.intel.com/en-us/mkl/features/benchmarks -- you may refer to them, and please let us know if you run into any issues.

If you are targeting deep learning specifically, we recommend MKL-DNN, which provides optimizations aimed directly at operations such as convolution:

GitHub - intel/mkl-dnn

https://github.com/intel/mkl-dnn


Best Regards,

Ying

 

Richard_S_7
Beginner

Hi Ying,

thank you for your comment. I aim to evaluate SGEMM on dense matrices for input sizes as used in deep learning, for example:

  • M=64, N=800, K=500
  • M=64, N=2, K=10

I had a look at MKL-DNN, and it does not seem to provide a BLAS API. It does not provide a GEMM routine, right? Is MKL the most appropriate library for evaluating GEMM on input sizes like those listed above?
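For reference, this is roughly the call I intend to time for the first size above, assuming MKL's CBLAS interface and row-major storage (a minimal sketch, not my actual benchmark harness):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* First size from the list above: C (MxN) = A (MxK) * B (KxN) */
    const MKL_INT M = 64, N = 800, K = 500;
    float *A = (float *)mkl_malloc((size_t)M * K * sizeof(float), 64);
    float *B = (float *)mkl_malloc((size_t)K * N * sizeof(float), 64);
    float *C = (float *)mkl_malloc((size_t)M * N * sizeof(float), 64);

    for (MKL_INT i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (MKL_INT i = 0; i < K * N; ++i) B[i] = 1.0f;

    /* Row-major single-precision GEMM: C = 1.0 * A * B + 0.0 * C */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);

    printf("C[0] = %f\n", C[0]);   /* expect 500.0 with all-ones inputs */

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}

The second size would just change M, N, K and the leading dimensions accordingly.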

Best,
Richard

Ying_H_Intel
Employee

Hi Richard and all,

There are several ways to evaluate this, but let's consider the two that are most relevant to this forum.

1. BLAS SGEMM only
Please refer to https://software.intel.com/en-us/articles/a-simple-example-to-measure-the-performance-of-an-intel-mkl-function; a timing sketch along those lines is shown after this list. The sizes you mention are fine, and you may also want to look at the MKL BLAS extensions for small matrices: https://insidehpc.com/2018/01/intel-mkl-speeds-small-matrix-matrix-multiplication-automatic-driving/

2. Deep learning and MKL-DNN
You may refer to this article: https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor/
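As mentioned under point 1, here is a minimal timing sketch in the spirit of that article, assuming the CBLAS interface and row-major layout; the repetition count is just a placeholder, and dsecnd() is MKL's wall-clock timer:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const MKL_INT M = 64, N = 800, K = 500;
    const int loops = 1000;                  /* placeholder repetition count */
    float *A = (float *)mkl_malloc((size_t)M * K * sizeof(float), 64);
    float *B = (float *)mkl_malloc((size_t)K * N * sizeof(float), 64);
    float *C = (float *)mkl_malloc((size_t)M * N * sizeof(float), 64);
    for (MKL_INT i = 0; i < M * K; ++i) A[i] = 0.5f;
    for (MKL_INT i = 0; i < K * N; ++i) B[i] = 0.5f;

    /* Warm-up call so first-call initialization is not measured */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);

    double t0 = dsecnd();
    for (int i = 0; i < loops; ++i)
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);
    double t = (dsecnd() - t0) / loops;

    /* GEMM does 2*M*N*K floating-point operations per call */
    printf("avg time %.3e s, %.2f GFLOP/s\n", t, 2.0 * M * N * K / t / 1e9);

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}

The warm-up call keeps one-time setup overhead out of the measured loop, as the article suggests.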

Regarding your question: MKL-DNN (https://github.com/intel/mkl-dnn) is specialized for deep learning. Some of its operations, mainly convolution and the fully connected layer, can take the BLAS sgemm as an underlying support function, but there is far more deep-learning-specific optimization on top of that. See, for example, https://software.intel.com/en-us/articles/introducing-dnn-primitives-in-intelr-mkl. The data there is a little out of date, but in the performance figure you can take the second bar as the BLAS-based integration and the third bar as the MKL-DNN integration; there is roughly a 2x performance difference between them.
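To illustrate why convolution can sit on top of sgemm: one common lowering (often called im2col) unpacks the input so that the convolution becomes a single matrix product, as in the rough sketch below. This is only an illustration of the idea with made-up toy sizes, not how MKL-DNN is implemented; its direct convolution primitives avoid this extra data movement, which is part of the performance difference mentioned above.

#include <stdio.h>
#include <mkl.h>

/* Unpack an input tensor (Cin x H x W) into a (Cin*KH*KW) x (OH*OW) matrix
 * so the convolution becomes one SGEMM. Stride 1, no padding, illustrative only. */
static void im2col(const float *in, int Cin, int H, int W,
                   int KH, int KW, float *col) {
    int OH = H - KH + 1, OW = W - KW + 1;
    for (int c = 0; c < Cin; ++c)
        for (int kh = 0; kh < KH; ++kh)
            for (int kw = 0; kw < KW; ++kw) {
                int row = (c * KH + kh) * KW + kw;
                for (int oh = 0; oh < OH; ++oh)
                    for (int ow = 0; ow < OW; ++ow)
                        col[row * (OH * OW) + oh * OW + ow] =
                            in[(c * H + oh + kh) * W + (ow + kw)];
            }
}

int main(void) {
    /* Toy sizes: 3-channel 8x8 input, 16 filters of 3x3 */
    int Cin = 3, H = 8, W = 8, KH = 3, KW = 3, Cout = 16;
    int OH = H - KH + 1, OW = W - KW + 1;
    int Krows = Cin * KH * KW;               /* shared dimension of the GEMM */

    float *in  = (float *)mkl_calloc((size_t)Cin * H * W, sizeof(float), 64);
    float *wts = (float *)mkl_calloc((size_t)Cout * Krows, sizeof(float), 64);
    float *col = (float *)mkl_calloc((size_t)Krows * OH * OW, sizeof(float), 64);
    float *out = (float *)mkl_calloc((size_t)Cout * OH * OW, sizeof(float), 64);

    im2col(in, Cin, H, W, KH, KW, col);

    /* out (Cout x OH*OW) = wts (Cout x Krows) * col (Krows x OH*OW) */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                Cout, OH * OW, Krows, 1.0f, wts, Krows,
                col, OH * OW, 0.0f, out, OH * OW);

    printf("output tile: %d x %d x %d\n", Cout, OH, OW);
    mkl_free(in); mkl_free(wts); mkl_free(col); mkl_free(out);
    return 0;
}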

So, in general, we recommend considering MKL-DNN for deep learning, and you may want to take these aspects into account in your evaluation as well.

Best Regards,
Ying
