I am a PhD student working in the area of parallel programming. In my next research paper, I aim to present some high-performance (OpenCL) implementations of the Basic Linear Algebra Subprograms (BLAS) -- especially of the matrix multiplication routine GEMM -- for matrix sizes as used in deep learning; my target hardware is Intel Xeon CPUs. To strengthen my evaluation, I want to compare against the fastest state-of-the-art BLAS implementation that targets Intel Xeon CPUs.
My question is: Which is currently the fastest BLAS implementation for Intel Xeon CPUs on matrix sizes as used in deep learning? Is it the Intel Math Kernel Library (MKL)?
Many thanks in advance.
What kind of BLAS function do you hope to evaluate? We have published some MKL BLAS results on the official website, https://software.intel.com/en-us/mkl/features/benchmarks, which you may refer to. Please let us know if you run into any issues.
If you are interested in deep learning, we would recommend MKL-DNN, which is optimized more directly for operations such as convolution.
Thank you for your comment. I aim to evaluate SGEMM on dense matrices for input sizes as used in deep learning, for example:
- M=64, N=800, K=500
- M=64, N=2, K=10
I had a look at MKL-DNN and it does not seem to provide a BLAS API. It does not provide a GEMM routine, right? Is MKL the most appropriate library to evaluate GEMM for input sizes as listed above?
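For reference, here is a minimal sketch of the operation counts behind the two sizes listed above. It assumes the usual convention of 2*M*N*K floating-point operations for GEMM (one multiply and one add per inner-product term) and a lower bound on memory traffic in which A and B are read once and C is written once. The helper names `gemm_flops`, `gemm_bytes` and `gflops` are hypothetical and not part of any library.

```python
def gemm_flops(m, n, k):
    """Flop count of C = A * B with A (m x k) and B (k x n).

    Assumes the common 2*M*N*K convention: one multiply and one add
    per term of each of the m*n inner products.
    """
    return 2 * m * n * k


def gemm_bytes(m, n, k, elem_size=4):
    """Lower bound on memory traffic in bytes (assumption: A and B are
    read once, C is written once; 4 bytes per single-precision value)."""
    return elem_size * (m * k + k * n + m * n)


def gflops(m, n, k, seconds):
    """Convert a measured run time in seconds into GFLOP/s."""
    return gemm_flops(m, n, k) / seconds / 1e9


# The two example sizes from this post.
sizes = [(64, 800, 500), (64, 2, 10)]
for m, n, k in sizes:
    flops = gemm_flops(m, n, k)
    intensity = flops / gemm_bytes(m, n, k)
    print(f"M={m} N={n} K={k}: {flops} flops, {intensity:.2f} flop/byte")
```

The second size performs only a few thousand operations per call, so a measurement of it is likely to be dominated by call overhead rather than by arithmetic throughput.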
Hi Richard and all
There are several ways to evaluate this, but let's consider the two that are relevant to this forum:
1. BLAS sgemm only
Please refer to https://software.intel.com/en-us/articles/a-simple-example-to-measure-the-performance-of-an-intel-mkl-function
The sizes you list are fine. You can also look at the MKL BLAS extensions for small matrices: https://insidehpc.com/2018/01/intel-mkl-speeds-small-matrix-matrix-multiplication-automatic-driving/
2. Deep learning and MKL-DNN
You may refer to this article: https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor/
Regarding your question: MKL-DNN (https://github.com/intel/mkl-dnn) is specialized for deep learning, and some of its operations, mainly convolution and fully connected layers, use the BLAS sgemm routine as an underlying support function. There are many more optimizations for deep learning on top of that; see, for example, https://software.intel.com/en-us/articles/introducing-dnn-primitives-in-intelr-mkl. The data there is a little out of date, but in the performance figure you can take the second bar as the BLAS integration and the third bar as the MKL-DNN integration; there is a 2x performance difference between them.
So in general we recommend considering MKL-DNN for deep learning, and you may want to take these aspects into account as well.
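For option 1 above (timing sgemm alone), a common measurement pattern is to make a warm-up call first and then average over many repetitions, which matters most for very small sizes such as M=64, N=2, K=10. The sketch below illustrates that pattern only. `time_kernel` and `naive_sgemm` are hypothetical helpers, and the pure-Python GEMM is just a stand-in so the example runs on its own; in a real benchmark the kernel would be a call into the library under test.

```python
import time


def time_kernel(kernel, warmup=1, reps=100):
    """Average seconds per call of `kernel`, excluding warm-up calls."""
    for _ in range(warmup):
        kernel()  # warm-up: first-call initialization is not timed
    start = time.perf_counter()
    for _ in range(reps):
        kernel()
    return (time.perf_counter() - start) / reps


def naive_sgemm(m, n, k, a, b):
    """Stand-in GEMM on row-major flat lists: returns C = A * B.

    Placeholder only; replace with the BLAS call being measured.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


# Small size from the question; input values are arbitrary.
m, n, k = 64, 2, 10
a = [1.0] * (m * k)
b = [1.0] * (k * n)

seconds = time_kernel(lambda: naive_sgemm(m, n, k, a, b), warmup=1, reps=100)
# Assumes the usual 2*M*N*K flop count for GEMM.
print(f"avg time: {seconds:.3e} s, {2 * m * n * k / seconds / 1e9:.4f} GFLOP/s")
```

Averaging over repetitions keeps the timer resolution from dominating the result when a single call takes only microseconds.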