Why is cblas_sgemm 5 times slower than cblas_dgemm?

missspicyfood — Wed, 12 Dec 2007 15:01:10 GMT

Hi all,

I am seeing a strange performance problem with MKL 10.0.2 under Visual Studio 2005/2008 Express Edition.

I am trying to use cblas_sgemm/cblas_dgemm to do a matrix multiplication as follows:

Matrix A (m x n), where m is around 50000 and n is around 50.
Matrix B (m x n).
Matrix C (n x n).

I need to compute C = A^T * B.

So I wrote:
cblas_sgemm(CblasColMajor, CblasTrans, CblasNoTrans, n, n, m, 1.0f, A, m, B, m, 0.0f, C, n);

and the same in double precision:

cblas_dgemm(CblasColMajor, CblasTrans, CblasNoTrans, n, n, m, 1.0f, A, m, B, m, 0.0f, C, n);
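For reference, here is a plain-C sketch of what the single-precision call computes, which may help when checking the output independently of MKL. It assumes column-major storage with lda = ldb = m and ldc = n, as in the calls above. The name ref_atb_f is a hypothetical helper, not an MKL function, and the loop is deliberately naive (it says nothing about how MKL does the work internally).

```c
#include <stddef.h>

/* Hypothetical reference helper (not part of MKL).
   Computes C = A^T * B for column-major matrices:
     A is m x n with leading dimension m,
     B is m x n with leading dimension m,
     C is n x n with leading dimension n. */
void ref_atb_f(size_t m, size_t n, const float *A, const float *B, float *C)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            /* Accumulate in double so the long sum over m stays accurate. */
            double acc = 0.0;
            for (size_t k = 0; k < m; ++k)
                acc += (double)A[k + i * m] * (double)B[k + j * m];
            C[i + j * n] = (float)acc; /* element (i, j) of C */
        }
    }
}
```

Comparing a few entries of C from cblas_sgemm against this helper on a small m and n is one way to confirm that the Trans/NoTrans flags and leading dimensions mean what is intended.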

Both calls produce the correct output. However, when I use sgemm with A, B, and C as float*, it runs 5 times slower than dgemm with A, B, and C as double*.
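To make the comparison reproducible, the timing can be sketched roughly as below. This is only an assumed measurement setup, not the code behind the numbers above: seconds_per_call and count_call are hypothetical names, the warm-up call is there on the assumption that the first call into a library may include one-time setup cost, and clock() is used only because it is standard C (it reports processor time on many platforms, so a wall-clock timer may be a better fit for a threaded build).

```c
#include <time.h>

/* Hypothetical benchmark callback type: ctx carries whatever the call needs. */
typedef void (*bench_fn)(void *ctx);

/* Hypothetical helper: average time per call of fn over reps repetitions.
   One untimed warm-up call is made first. */
double seconds_per_call(bench_fn fn, void *ctx, int reps)
{
    fn(ctx); /* warm-up, excluded from the measurement */

    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        fn(ctx);
    clock_t t1 = clock();

    return (double)(t1 - t0) / (double)CLOCKS_PER_SEC / (double)reps;
}

/* Stand-in workload that only counts invocations. In a real test this
   would be a small wrapper that calls cblas_sgemm or cblas_dgemm with
   the arguments shown above. */
void count_call(void *ctx)
{
    int *calls = (int *)ctx;
    ++*calls;
}
```

Timing both the sgemm and the dgemm wrapper through the same helper, with the same number of repetitions and freshly initialized inputs, keeps the two measurements comparable.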

Could anybody help check this out? Thank you very much!
