- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I just noticed a potential performance bug in the DGEMM implementation of MKL (16.0.1)
when using a single thread. I merely want to make someone at Intel aware of it, in case it is of interest.
Strangely DGEMM performs better for beta=1 than for beta=0 in certain
situations. Here is an example:
Intel(R) Xeon(R) CPU E5-2650:
m=72, n=373248, k=72, beta=0.00 : 14.25 GF
m=72, n=373248, k=72, beta=1.00 : 18.36 GF
Intel(R) Xeon(R) CPU E5-2650:
m=72, n=373248, k=72, beta=0.00 : 19.25 GF
m=72, n=373248, k=72, beta=1.00 : 28.34 GF
As you can see, the performance difference is significant. It is
actually so significant that it pays off to set C to zero explicitly
before calling MKL and then using the more efficient beta=1
implementation instead.
Here is a quick test driver:
#include <stdio.h> #include <stdlib.h> #include <omp.h> extern "C" int dgemm_(char *transa, char *transb, int *m, int * n, int *k, double *alpha, double *a, int *lda, double *b, int *ldb, double *beta, double *c, int *ldc); void trashCache(float* trash1, float* trash2, int nTotal){ for(int i = 0; i < nTotal; i ++) trash1 += 0.99 * trash2; } int main(int argc, char ** argv) { if(argc < 2 ){ printf("Usage: <beta>\n"); exit(-1); } float *trash1, *trash2; int nTotal = 1024*1024*100; trash1 = (float*) malloc(sizeof(float)*nTotal); trash2 = (float*) malloc(sizeof(float)*nTotal); int m = 72; int n = 72*72*72; int k = 72; double flops = 2.E-9 * m*n*k; double alpha=1; double beta=atof(argv[1]); double *A, *B, *C; int ret = posix_memalign((void**) &A, 64, sizeof(double) * m*k); ret += posix_memalign((void**) &B, 64, sizeof(double) * n*k); ret += posix_memalign((void**) &C, 64, sizeof(double) * m*n); double minTime = 1e100; for (int i=0; i<3; i++){ trashCache(trash1, trash2, nTotal); double t = omp_get_wtime(); dgemm_("T", "N", &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m); t = omp_get_wtime() - t; minTime = (minTime < t) ? minTime : t; } printf("m=%d, n=%d, k=%d, beta=%.2f : %.2lf GF\n", m,n,k,beta,flops/minTime); free(A); free(B); free(C); free(trash1); free(trash2); return 0; }
Best, Paul
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks Paul, we will check this on our side.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Paul, we confirm the problem with this case. The issue is escalated and we will let you know when the problem would be resolved. Thanks for the case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Paul, please check is the problem still exists with the latest 11.3 update 4 version of MKL and let us know the results. thanks, Gennady
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page