Hi Experts,
I am trying to perform SpMV using the diagonal (DIA) storage format. I found two routines that do this operation for real double-precision data with one-based indexing: mkl_ddiamv and mkl_ddiagemv.
Both give correct results. The problem is that when I change the number of threads, I get almost the same GFLOPS (i.e., the same execution time).
I tested on KNL (64 cores) and on a dual-socket E5 Broadwell (72 cores), using 26 diagonal matrices from the University of Florida Sparse Matrix Collection (for example, McRae/ecology1).
For McRae/ecology1 on the E5: around 2 GFLOPS with 1, 4, 8, 18, 36, 54, and 72 threads.
For McRae/ecology1 on KNL: around 0.9 GFLOPS with 1, 4, 16, 32, 64, 128, 192, and 256 threads.
Note: with the CSR and BSR storage formats, GFLOPS does change with the number of threads.
Here is part of my test code:

```cpp
double bestExTime = 1.0e30;   // fastest single run, in seconds
double stime = 0.0, etime = 0.0;

if (nr == nc) // square matrix (m x m): y = A*x
{
    // First call doubles as a correctness check against y_ref
    mkl_ddiagemv(&transa, &dia->m, dia->val, &dia->lval, dia->idiag, &dia->ndiag, x, y);
    bool resultNotRight = IsResultsWrong(y, y_ref, nr);
    if (resultNotRight)
        return -5;

    for (int i = 0; i < runs; i++)
    {
        stime = dsecnd();
        mkl_ddiagemv(&transa, &dia->m, dia->val, &dia->lval, dia->idiag, &dia->ndiag, x, y);
        etime = dsecnd();
        // Keep the best time instead of overwriting one value on every run
        if (etime - stime < bestExTime)
            bestExTime = etime - stime;
    }
}
else // rectangular matrix (m x k): y = alpha*A*x + beta*y
{
    mkl_ddiamv(&transa, &nr, &nc, &alpha, matdescra, dia->val, &dia->lval,
               dia->idiag, &dia->ndiag, x, &beta, y);
    bool resultNotRight = IsResultsWrong(y, y_ref, nr);
    if (resultNotRight)
        return -5;

    for (int i = 0; i < runs; i++)
    {
        stime = dsecnd();
        mkl_ddiamv(&transa, &nr, &nc, &alpha, matdescra, dia->val, &dia->lval,
                   dia->idiag, &dia->ndiag, x, &beta, y);
        etime = dsecnd();
        if (etime - stime < bestExTime)
            bestExTime = etime - stime;
    }
}

// Print GFLOPS: one multiply and one add per nonzero, based on the best run
double gflops = 1.e-9 * (2.0 * nnz / bestExTime);
cout << gflops << "," << dia->ndiag << ",";
```
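For reference, the product that mkl_ddiamv and mkl_ddiagemv compute on the diagonal format can be sketched as a plain loop. This is a minimal serial sketch, not MKL code: the name `dia_spmv`, the zero-based indexing, and the use of `int` instead of `MKL_INT` are assumptions. The layout is assumed to follow MKL's DIA description, where `val` is an `lval` x `ndiag` column-major array whose entry in row `i`, column `d` holds A(i, i + idiag[d]), with out-of-range slots used as padding.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical reference kernel: y = A*x for an m x k matrix in DIA format.
// Assumed layout (zero-based version of the MKL DIA description):
//   val   : lval x ndiag, column-major; val[i + d*lval] holds A(i, i + idiag[d])
//   idiag : offset of stored diagonal d (0 = main, > 0 above, < 0 below)
// Padding slots (column outside [0, k)) are never read.
void dia_spmv(int m, int k, const double* val, int lval,
              const int* idiag, int ndiag, const double* x, double* y)
{
    assert(lval >= m);  // each stored diagonal keeps one slot per row
    for (int i = 0; i < m; ++i) y[i] = 0.0;
    for (int d = 0; d < ndiag; ++d) {
        const int off = idiag[d];
        // Only rows whose column index i + off lies in [0, k) contribute
        const int lo = std::max(0, -off);
        const int hi = std::min(m, k - off);
        const double* v = val + static_cast<long>(d) * lval;
        for (int i = lo; i < hi; ++i)
            y[i] += v[i] * x[i + off];  // unit-stride access to v, x and y
    }
}
```

For the 3 x 3 tridiagonal matrix with 2 on the main diagonal and -1 on the first sub- and superdiagonal, `idiag = {-1, 0, 1}` and `val = {0,-1,-1, 2,2,2, -1,-1,0}` (the leading and trailing zeros are padding). Each diagonal is streamed with unit stride, which is the main appeal of the format.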
Thanks,
Mohammad Almasri
You see the same performance because these routines (for the diagonal format) are not threaded.
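Because these DIA routines run single-threaded, one workaround is to parallelize the product yourself. The following is a minimal sketch under stated assumptions, not an MKL API: the name `dia_spmv_omp`, zero-based indexing, the 4096-row block size, and the use of OpenMP are all illustrative choices. Rows are split into contiguous blocks, so each thread writes a disjoint slice of `y` and no synchronization is needed.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical threaded DIA SpMV, y = A*x (zero-based indices).
// Assumed layout as in MKL's DIA description: val[i + d*lval] = A(i, i + idiag[d]).
// Compile with -fopenmp; without it the pragma is ignored and the loop runs serially.
void dia_spmv_omp(int m, int k, const double* val, int lval,
                  const int* idiag, int ndiag, const double* x, double* y)
{
    assert(lval >= m);
    const int block = 4096;  // rows per chunk (tuning assumption)
    #pragma omp parallel for schedule(static)
    for (int r0 = 0; r0 < m; r0 += block) {
        const int r1 = std::min(m, r0 + block);
        for (int i = r0; i < r1; ++i) y[i] = 0.0;
        for (int d = 0; d < ndiag; ++d) {
            const int off = idiag[d];
            // Clip this row block to rows whose column i + off is in [0, k)
            const int lo = std::max(r0, -off);
            const int hi = std::min(r1, k - off);
            const double* v = val + static_cast<long>(d) * lval;
            for (int i = lo; i < hi; ++i)
                y[i] += v[i] * x[i + off];
        }
    }
}
```

Keeping the diagonal loop inside each row block preserves the unit-stride access of the serial version while giving every thread its own output range.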