- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am trying to benchmark the intel mkl library function mkl_dcsrmv. I tried running the function for single, three,five and seven diagonals nnz entries and i repeated it multiple times with different values and different matrix size. The maximum efficiency I could achieve was 432MFLOPs which is 3%. The code snippet used to measure the time is attached below. I am running it on a 2Ghz, core2duo processor. The result sheet, in which the obtained time value and the MFLOP calculation is also attached. Can some one tell me if there is some bug in my code or is the performance of mkl not optimized for core2duo? I have checked my system configuration and it has SSE2 instruction support, which is the requirement mentioned in the manual.
Thanks for your help!
/******************************************************************************** * Content : Simple MKL Sparse Matrix-Vector Multiply in C * ********************************************************************************/ #include <stdio.h> #include <time.h> #include <stdlib.h> #define BILLION 1000000000L; #include "mkl.h" int main(int argc, char* argv[]) { int i, j; int m; int k; int nnz; double alpha; double beta; //double val[] = {1,2,4,5,6,5,8}; double* val; double* x; double* y; int* indx; int* pntrb; int* pntre; FILE *values; FILE* vector; FILE* index; FILE* pointerb; FILE* pointere; struct timespec start,end; /*printf("Enter the no of rows of the matrix, m="); scanf("%d",&m); //no of columns printf("Enter the no of columns of the matrix, k="); scanf("%d",&k); printf("Enter the no of non-zero elements of the matrix, nnz="); scanf("%d",&nnz);*/ if(argc != 4){ printf("Pass value of no of rows, no of columns and no of nonzero elements as arguments when you run the executable\n"); printf("Example: ./csr 1000 1000 3000 \n"); exit(-1); } m = atoi(argv[1]); k = atoi(argv[2]); nnz = atoi(argv[3]); //the condition if nnz = 0 is remaining //for (i = 0, i < nnz, i++) alpha = 1.0; beta = 0.0; //m = 1000000; //k = 1000000; //nnz = 1000000; val = (double*) malloc( sizeof(double)*(nnz)); y = (double*) malloc( sizeof(double)*(m)); indx = (int*) malloc( sizeof(int)*(nnz)); pntrb = (int*) malloc( sizeof(int)*(m)); pntre = (int*) malloc( sizeof(int)*(m)); x = (double*) malloc( sizeof(double)*(k)); i = 0; //Getting the column index index = fopen("index.txt","rb"); while(!feof(index)){ fscanf(index,"%d",&indx); i++; } fclose(index); i = 0; //Getting the beginning row pointer pointerb = fopen("ptrb.txt","rb"); while(!feof(pointerb)){ fscanf(pointerb,"%d",&pntrb); i++; } fclose(pointerb); i = 0; //Getting the end row pointer pointere = fopen("ptre.txt","rb"); while(!feof(pointere)){ fscanf(pointere,"%d",&pntre); i++; } fclose(pointere); i = 0; //Getting the vector values vector = fopen("x.txt","rb"); while(!feof(vector)){ fscanf(vector,"%lf",&x); i++; } fclose(vector); i = 0; //Getting the values of the matrix values = fopen("values.txt","rb"); while(!feof(values)){ //printf("point2, %d \n",i); fscanf(values,"%lf",&val); i++; } fclose(values); clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start); mkl_dcsrmv("N", &m, &k, &alpha, "G", val, indx, pntrb, pntre, x, &beta, y); clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end); double runtime = ( end.tv_sec - start.tv_sec )+ (double)( end.tv_nsec - start.tv_nsec )/ (double)BILLION; printf("Elapsed time = %g seconds\n", runtime); return 0; }
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You should not always expect high efficiency for sparse matrix operations. There simply isn't enough computation to keep the CPU busy. DCSRMV performance largely depends on the nature of the sparse matrix. Please take a look at this chart: https://software.intel.com/sites/default/files/MKL-11-1-SPBLAS-CPU-1000.png. Although the chart shows DCSRMV performance on a different system, it gives you an idea how performance may vary depending on the type of input matrix. This chart used sparse matrices from the "University of Florida Sparse Matrix Collection". If you calculate the efficiency of some benchmarked matrices, it is not even 3%.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page