I am using the dgemm function from the Intel Math Kernel Library (MKL) in my Fortran source code to perform the following matrix calculation:
X = A - transpose(B) * C
A is a 400*400 dense matrix, B is a 10000*400 sparse matrix, and C is a 10000*400 matrix.
CALL dgemm('T', 'N', 400, 400, 10000, -1.d0, B, 10000, C, 10000, 1.d0, A, 400)
This operation takes about 3.5 seconds, which is a lot in my program.
1. Is 3.5 seconds a reasonable amount of time for this operation?
2. Is there any way to speed up the process?
I am using a Dell computer with an Intel Core 2 Duo CPU, and my MKL version is 10.0.1, which is relatively old. My last question: if I switch to the most recent MKL, can I expect a significant improvement in matrix multiplication performance?
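As a rough sanity check on the 3.5 seconds, the work done by this call can be estimated with the usual 2*m*n*k operation count for a dense matrix product, using m = 400, n = 400, k = 10000 from the dgemm arguments above. This is only a back-of-the-envelope sketch: it assumes the measured time covers just the dgemm call, and it counts every element of B because dgemm treats B as dense and does not exploit its sparsity.

$$
\text{flops} \approx 2\,m\,n\,k = 2 \times 400 \times 400 \times 10000 = 3.2 \times 10^{9}
$$

$$
\text{rate} \approx \frac{3.2 \times 10^{9}\ \text{flops}}{3.5\ \text{s}} \approx 0.91\ \text{GFLOP/s}
$$

Whether that rate is reasonable depends on the peak double-precision throughput of the CPU, which in turn depends on the clock speed, and that is not given here.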
A few suggestions:
Hi Zhang Z,
Thanks for your reply and good suggestions.
I tried both the sequential and the parallel versions, but I could not see a significant improvement in performance.
I also changed the number of threads from the default, which is the maximum number of threads possible, to 1 thread, but nothing significant happened.
I store B in full-storage format. I am definitely going to try the sparse multiply function.
I found that the run time depends strongly on the number of columns of the C matrix (I changed the number of rows too, but it was not really a major factor). With the number of columns equal to 11000, I get the following results:
Number of rows = 363, Time = 3.34 seconds
Number of rows = 120, Time = 0.3 seconds
Number of rows = 30, Time = 0.016 seconds
Would you please explain why the timing differs so much with the number of rows? Does this make sense to you? Has the new version of MKL improved in this respect?
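One way to read these timings is to convert each into an operation count and a rate. The sketch below rests on an assumption that is not confirmed in this thread: that the varied size s (363, 120, 30) is the 400 of the original problem, so it sets both m and n in the dgemm call, while the long dimension is fixed at k = 11000. Under that assumption the work is 2*s*s*k and grows with the square of s.

$$
\begin{aligned}
s = 363:&\quad 2 \times 363^{2} \times 11000 \approx 2.90 \times 10^{9}\ \text{flops}, &\quad 2.90 \times 10^{9} / 3.34\ \text{s} &\approx 0.87\ \text{GFLOP/s} \\
s = 120:&\quad 2 \times 120^{2} \times 11000 \approx 3.17 \times 10^{8}\ \text{flops}, &\quad 3.17 \times 10^{8} / 0.3\ \text{s} &\approx 1.06\ \text{GFLOP/s} \\
s = 30:&\quad 2 \times 30^{2} \times 11000 \approx 1.98 \times 10^{7}\ \text{flops}, &\quad 1.98 \times 10^{7} / 0.016\ \text{s} &\approx 1.24\ \text{GFLOP/s}
\end{aligned}
$$

If that reading is right, the rate stays within roughly 0.9 to 1.2 GFLOP/s across all three cases, so most of the difference in time would simply reflect the amount of arithmetic: going from 120 to 363 multiplies the work by about 9, and going from 30 to 120 multiplies it by 16.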