I am trying to use MKL to do some matrix multiplications. However, I need to multiply only specific columns of one matrix by another matrix. The problem is that cblas_dgemm takes successive columns, so how can I multiply only certain columns, given that I know the starting pointer of each one? I tried using a for loop to multiply each column by the matrix, but the execution time is much larger than when taking successive columns all at once. I also don't want to copy the columns to a new variable to put them in successive order, because this also takes a long time.
I tried using pthreads to do it in a more parallel way, but there was still not much improvement.
How does MKL handle parallelization and make it much better than OpenMP and pthreads?
Thanks in advance
I don't know if you want to do dot product or element-wise multiplication. And I don't know whether you want to use C or FORTRAN. But you can start with the following.
cblas_?dot for the vector-vector dot product in C - where the '?' is s, d, c, or z for single, double, complex single, or complex double
v?Mul for vector-vector element-wise multiplication, i.e.
Fortran: call vsmul( n, a, b, y ), call vdmul( n, a, b, y ), call vcmul( n, a, b, y ), call vzmul( n, a, b, y )
C: vsMul( n, a, b, y ); vdMul( n, a, b, y ); vcMul( n, a, b, y ); vzMul( n, a, b, y );
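To show how the two operations differ, here is a minimal sketch in plain standard C++ rather than MKL. The function names `dot` and `elementwise_mul` are hypothetical and only mirror what the double-precision routines above compute; they are not tuned for speed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Dot product: reduces two length-n vectors to a single scalar,
// sum over i of a[i] * b[i].
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Element-wise product: returns a length-n vector with y[i] = a[i] * b[i].
std::vector<double> elementwise_mul(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> y(a.size());
    std::transform(a.begin(), a.end(), b.begin(), y.begin(),
                   std::multiplies<double>());
    return y;
}
```

The first returns one number; the second returns a vector of the same length as its inputs.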
As for how MKL optimizes parallelization - we have teams of software developers who like to play with math. They try out different algorithms and use math tricks (like reorganizing the solution process) and try out their solutions on different chips and with different sized problems to determine which solutions fit which problem sets.
I hope this helps.
Thank you, Pamela, for your response. I truly appreciate it.
Answering your questions: I use C++ and I am trying to do very fast matrix multiplication (not element-wise). I tried all the functions you mentioned. My problem is that I want to multiply certain columns of the first matrix (not the whole matrix) by the second one, but using a for loop over the functions you mentioned is much slower than cblas_dgemm.
My question: is there a way to use cblas_dgemm on specific columns of the first matrix rather than on the whole matrix? Copying them to another location in memory where they sit one after another is also time-consuming.
Thank you very much
If I understand what you mean, you can try to use batch gemm, see
where the general concept is described here:
Hope this helps!
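To make the batch idea concrete, here is a minimal sketch in plain C++ with no MKL dependency. It assumes column-major storage, and it assumes the wanted product is C = A(:, cols)^T * B, so that each selected column produces one independent row of C. The names `Matrix`, `selected_columns_times`, and `column_ptrs` are hypothetical. The point is the array of column pointers: a batch GEMM interface generally accepts such an array in place of one contiguous matrix, so no column has to be copied. The inner loops only stand in for the library call and are not optimized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major matrix: element (i, j) lives at data[i + j * rows].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    const double* col(std::size_t j) const { return data.data() + j * rows; }
};

// Hypothetical helper: computes C = A(:, cols)^T * B without copying columns.
// C has cols.size() rows and B.cols columns.
Matrix selected_columns_times(const Matrix& A,
                              const std::vector<std::size_t>& cols,
                              const Matrix& B) {
    assert(A.rows == B.rows);

    // One pointer per selected column; no matrix data is moved.
    std::vector<const double*> column_ptrs;
    for (std::size_t j : cols) {
        assert(j < A.cols);
        column_ptrs.push_back(A.col(j));
    }

    Matrix C(cols.size(), B.cols);
    // Each batch entry is an independent (1 x rows) times (rows x B.cols)
    // product that fills one row of C.
    for (std::size_t b = 0; b < column_ptrs.size(); ++b) {
        for (std::size_t p = 0; p < B.cols; ++p) {
            const double* bcol = B.col(p);
            double sum = 0.0;
            for (std::size_t i = 0; i < A.rows; ++i) {
                sum += column_ptrs[b][i] * bcol[i];
            }
            C.at(b, p) = sum;
        }
    }
    return C;
}
```

Because every batch entry writes to a different row of C, the entries can run in parallel without synchronization. That independence is what makes this formulation a candidate for a batched call.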