mkl_?imatcopy single-threaded?

It appears that mkl_?imatcopy uses only one thread to transpose the matrix. That may be the nature of the problem: perhaps an in-place transpose cannot be threaded. But the out-of-place version also appears to run on only one thread, and I would think that one could be threaded: grab a row in the source, make it a column in the target, and at the least divvy up the rows among threads. So this makes me think I'm doing something else wrong.
Am I doing something wrong in setup, or is the function truly only sequential? I'm linking against mkl_intel_thread.lib. My compiler is MSVC 2008.
Thanks!

Accepted Solutions
Hi Randy,

Threading in mkl_?imatcopy (for certain square sizes) will be available in one of our upcoming update releases.

As to the out-of-place transpose, at MKL's level it is not obvious how a user expects the data in both the input and output matrices to be distributed across threads (cores). On the other hand, using the existing mkl_?omatcopy in a parallel section seems pretty straightforward (similar to what you described above):

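#pragma omp parallel
{
    // A - input matrix in row-major layout (rows x cols, leading dimension lda)
    // B - output matrix in row-major layout (cols x rows, leading dimension ldb)
    // rows - number of rows in A
    // cols - number of cols in A
    // t_id - omp thread id
    // n_threads - number of omp threads
    // my_start, my_part - first row and number of rows of A handled by this thread
    ... // user code which works with a part of A from the row my_start and consisting of my_part rows
    // 1.0 is the alpha scaling factor required by mkl_?omatcopy (plain transpose, no scaling)
    mkl_?omatcopy('R', 'T', my_part, cols, 1.0, A + my_start*lda, lda, B + my_start, ldb);
    ... // user code which works with B
}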
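
The row-band decomposition above can be sketched as a self-contained example. This is only an illustration under stated assumptions: `transpose_block` is a hypothetical stand-in for `mkl_domatcopy('R', 'T', ..., 1.0, ...)` so that the code compiles without MKL, and `parallel_transpose` and `n_chunks` are illustrative names that do not come from the original post. In real code the call to `transpose_block` would presumably be replaced by the MKL routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mkl_domatcopy('R', 'T', rows, cols, 1.0, A, lda, B, ldb):
   writes the transpose of a rows x cols row-major block of A into B. */
static void transpose_block(size_t rows, size_t cols,
                            const double *A, size_t lda,
                            double *B, size_t ldb)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            B[j * ldb + i] = A[i * lda + j];
}

/* Out-of-place transpose of the rows x cols row-major matrix A into the
   cols x rows row-major matrix B. The rows of A are split into n_chunks
   contiguous bands, and each band is transposed independently. */
void parallel_transpose(size_t rows, size_t cols,
                        const double *A, size_t lda,
                        double *B, size_t ldb, int n_chunks)
{
    assert(n_chunks > 0 && lda >= cols && ldb >= rows);

    #pragma omp parallel for
    for (int c = 0; c < n_chunks; ++c) {
        /* Band boundaries: chunk c covers rows [my_start, my_end) of A. */
        size_t my_start = rows * (size_t)c / (size_t)n_chunks;
        size_t my_end   = rows * (size_t)(c + 1) / (size_t)n_chunks;
        size_t my_part  = my_end - my_start;

        /* Rows my_start.. of A become columns my_start.. of B. */
        if (my_part > 0)
            transpose_block(my_part, cols,
                            A + my_start * lda, lda,
                            B + my_start, ldb);
    }
}
```

Each chunk writes a disjoint set of columns of B, so the bands need no synchronization. The pragma only takes effect when OpenMP is enabled at compile time (for example `/openmp` with MSVC or `-fopenmp` with GCC); otherwise the loop simply runs sequentially and produces the same result.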

Best regards,
-Vladimir

Vladimir,
I can't ask for more than that! That is actually what I did, though using TBB, for an out-of-place parallel algorithm.
That's wonderful news about the in-place version receiving a threaded update. I'll leave my out-of-place parallel version in for now, and I'll gladly switch when that update arrives.
I appreciate your response and your time.
Thanks!