When mkl_?csrmv() is called with transa='N', its performance on the Intel Xeon Phi coprocessor is good. With transa='T', however, performance degrades dramatically.
What might be the reason for this? Could it be caused by atomic writes to the output vector when many threads are running? (The number of threads is set with mkl_set_num_threads().)
1 Reply
Kadir,
As I understand it, transa selects whether MKL multiplies by the matrix as stored ('N') or by its transpose ('T'). So my guess is that with 'N' the matrix-vector product vectorizes very well, whereas with 'T' MKL has to work with the transposed matrix, which makes its job much harder because it can't exploit optimizations such as vectorization.
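To make the difference concrete, here is a minimal sketch in plain C of the two access patterns for a matrix stored in zero-based CSR format. This is not MKL's implementation, which I have not seen; the function names csr_mv_n and csr_mv_t and the argument layout are made up for illustration. The point is only that the 'N' case writes each output element once, from the row that owns it, while a straightforward 'T' case scatters updates across the whole output vector. If the rows were split across threads, those scattered updates would need atomics or per-thread copies of y, which would be consistent with the atomic-writes suspicion in the question.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for an m-row matrix in zero-based CSR format (like transa = 'N').
   Each y[i] is produced by exactly one row, so rows could be divided among
   threads without any write conflicts. */
void csr_mv_n(size_t m, const double *val, const size_t *col_idx,
              const size_t *row_ptr, const double *x, double *y)
{
    assert(row_ptr[0] == 0);                 /* zero-based indexing assumed */
    for (size_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];   /* gather from x */
        y[i] = sum;                          /* single write per row */
    }
}

/* y = A^T * x for an m-by-n matrix, computed directly from the CSR storage
   of A (like transa = 'T'). Different rows update the same y[j], so a
   row-parallel version of this loop would have conflicting writes. */
void csr_mv_t(size_t m, size_t n, const double *val, const size_t *col_idx,
              const size_t *row_ptr, const double *x, double *y)
{
    assert(row_ptr[0] == 0);                 /* zero-based indexing assumed */
    for (size_t j = 0; j < n; ++j)
        y[j] = 0.0;
    for (size_t i = 0; i < m; ++i)
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[col_idx[k]] += val[k] * x[i];  /* scatter into y */
}
```

Both routines are serial here; the sketch only shows which memory locations each row touches, not how MKL actually schedules the work.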
Regards
--
Taylor
