
Ioan_Hadade

Beginner


06-24-2015
03:40 AM


Non-square Matrix Transpose

Hi guys,

Are there any highly optimized MKL routines, or perhaps performance primitives, that can transpose a rectangular matrix without scaling?

I've been using mkl_omatcopy, but it seems to perform worse than a naive baseline implementation, and I suspect this is due to the additional scaling it performs. I've attached a plot comparing a naive baseline implementation with omatcopy and imatcopy. I know the latter performs very poorly on non-square matrices.
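For reference, mkl_domatcopy computes B := alpha * op(A). The plain-C sketch below spells out the row-major, transposed case (ordering 'r', trans 't'), so with alpha = 1.0 the "scaling" amounts to one multiply by 1.0 per element. The function name ref_omatcopy_rt is hypothetical and is not an MKL routine. Because a transpose is usually memory-bound, that multiply is unlikely to be the main cost and the memory access pattern is a more likely suspect; this is a general expectation, not a measurement from this thread.

```c
#include <stddef.h>

/* Hypothetical plain-C reference for what mkl_domatcopy('r', 't', rows, cols,
 * alpha, A, lda, B, ldb) computes: B = alpha * A^T, both matrices row-major.
 * A is rows x cols with leading dimension lda >= cols;
 * B is cols x rows with leading dimension ldb >= rows. */
void ref_omatcopy_rt(size_t rows, size_t cols, double alpha,
                     const double *A, size_t lda, double *B, size_t ldb)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            B[j * ldb + i] = alpha * A[i * lda + j]; /* one multiply per element */
}
```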

I just want to know whether I should start spending time optimizing my own transpose routine with AVX/AVX2 and blocking, or whether a very efficient one already exists.
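As a starting point for a hand-written routine, a cache-blocked out-of-place transpose without scaling might look like the sketch below, before any AVX intrinsics are added. The function name blocked_transpose and the tile size TILE = 32 are hypothetical choices, and the tile size would need tuning on the target machine.

```c
#include <stddef.h>

/* Hypothetical tile edge in elements; tune per machine. */
#define TILE 32

/* Out-of-place, cache-blocked transpose without scaling:
 * A is rows x cols (row-major, leading dimension lda),
 * B is cols x rows (row-major, leading dimension ldb). */
void blocked_transpose(size_t rows, size_t cols,
                       const double *A, size_t lda, double *B, size_t ldb)
{
    for (size_t ii = 0; ii < rows; ii += TILE) {
        size_t iend = ii + TILE < rows ? ii + TILE : rows;
        for (size_t jj = 0; jj < cols; jj += TILE) {
            size_t jend = jj + TILE < cols ? jj + TILE : cols;
            /* Transpose one TILE x TILE tile (smaller at the edges) so the
             * source and destination tiles can stay cache-resident. */
            for (size_t i = ii; i < iend; ++i)
                for (size_t j = jj; j < jend; ++j)
                    B[j * ldb + i] = A[i * lda + j];
        }
    }
}
```

Blocking mainly pays off when both dimensions are large. When one dimension is only VECLEN = 4 wide, the tiles degenerate to 4-wide strips and the inner access pattern matters more than the tile size.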

Also, swapping indices is not viable for what I am trying to achieve.

Thank you in advance!

Ioan


Gennady_F_Intel

Moderator


06-26-2015
03:35 AM


Ioan, could you give us the M x N sizes instead of the number of elements?

Ioan_Hadade

Beginner


06-26-2015
03:50 AM


Hi Gennady,

Thanks for your reply. The transpositions I am performing are related to the dimension-lifted transposition (DLT) described in Henretty et al. (http://repository.cmu.edu/

Anyway, I am transposing these large vectors into M x N arrays, where N is always the SIMD vector length, which in this case is 4 because I am using double precision. Therefore, on the graph, all matrices are M x N, where M = number of elements / VECLEN and N = VECLEN.
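Assuming the layout implied by the mkl_domatcopy('r','t',VECLEN, ...) calls later in this post (the source vector viewed as a VECLEN x M row-major matrix, the result as M x VECLEN), element r*M + c of the original vector moves to position c*VECLEN + r. Each group of VECLEN consecutive output elements therefore holds values that were M apart in the input. The names dlt_forward and dlt_inverse are hypothetical, and the sketch assumes the element count n is a multiple of VECLEN.

```c
#include <stddef.h>

#define VECLEN 4 /* doubles per SIMD register in double precision */

/* Dimension-lifted transpose (sketch): view in[0..n) as a VECLEN x M
 * row-major matrix (M = n / VECLEN) and write its M x VECLEN transpose.
 * Assumes n is a multiple of VECLEN. */
void dlt_forward(size_t n, const double *in, double *out)
{
    size_t m = n / VECLEN;
    for (size_t r = 0; r < VECLEN; ++r)
        for (size_t c = 0; c < m; ++c)
            out[c * VECLEN + r] = in[r * m + c];
}

/* Inverse mapping: restore the original ordering from the DLT layout. */
void dlt_inverse(size_t n, const double *dlt, double *out)
{
    size_t m = n / VECLEN;
    for (size_t r = 0; r < VECLEN; ++r)
        for (size_t c = 0; c < m; ++c)
            out[r * m + c] = dlt[c * VECLEN + r];
}
```

For n = 8 (M = 2), the input 0, 1, ..., 7 becomes 0 2 4 6 1 3 5 7 in the DLT layout, and dlt_inverse restores the original order.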

Could this shape be a cause of the poor performance, because of the gathers and scatters involved? By the way, I am running this on a Xeon E5-2650 (Sandy Bridge).
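With the narrow dimension fixed at 4, each output row of 4 doubles is assembled from 4 input elements that are M apart. One way to avoid per-element strided access in a hand-written routine is to specialize for width 4: read the four source rows as four unit-stride streams in lockstep and write the output contiguously. The sketch below assumes that layout (source 4 x M, destination M x 4, both row-major); the name transpose_4xm is hypothetical. Whether a compiler vectorizes this loop depends on the compiler and flags, so treat it as a scalar starting point for an AVX version rather than a tuned kernel.

```c
#include <stddef.h>

/* Transpose specialized for a fixed width of 4: src is 4 x m row-major,
 * dst is m x 4 row-major. The four source rows are read as unit-stride
 * streams, and dst is written strictly sequentially. */
void transpose_4xm(size_t m, const double *restrict src, double *restrict dst)
{
    const double *r0 = src;
    const double *r1 = src + m;
    const double *r2 = src + 2 * m;
    const double *r3 = src + 3 * m;
    for (size_t c = 0; c < m; ++c) {
        dst[4 * c + 0] = r0[c];
        dst[4 * c + 1] = r1[c];
        dst[4 * c + 2] = r2[c];
        dst[4 * c + 3] = r3[c];
    }
}
```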

The code looks something like this:

// out-of-place MKL transposition

mkl_domatcopy('r','t',VECLEN,

mkl_domatcopy('r','t',VECLEN,

roe_fluxes_xplane();

// transpose the data back into its original layout for the y-sweep of flucrd

mkl_domatcopy('r','t',NV,

mkl_domatcopy('r','t',NV,

In short, I need to transpose the data into the DLT format and then back again. The original matrices are rectangular, since they represent distinct blocks of a multiblock grid.

Thank you in advance for your kind consideration.
