Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

## Non-square Matrix Transpose

Beginner
1,006 Views

Hi guys,

Are there any highly optimized MKL routines, or perhaps performance primitives, that can transpose a rectangular matrix without scaling?

I've been using mkl_omatcopy, but it seems to perform worse than a naive baseline implementation, and I suspect this is due to the additional scaling it performs. I've attached a plot comparing a naive baseline implementation against omatcopy and imatcopy. I know the latter performs very poorly on non-square matrices.
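For comparison, a naive baseline of the kind described might look like the sketch below. `naive_transpose` is a hypothetical name, not the poster's actual code. With the scale factor fixed at 1 it computes the same result as `mkl_domatcopy('R', 'T', rows, cols, 1.0, A, lda, B, ldb)` on row-major data, i.e. B = A^T with no multiplication by alpha.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical naive baseline: out-of-place transpose of a row-major
 * rows x cols matrix A (leading dimension lda) into the row-major
 * cols x rows matrix B (leading dimension ldb). No scaling is applied.
 */
void naive_transpose(size_t rows, size_t cols,
                     const double *a, size_t lda,
                     double *b, size_t ldb)
{
    /* leading dimensions must be at least as long as each stored row */
    assert(lda >= cols && ldb >= rows);
    for (size_t i = 0; i < rows; ++i)        /* walk A row by row: contiguous reads */
        for (size_t j = 0; j < cols; ++j)
            b[j * ldb + i] = a[i * lda + j]; /* strided writes into B */
}
```

The loop order favours contiguous reads at the cost of strided writes; swapping the two loops flips that trade-off.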

I just want to know whether I should start spending time optimizing my own transpose routine with AVX/AVX2 and blocking, or whether a very efficient one already exists.

Also, swapping indices is not viable for what I am trying to achieve.
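If you do roll your own, loop tiling is the usual starting point. The sketch below uses a hypothetical `blocked_transpose` function and an arbitrarily chosen `TILE` of 32; it transposes tile by tile so that the strided writes into B land in a small set of cache lines that stay resident while each tile is filled. It is plain C; SIMD shuffles can be layered on top of the inner tile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge in elements; a tuning parameter, not a recommended value. */
#define TILE 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Cache-blocked out-of-place transpose of a row-major rows x cols matrix A
 * into the row-major cols x rows matrix B (no scaling).
 */
void blocked_transpose(size_t rows, size_t cols,
                       const double *a, size_t lda,
                       double *b, size_t ldb)
{
    assert(lda >= cols && ldb >= rows);
    for (size_t ii = 0; ii < rows; ii += TILE)
        for (size_t jj = 0; jj < cols; jj += TILE) {
            /* clip the tile at the matrix edges */
            size_t i_end = min_sz(ii + TILE, rows);
            size_t j_end = min_sz(jj + TILE, cols);
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    b[j * ldb + i] = a[i * lda + j];
        }
}
```

Whether this beats the naive loop depends on the matrix shape and cache sizes, so `TILE` should be benchmarked rather than assumed.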

Ioan

Moderator

Ioan, could you give us the M x N sizes instead of the number of elements?

Beginner

Thanks for your reply. The transpositions I am performing are related to the dimension-lifted transposition (DLT) described in Henretty et al. (http://repository.cmu.edu/cgi/viewcontent.cgi?article=1263&context=ece). Basically, DLT performs the data-layout reorganisation needed to allow aligned vector loads and stores for stencils along the x-direction.

Anyway, I am transposing these large vectors into M x N arrays, where N is always the SIMD vector length: 4 in this case, since I am working in double precision and a 256-bit AVX register holds four doubles. Therefore, all matrix sizes on the graph are M x N, where M = number of elements / VECLEN and N = VECLEN.
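A minimal sketch of the layout change, assuming it is exactly what the `mkl_domatcopy` calls in this post compute: a vector `q` of length n is viewed as a row-major VECLEN x NV matrix (NV = n / VECLEN) and transposed to NV x VECLEN. The names `dlt_forward`, `dlt_inverse` and `stencil_row` are hypothetical, and the stencil is a toy 3-point sum, not `roe_fluxes_xplane`. Row j of the transposed array is one SIMD vector, and the x-neighbours of every element in it sit in rows j-1 and j+1, which is why the x-direction stencil becomes whole-vector loads (aligned, provided `qt` itself is 32-byte aligned).

```c
#include <assert.h>
#include <stddef.h>

#define VECLEN 4 /* doubles per 256-bit AVX register */

/*
 * Forward DLT (sketch): row j of qt holds q[j], q[nv + j], q[2*nv + j], q[3*nv + j].
 * Equivalent to transposing the row-major VECLEN x nv view of q.
 */
void dlt_forward(size_t n, const double *q, double *qt)
{
    assert(n % VECLEN == 0);
    size_t nv = n / VECLEN;
    for (size_t i = 0; i < VECLEN; ++i)
        for (size_t j = 0; j < nv; ++j)
            qt[j * VECLEN + i] = q[i * nv + j];
}

/* Inverse DLT: restore the original contiguous layout. */
void dlt_inverse(size_t n, const double *qt, double *q)
{
    assert(n % VECLEN == 0);
    size_t nv = n / VECLEN;
    for (size_t i = 0; i < VECLEN; ++i)
        for (size_t j = 0; j < nv; ++j)
            q[i * nv + j] = qt[j * VECLEN + i];
}

/*
 * Toy stencil out[k] = q[k-1] + q[k+1] evaluated in DLT layout for an interior
 * row j (1 <= j <= nv-2): every lane reads the same rows j-1 and j+1.
 */
void stencil_row(size_t nv, const double *qt, size_t j, double *out_row)
{
    assert(j >= 1 && j + 1 < nv);
    for (size_t lane = 0; lane < VECLEN; ++lane)
        out_row[lane] = qt[(j - 1) * VECLEN + lane] + qt[(j + 1) * VECLEN + lane];
}
```

Rows 0 and NV-1 are the exception: their x-neighbours fall in the adjacent lane of the opposite boundary row, so they need separate handling.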

Could this be the cause of the poor performance, because of the gathers and scatters involved? By the way, I am running this on a Xeon E5-2650 (Sandy Bridge).
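Because N = VECLEN = 4 here, each output row is exactly one 256-bit register, so the 4 x NV to NV x 4 transpose can be done in 4 x 4 tiles with in-register shuffles instead of gathers or scatters. (The E5-2650 supports AVX but not AVX2, and hardware gather instructions only arrived with AVX2.) The sketch below uses standard AVX intrinsics when compiled with AVX enabled (e.g. `-mavx`) and a scalar fallback otherwise; `transpose4x4` and `transpose_4xn` are hypothetical names, and no performance claim is made without benchmarking.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Transpose one 4 x 4 tile of doubles: src rows are lds apart, dst rows ldd apart. */
static void transpose4x4(const double *src, size_t lds, double *dst, size_t ldd)
{
#ifdef __AVX__
    /* four row loads, four unpacks, four 128-bit lane permutes, four row stores */
    __m256d r0 = _mm256_loadu_pd(src);
    __m256d r1 = _mm256_loadu_pd(src + lds);
    __m256d r2 = _mm256_loadu_pd(src + 2 * lds);
    __m256d r3 = _mm256_loadu_pd(src + 3 * lds);
    __m256d t0 = _mm256_unpacklo_pd(r0, r1); /* r0[0] r1[0] r0[2] r1[2] */
    __m256d t1 = _mm256_unpackhi_pd(r0, r1); /* r0[1] r1[1] r0[3] r1[3] */
    __m256d t2 = _mm256_unpacklo_pd(r2, r3); /* r2[0] r3[0] r2[2] r3[2] */
    __m256d t3 = _mm256_unpackhi_pd(r2, r3); /* r2[1] r3[1] r2[3] r3[3] */
    _mm256_storeu_pd(dst,           _mm256_permute2f128_pd(t0, t2, 0x20)); /* column 0 */
    _mm256_storeu_pd(dst + ldd,     _mm256_permute2f128_pd(t1, t3, 0x20)); /* column 1 */
    _mm256_storeu_pd(dst + 2 * ldd, _mm256_permute2f128_pd(t0, t2, 0x31)); /* column 2 */
    _mm256_storeu_pd(dst + 3 * ldd, _mm256_permute2f128_pd(t1, t3, 0x31)); /* column 3 */
#else
    for (size_t i = 0; i < 4; ++i)           /* portable scalar fallback */
        for (size_t j = 0; j < 4; ++j)
            dst[j * ldd + i] = src[i * lds + j];
#endif
}

/*
 * Transpose a row-major 4 x nv matrix a into the nv x 4 matrix b; same result
 * as mkl_domatcopy('R', 'T', 4, nv, 1.0, a, nv, b, 4).
 */
void transpose_4xn(const double *a, size_t nv, double *b)
{
    assert(a != NULL && b != NULL && (const void *)a != (const void *)b); /* out-of-place only */
    size_t j = 0;
    for (; j + 4 <= nv; j += 4)              /* full 4 x 4 tiles */
        transpose4x4(a + j, nv, b + j * 4, 4);
    for (; j < nv; ++j)                      /* leftover columns when nv % 4 != 0 */
        for (size_t i = 0; i < 4; ++i)
            b[j * 4 + i] = a[i * nv + j];
}
```

Unaligned loads and stores are used because `a + j` with row stride `nv` is not guaranteed to be 32-byte aligned.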

The code looks something like this:

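```c
// Out-of-place MKL transposition (alpha = 1, so no scaling):
// q, aux are VECLEN x NV row-major -> qt, auxt are NV x VECLEN
mkl_domatcopy('r', 't', VECLEN, NV, 1, &q, NV, &qt, VECLEN);
mkl_domatcopy('r', 't', VECLEN, NV, 1, &aux, NV, &auxt, VECLEN);

roe_fluxes_xplane();

// Retranspose data back into the original format for the y-sweep of flucrd:
// qt, auxt are NV x VECLEN -> q, aux are VECLEN x NV
mkl_domatcopy('r', 't', NV, VECLEN, 1, &qt, VECLEN, &q, NV);
mkl_domatcopy('r', 't', NV, VECLEN, 1, &auxt, VECLEN, &aux, NV);
```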

So I need to transpose the data into the DLT format and then back again. The original matrices are rectangular because they represent distinct blocks of a multiblock grid.