simatcopy VS somatcopy performance

JoaoAlves95 · 11-25-2020

Good afternoon,

I've noticed that mkl_simatcopy outperforms mkl_somatcopy for n x n square matrices. Only the execution time of the mkl_simatcopy/mkl_somatcopy call itself was measured. The functions were called with the following parameters:

mkl_simatcopy('R'  /* row-major ordering */,
              'T'  /* A will be transposed */,
              n    /* rows */,
              n    /* cols */,
              1.0f /* alpha: scale factor applied to the matrix */,
              src  /* matrix, transposed in place */,
              n    /* src_stride (lda) */,
              n    /* dst_stride (ldb) */);

mkl_somatcopy('R'  /* row-major ordering */,
              'T'  /* A will be transposed */,
              n    /* rows */,
              n    /* cols */,
              1.0f /* alpha: scale factor applied to the matrix */,
              src  /* source matrix */,
              n    /* src_stride (lda) */,
              dst  /* destination matrix */,
              n    /* dst_stride (ldb) */);

From what I understand, in-place matrix transposition should be less efficient than its out-of-place counterpart. Doesn't this hold for square matrices?
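To make the comparison concrete, this is roughly how I picture the two operations in the square case. It is only a minimal plain-C sketch under my own assumptions, not MKL's implementation: the names ref_transpose_inplace, ref_transpose_outofplace and ref_matrices_equal are hypothetical, alpha is taken as 1, and both strides are assumed equal to n.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routines for illustration only.
   They are NOT how MKL implements mkl_simatcopy / mkl_somatcopy. */

/* In-place transpose of an n x n row-major matrix:
   swap each element above the diagonal with its mirror below it. */
static void ref_transpose_inplace(float *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            float tmp = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = tmp;
        }
    }
}

/* Out-of-place transpose of an n x n row-major matrix:
   dst[j][i] = src[i][j]; src and dst must not overlap. */
static void ref_transpose_outofplace(const float *src, float *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            dst[j * n + i] = src[i * n + j];
        }
    }
}

/* Returns 1 if the two n x n matrices are element-wise identical, else 0. */
static int ref_matrices_equal(const float *a, const float *b, size_t n)
{
    for (size_t k = 0; k < n * n; ++k) {
        if (a[k] != b[k]) {
            return 0;
        }
    }
    return 1;
}
```

In this sketch the in-place version works on a single buffer of n*n floats and needs one temporary value, while the out-of-place version reads n*n floats and writes another n*n floats to a separate buffer. Both produce the same result, which ref_matrices_equal can confirm on a copy of the input.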

I would also appreciate any insight into the memory complexity of mkl_simatcopy and the optimization techniques used in this function.

Best Regards,

João Alves