simatcopy VS somatcopy performance

JoaoAlves95 — Wed, 25 Nov 2020 12:24:23 GMT

Good afternoon,

I've noticed that mkl_simatcopy outperforms mkl_somatcopy when transposing n x n square matrices. Only the execution time of the simatcopy/somatcopy call itself was measured. The functions were called with the following parameters:

mkl_simatcopy('R' /* row-major ordering */,
              'T' /* A will be transposed */,
              n   /* rows */,
              n   /* cols */,
              1.  /* scales the input matrix */,
              src /* source matrix */,
              n   /* src_stride */,
              n   /* dst_stride */);

mkl_somatcopy('R' /* row-major ordering */,
              'T' /* A will be transposed */,
              n   /* rows */,
              n   /* cols */,
              1.  /* scales the input matrix */,
              src /* source matrix */,
              n   /* src_stride */,
              dst /* destination matrix */,
              n   /* dst_stride */);

From what I understand, in-place matrix transposition should be less efficient than its out-of-place counterpart. Does this not hold for square matrices?
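To make the question concrete, here is a minimal sketch of the two approaches for the square case. This is naive reference code and is not meant to reflect how MKL implements either routine; the function names and the size_t stride parameters are only illustrative. For a square matrix the in-place version is a plain swap across the diagonal that touches a single n x n buffer, while the out-of-place version reads one buffer and writes a second one.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of an n x n row-major matrix: dst = src^T.
   Reads n*n floats from src and writes n*n floats to a separate buffer.
   lds/ldd are the row strides (hypothetical names, not MKL's). */
void transpose_out_of_place(size_t n, const float *src, size_t lds,
                            float *dst, size_t ldd)
{
    assert(lds >= n && ldd >= n);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            dst[j * ldd + i] = src[i * lds + j];
}

/* In-place transpose of an n x n row-major matrix.
   Swaps each element above the diagonal with its mirror below it,
   so only one buffer is touched and one float of extra storage is used. */
void transpose_in_place_square(size_t n, float *a, size_t ld)
{
    assert(ld >= n);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j) {
            float tmp = a[i * ld + j];
            a[i * ld + j] = a[j * ld + i];
            a[j * ld + i] = tmp;
        }
}
```

As far as I can tell, the swap trick only works because rows == cols. For a rectangular matrix the in-place problem becomes a permutation with longer cycles, which is where I would expect the in-place variant to be slower.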

I would also appreciate any insight into the memory complexity of simatcopy and the optimization techniques used in this function.

Best Regards,

João Alves
